Abstract:
In this study, empirical global models of Fischer-Tropsch synthesis (FTS) performance were constructed by applying machine learning algorithms to experimental data published in the literature. CO conversion in FTS was modelled as a function of catalyst design variables, physical properties of the catalyst, and operating conditions using multiple linear regression, artificial neural network (ANN), decision tree, and random forest algorithms in the R 3.1.2 environment. Missing values in the physical properties were estimated by constructing ANN models on the known instances. Multiple linear regression and decision tree classification models built on the completed dataset yielded the least reliable results, with low prediction accuracies. The predictive power of the ANN and random forest models, on the other hand, was much higher, with R² values of 0.40 and 0.59, respectively. The importance analysis of both models revealed that operating temperature is the most crucial attribute for FTS catalytic activity. The predictive power of the models was improved by constructing sub-models trained on the Co-based, Fe-based, low-temperature, and high-temperature subsets, which were generated by splitting the entire dataset with respect to base metal and operating temperature. The R² values for these sub-models ranged from 0.43 to 0.53 with ANN and from 0.62 to 0.66 with random forest. According to the importance analyses of these sub-models, the catalyst design variables were more dominant in the Fe-based subset, whereas operating conditions were the most important for the Co-based subset. Sensitivity analysis was conducted to observe the effect of three of the most important attributes in the ANN models. The residual analysis of each model revealed that all of the algorithms failed to represent the effect of time on stream on CO conversion.
In addition to these predictive approaches, principal component analysis was used to describe the correlations between the input variables and CO conversion. It corroborated the outcomes of the ANN and random forest models by identifying operating temperature as the most important attribute.