Abstract:
In this work, a database for methanol synthesis was constructed from published literature to extract knowledge and to build models that can support future studies. The database contains 357 data points showing the effects of 28 input variables, such as catalyst preparation and reaction conditions, on COX conversion, methanol selectivity and methanol yield as response variables. Multiple linear regression (MLR), decision trees and random forest (RF) were independently applied to model COX conversion, methanol selectivity and methanol yield in the R 3.2.3 environment. MLR and regression trees did not yield interpretable results. Classification trees were applied using discretized response variables; the misclassification errors for the training data were 22 %, 32 % and 25 % for COX conversion, methanol selectivity and methanol yield, respectively. Because these errors were much higher for the testing data (73 %, 83 % and 77 %, respectively), it was concluded that this method may be used to deduce empirical observations but cannot predict unseen experiments. Random forest yielded the best results in terms of goodness of fit, with $R_{adj}^{2}$ as high as 0.95; hence, it was used to extract information on variable importance. Reduction time and catalyst preparation method were found to be the most important variables for COX conversion, reaction temperature and CO/CO2 ratio were the key variables for methanol selectivity, and reduction temperature and the amount of CO in the feed were the most significant variables for methanol yield. However, despite the low prediction RMSE values between 0.12 and 0.34, RF was also unable to predict unseen experiments successfully and generated nearly random results.
It was therefore concluded that the data mining tools were successful for descriptive tasks such as significance analysis and rule deduction, but failed to predict conversion, selectivity and yield.