Funding: We thank the National Natural Science Foundation of China (Nos. 21775107, 21822108), the Sichuan Science and Technology Program (20CXTD0112), and the Fundamental Research Funds for the Central Universities for financial support.
Abstract: Predicting reaction yields with machine learning (ML) can help chemists select high-yielding reactions and provide prior knowledge before wet-lab experiments, improving efficiency. However, exploring a multicomponent organic reaction involves many complex variables and only a limited amount of experimental data, both of which make ML difficult to apply. Herein, we perform yield prediction for the synthesis of 2-oxazolidones via Cu-catalyzed radical-type oxy-alkylation of allylamines and heteroaryl-methylamines with CO₂, which is a three-component reaction. Using physicochemical descriptors as features for ML modelling, we find that XGBoost performs significantly better than linear models and that these features are effective for yield prediction. Moreover, out-of-sample prediction indicates the application potential of the model. This study demonstrates the great potential of regression-based ML in organic synthesis, even with complex factors and the generally small reaction datasets produced by the classical approach to developing methods for multicomponent reactions.