Abstract
Objective: To assess how well random forest classifies serum metabolomic data from schizophrenia patients and healthy controls, and to identify differential metabolites. Methods: The case group comprised 50 patients with schizophrenia and the control group 62 healthy individuals. Serum samples from both groups were analyzed on an RRLC-QTOF/MS platform, and the resulting metabolomic data were classified with random forest. Classification performance was evaluated with the out-of-bag (OOB) error estimate and 5-fold cross-validation, and the random forest variable importance measure (VIM) was used to select important differential metabolites. Results: Random forest classified the serum metabolomic data of the two groups well. The misclassification rate was 4.0% in the case group and 1.6% in the control group, the OOB error estimate was 2.68%, and the area under the ROC curve from 5-fold cross-validation was 0.99. Fifteen important differential metabolites were selected according to VIM. Conclusion: Combining liquid chromatography-mass spectrometry metabolomics with random forest can identify metabolites with potential clinical value, and the approach can be applied in metabolomics research.
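The reported error rates can be cross-checked with simple arithmetic. The sketch below assumes that 2 of the 50 patients and 1 of the 62 controls were misclassified; these counts are inferred from the percentages and are not stated in the abstract. It also assumes the per-group rates and the OOB estimate come from the same out-of-bag predictions. The function name `error_rates` is hypothetical.

```python
# Group sizes are taken from the abstract.
N_CASES = 50      # schizophrenia patients
N_CONTROLS = 62   # healthy controls


def error_rates(cases_wrong, controls_wrong):
    """Return per-group and overall misclassification rates in percent."""
    case_rate = 100 * cases_wrong / N_CASES
    control_rate = 100 * controls_wrong / N_CONTROLS
    overall = 100 * (cases_wrong + controls_wrong) / (N_CASES + N_CONTROLS)
    return case_rate, control_rate, overall


# Assumed counts (inferred, not reported): 2 patients and 1 control misclassified.
case_rate, control_rate, overall = error_rates(2, 1)
print(f"case {case_rate:.1f}%  control {control_rate:.1f}%  overall {overall:.2f}%")
```

Under these assumed counts, the three rates round to the reported 4.0%, 1.6%, and 2.68%, so the figures in the abstract are mutually consistent with 3 misclassified samples out of 112.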
Source
Journal of Shandong University (Health Sciences)
CAS
PKU Core Journal
2015, Issue 2, pp. 92-96 (5 pages)
Funding
National Natural Science Foundation of China (81273177)
Natural Science Foundation of Shandong Province (ZR2013HQ056)
Keywords
Schizophrenia
Metabolomics
Random forest
Classification
Variable selection