Abstract
Objective: Clinical data are often imbalanced at analysis time; that is, the numbers of positive and negative cases are unequal. Analyzing such data without preprocessing can bias the results. Many methods exist for handling imbalanced data, but most have drawbacks such as overfitting or loss of data.
Methods: This paper introduces the principle of the SMOTE algorithm and its implementation in the R language. As a worked example, it applies SMOTE to real imbalanced clinical data.
Results: The benign-to-malignant ratio was 1/3 in the original data and 1 (i.e., 1:1) after processing with SMOTE.
Conclusions: The SMOTE algorithm can effectively correct imbalance in data.
Source
Beijing Biomedical Engineering (《北京生物医学工程》), 2012, No. 5, pp. 528-530 (3 pages)
Funding
National Natural Science Foundation of China (81172772)
Beijing Natural Science Foundation (4112015)
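Funding Project for Academic Human Resources Development in Institutions of Higher Learning Under the Jurisdiction of Beijing Municipality (PHR201007112)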