
Application of SMOTE arithmetic for unbalanced data (SMOTE算法在不平衡数据中的应用)
Cited by: 18
Abstract  Objective: Clinical data are often unbalanced at analysis time, meaning that the numbers of positive and negative cases are unequal; without preprocessing, this imbalance can bias the results. Many methods exist for handling such data, but most suffer from drawbacks such as overfitting or loss of data. Methods: This paper introduces the principle of the SMOTE algorithm and its implementation in R, and applies SMOTE to real clinical data as a worked example. Results: The benign-to-malignant ratio was 1/3 in the original data and 1 after SMOTE processing. Conclusions: The SMOTE algorithm can effectively correct unbalanced data.
Source  Beijing Biomedical Engineering (《北京生物医学工程》), 2012, No. 5, pp. 528-530 (3 pages)
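Funding  Supported by the National Natural Science Foundation of China (81172772), the Beijing Natural Science Foundation (4112015), and the Funding Project for Academic Human Resources Development in Institutions of Higher Learning under the Jurisdiction of Beijing Municipality (PHR201007112)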
Keywords  SMOTE; unbalanced data; clinical data
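
The abstract reports that SMOTE raised the benign-to-malignant ratio from 1/3 to 1. The paper's own implementation is in R and is not reproduced in this record; the following is only a minimal Python sketch of the general SMOTE idea (interpolating between a minority sample and one of its nearest minority-class neighbours). The function name `smote`, its parameters `k` and `seed`, and the toy data are all assumptions made for illustration, not the authors' code or data.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points, each lying on the segment between a
    minority sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for idx in range(n_new):
        # Cycle through the minority samples so each one is used about equally.
        i = idx % len(minority)
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        # Pick a random point between the base sample and the chosen neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data with a 1:3 benign-to-malignant ratio.
benign = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4], [0.8, 2.1]]
malignant = [[5.0 + 0.1 * i, 6.0 - 0.1 * i] for i in range(12)]

# Add enough synthetic benign cases to bring the ratio to 1.
balanced_benign = benign + smote(benign, len(malignant) - len(benign), k=3)
print(len(balanced_benign), len(malignant))
```

Because every synthetic point is a convex combination of two real minority samples, the new cases stay inside the region already occupied by the minority class; only the class counts change, and no majority-class cases are discarded.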
References (5)

  • 1. Wang H, Guo XH, Jia ZW, et al. Multilevel binomial logistic prediction model for malignant pulmonary nodules based on texture features of CT image [J]. European Journal of Radiology, 2010, 74: 124-129.
  • 2. Guo XH, Sun Tao, Wu HF, et al. Support vector machine prediction model of early-stage lung cancer based on curvelet transform to extract texture features of CT [J]. World Academy of Science, Engineering and Technology, 2010, 71: 333-337.
  • 3. Francisco FN, Cesar HM, Pedro AG. A dynamic over-sampling procedure based on sensitivity for multi-class problems [J]. Pattern Recognition, 2011, 44: 1821-1833.
  • 4. Alberto F, Maria J, Francisco H. On the influence of an adaptive inference system in fuzzy rule based classification systems for imbalanced data-sets [J]. Expert Systems with Applications, 2009, 36: 9805-9812.
  • 5. Chawla NV, Bowyer KW, Hall LO, et al. SMOTE: synthetic minority over-sampling technique [J]. Journal of Artificial Intelligence Research, 2002, 16: 321-357.
