
Feature Selection on Imbalanced Fault Diagnosis Data (cited by: 6)

Feature Selection for Imbalanced Fault Diagnosis
Abstract: Imbalanced data are widespread in real-world applications and pose a challenge to machine learning; how to handle them effectively has become an active research topic. In fault diagnosis data sets, fault samples are usually far fewer than non-fault samples, which turns fault diagnosis into a class-imbalance problem. Previous studies have paid little attention to the effect of this imbalance on fault diagnosis. In addition, fault data sets contain redundant and even irrelevant features, which reduce the generalization ability of the learner. To address these problems, an EasyEnsemble algorithm based on embedded feature selection is proposed for the class-imbalance problem in fault diagnosis. Experimental results on UCI data sets and a diesel engine data set show that the new algorithm improves the classification performance and prediction ability of classifiers on imbalanced data sets.

Abstract (authors' English version): Many labeled data sets have an unbalanced representation among their classes. When the imbalance is large, classification accuracy on the smaller class tends to be lower. In particular, when a class is of great interest but occurs relatively rarely, such as cases of fraud or instances of disease, it is important to identify it accurately. Fault diagnosis of diesel engines is a difficult problem because of the complex structure of the engines and the presence of multiple excitation sources. The class imbalance problem also arises in fault diagnosis, where it seriously degrades the performance of classifiers that assume a balanced distribution of classes. Although the problem is critical, few previous works have paid attention to class imbalance in the fault diagnosis of diesel engines. In imbalanced problems, some features are redundant or even irrelevant, and these features hurt the generalization performance of learning machines. Here we propose PREE (Prediction Risk based feature selection for EasyEnsemble) to solve the class imbalance problem in the fault diagnosis of diesel engines. Experimental results on UCI data sets and a diesel engine data set show that PREE improves the classification performance and prediction ability on imbalanced data sets.
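The abstract does not spell out how PREE works, so the following is only a minimal sketch of the general idea under stated assumptions: EasyEnsemble-style balanced undersampling of the majority class, combined with a prediction-risk feature ranking in which each feature is replaced by its mean and the resulting increase in error is measured. The nearest-centroid base learner, the names `pree_fit`, `pree_predict`, `select_features`, and the parameters `n_subsets` and `n_keep` are all hypothetical stand-ins and are not taken from the paper.

```python
import random
from statistics import mean

def centroid_fit(X, y, feats):
    """Nearest-centroid base learner (a stand-in, not the paper's learner)."""
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        model[label] = {f: mean(r[f] for r in rows) for f in feats}
    return model

def centroid_predict(model, x, feats):
    """Return the label whose centroid is closest on the selected features."""
    def sq_dist(label):
        return sum((x[f] - model[label][f]) ** 2 for f in feats)
    return min(model, key=sq_dist)

def error_rate(model, X, y, feats, fixed=None):
    """Training error; fixed=(f, value) overwrites feature f in every sample."""
    wrong = 0
    for x, t in zip(X, y):
        if fixed is not None:
            x = list(x)
            x[fixed[0]] = fixed[1]
        wrong += centroid_predict(model, x, feats) != t
    return wrong / len(X)

def select_features(X, y, n_keep):
    """Backward elimination: repeatedly drop the feature with the lowest
    prediction risk (error with the feature set to its mean, minus base error)."""
    feats = list(range(len(X[0])))
    while len(feats) > n_keep:
        model = centroid_fit(X, y, feats)
        base = error_rate(model, X, y, feats)
        risk = {}
        for f in feats:
            f_mean = mean(x[f] for x in X)
            risk[f] = error_rate(model, X, y, feats, (f, f_mean)) - base
        feats.remove(min(feats, key=risk.get))
    return feats

def pree_fit(X, y, n_subsets=5, n_keep=1, seed=0):
    """Train one learner per balanced subset; label 1 is assumed to be the
    minority (fault) class and label 0 the majority (normal) class."""
    rng = random.Random(seed)
    minority = [i for i, t in enumerate(y) if t == 1]
    majority = [i for i, t in enumerate(y) if t == 0]
    members = []
    for _ in range(n_subsets):
        idx = minority + rng.sample(majority, len(minority))
        Xs, ys = [X[i] for i in idx], [y[i] for i in idx]
        feats = select_features(Xs, ys, n_keep)
        members.append((centroid_fit(Xs, ys, feats), feats))
    return members

def pree_predict(members, x):
    """Majority vote over the ensemble members."""
    votes = sum(centroid_predict(m, x, f) for m, f in members)
    return int(2 * votes > len(members))

# Toy data (made up for illustration): feature 0 separates the classes,
# feature 1 alternates between 0 and 1 regardless of class.
X = [[0.1 * (i % 3), float(i % 2)] for i in range(20)]
X += [[5.0 + 0.1 * (i % 3), float(i % 2)] for i in range(4)]
y = [0] * 20 + [1] * 4
members = pree_fit(X, y)
```

In this sketch every ensemble member sees all fault samples but only an equally sized random draw of normal samples, so no single learner is dominated by the majority class, and feature selection is repeated inside each subset rather than once on the full data.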
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), indexed in CSCD and the Peking University Core Journals list, 2009, No. 5, pp. 924-927 (4 pages)
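Funding: Supported by the National Natural Science Foundation of China (20503105, 60873129); the Major Project of the Innovation Action Plan of the Science and Technology Commission of Shanghai Municipality (07DZ19726); the Shanghai Rising-Star Program (08QA1403200); and the Special Research Fund for Selecting and Training Outstanding Young Teachers in Shanghai Universities (sdj-07003)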
Keywords: feature selection; imbalanced data sets; ensemble learning; fault diagnosis; diesel engine



