Feature Selection on Imbalanced Fault Diagnosis Data (不均衡故障诊断数据上的特征选择). Cited by: 6

Feature Selection for Imbalanced Fault Diagnosis

Abstract (translated from the Chinese): Imbalanced data are widespread in real applications. They pose a challenge to machine learning, and handling them effectively has become an active research topic. In fault diagnosis data sets, fault samples are usually far fewer than non-fault samples, so fault diagnosis has to be performed under class imbalance. Previous studies have paid little attention to how this imbalance affects fault diagnosis. In addition, fault data sets contain redundant and even irrelevant features, which reduce the generalization ability of the learner. To address these problems, an EasyEnsemble algorithm based on embedded feature selection is proposed for the class imbalance problem in fault diagnosis. Experimental results on UCI data sets and a diesel engine data set show that the new algorithm improves the classification performance and prediction ability of classifiers on imbalanced data sets.

Abstract (English): Many labeled data sets have an unbalanced representation among their classes. When the imbalance is large, classification accuracy on the smaller class tends to be lower. In particular, when a class is of great interest but occurs relatively rarely, as with cases of fraud or instances of disease, it is important to identify it accurately. Fault diagnosis of diesel engines is a difficult problem because of the complex structure of the engines and the presence of multiple excitation sources. The class imbalance problem also arises in fault diagnosis, where it seriously degrades the performance of classifiers that assume a balanced class distribution. Although the problem is critical, few previous works have paid attention to class imbalance in the fault diagnosis of diesel engines. In imbalanced problems, some features are redundant or even irrelevant, and these features hurt the generalization performance of learning machines. Here we propose PREE (Prediction Risk based feature selection for EasyEnsemble) to solve the class imbalance problem in the fault diagnosis of diesel engines. Experimental results on UCI data sets and a diesel engine data set show that PREE improves classification performance and prediction ability on imbalanced data.
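The abstract names the two ingredients (EasyEnsemble-style under-sampling and feature selection driven by prediction risk) but not the procedure. The following is a minimal illustrative sketch under stated assumptions, not the authors' PREE implementation: the base learner is a nearest-centroid classifier swapped in for brevity, prediction risk is assumed to be the increase in training error when one feature is replaced by its mean, and all function names, parameters, and the toy data are hypothetical.

```python
import random
from statistics import mean

def fit_centroids(X, y):
    """Base learner (a simple stand-in chosen for brevity): one mean vector per class."""
    cents = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        cents[label] = [mean(col) for col in zip(*rows)]
    return cents

def predict(cents, x, mask):
    """Return the label of the nearest centroid, using only unmasked features."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b, keep in zip(x, c, mask) if keep)
    return min(cents, key=lambda label: sq_dist(cents[label]))

def error_rate(cents, X, y, mask):
    return sum(predict(cents, x, mask) != t for x, t in zip(X, y)) / len(y)

def select_by_prediction_risk(X, y, n_keep):
    """Backward elimination. Assumed risk of feature j: the rise in training
    error when column j is replaced by its mean. Drop the lowest-risk feature."""
    n_feat = len(X[0])
    mask = [True] * n_feat
    col_means = [mean(col) for col in zip(*X)]
    cents = fit_centroids(X, y)
    while sum(mask) > n_keep:
        base = error_rate(cents, X, y, mask)
        risks = {}
        for j in range(n_feat):
            if mask[j]:
                X_j = [x[:j] + [col_means[j]] + x[j + 1:] for x in X]
                risks[j] = error_rate(cents, X_j, y, mask) - base
        mask[min(risks, key=risks.get)] = False
    return mask

def easy_ensemble(X, y, n_subsets=5, n_keep=1, seed=0):
    """Train one member per balanced subset: all minority samples (label 1)
    plus an equally sized random draw from the majority class (label 0)."""
    rng = random.Random(seed)
    minority = [i for i, t in enumerate(y) if t == 1]
    majority = [i for i, t in enumerate(y) if t == 0]
    members = []
    for _ in range(n_subsets):
        idx = minority + rng.sample(majority, len(minority))
        Xs, ys = [X[i] for i in idx], [y[i] for i in idx]
        mask = select_by_prediction_risk(Xs, ys, n_keep)
        members.append((fit_centroids(Xs, ys), mask))
    return members

def vote(members, x):
    """Majority vote over the ensemble members (labels are 0 or 1)."""
    votes = sum(predict(cents, x, mask) for cents, mask in members)
    return 1 if 2 * votes >= len(members) else 0

# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
majority_rows = [[(i % 5) * 0.1, ((i * 3) % 5 - 2) / 2] for i in range(40)]
minority_rows = [[5.0 + (i % 3) * 0.1, 1.0 if i % 2 == 0 else -1.0] for i in range(6)]
X = majority_rows + minority_rows
y = [0] * 40 + [1] * 6

members = easy_ensemble(X, y)
print([mask for _, mask in members])
print(vote(members, [5.1, 0.3]), vote(members, [0.2, -0.7]))
```

On this toy set every member should discard the noise feature, because replacing it by its mean does not change the training error while replacing the informative feature does. A setup closer to the abstract would presumably use a stronger base learner and estimate the risk on held-out data; both choices are left open here.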

Authors: Liu Tianyu (刘天羽), Li Guozheng (李国正), You Mingyu (尤鸣宇)

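Affiliations: School of Electrical Engineering, Shanghai Dianji University; Department of Control Science and Engineering, Tongji University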

Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), indexed in CSCD and the Peking University core journal list, 2009, No. 5, pp. 924-927 (4 pages)

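Funding: Supported by the National Natural Science Foundation of China (20503105, 60873129); the Major Project of the Innovation Action Plan of the Shanghai Municipal Science and Technology Commission (07DZ19726); the Shanghai Rising-Star Program for Young Scientists (08QA1403200); and the Special Research Fund for the Selection and Training of Outstanding Young Teachers in Shanghai Universities (sdj-07003).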

Keywords: feature selection; imbalanced data sets; ensemble learning; fault diagnosis; diesel engine

Classification code: TP311 [Automation and Computer Technology: Computer Software and Theory]

References (9)

1 Ezawa K J, Singh M, Norton S W. Learning goal oriented Bayesian networks for telecommunications management [C]. In Proceedings of the 13th International Conference on Machine Learning. San Francisco: Morgan Kaufmann, 1996, 139-147.
2 Lin Zhiyong, Hao Zhifeng, Yang Xiaowei. Research status of imbalanced data classification [J]. Application Research of Computers (计算机应用研究), 2008, 25(2): 332-336.
3 Chawla N V, Bowyer K W, Hall L O, et al. SMOTE: synthetic minority over-sampling technique [J]. Journal of Artificial Intelligence Research, 2002, 16: 321-357.
4 Liu X Y, Wu J, Zhou Z H. Exploratory under-sampling for class-imbalance learning [C]. In Proceedings of International Conference on Data Mining, IEEE Press, 2006, 965-969.
5 Guyon I, Elisseeff A. An introduction to variable and feature selection [J]. Journal of Machine Learning Research, 2003, 3: 1157-1182.
6 Moody J, Utans J. Principled architecture selection for neural networks: application to corporate bond rating prediction [C]. In NIPS 4, Morgan Kaufmann Publishers, Inc, 1992, 683-690.
7 Hand D J. Construction and assessment of classification rules [M]. Chichester: John Wiley and Sons, 1997.
8 Blake C, Keogh E, Merz C J. UCI repository of machine learning databases [EB/OL]. http://www.ics.uci.edu/mlearn/MLRepository.html. Department of Information and Computer Science, University of California, Irvine, California, 1998.
9 Shen L, Tay F E H, Qu L, et al. Fault diagnosis using rough sets theory [J]. Computers in Industry, 2000, 43: 61-72.

