
Research on Classifying Unbalanced Data Based on Penalty-Based SVM and Ensemble Learning (基于惩罚的SVM和集成学习的非平衡数据分类算法研究)

Cited by: 6
Abstract: Handling unbalanced data has become an active research topic in data mining. Building on support vector machine theory and the K-SVM algorithm, and targeting the characteristics of unbalanced data, this paper proposes PFKSVM (K-SVM based on penalty factor), a penalty-based algorithm that addresses K-SVM's tendency to misclassify samples near the optimal separating surface. It also proposes an ensemble learning model composed of a reconstruction sampling layer, a basic training layer, and a comprehensive decision layer. Experiments on UCI public data sets demonstrate the advantages of PFKSVM and the ensemble model for classifying unbalanced data.
Author: Liu Jinjun (刘进军)
Source: Computer Applications and Software (《计算机应用与软件》), indexed in CSCD and the Peking University Core Journals list, 2014, No. 1, pp. 186-190 (5 pages)
Keywords: Data mining; Support vector machine (SVM); Unbalanced data classification; Ensemble learning
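The abstract names two ideas: a class-dependent penalty in the SVM objective, and a three-layer ensemble (resampling, base training, combined decision). The sketch below illustrates both in generic form only. It is not the paper's PFKSVM algorithm, whose details are not given in this record. The linear hinge-loss solver, the undersampling scheme, the majority vote, the toy data, and every function and parameter name (`train_weighted_linear_svm`, `c_pos`, `c_neg`, `resample_balanced`, and so on) are assumptions made for illustration.

```python
import random


def train_weighted_linear_svm(X, y, c_pos=1.0, c_neg=1.0,
                              epochs=200, lr=0.01, lam=0.01, seed=0):
    """Linear SVM trained by stochastic subgradient descent on a hinge loss
    whose penalty differs per class (labels must be +1 or -1)."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            xi, yi = X[i], y[i]
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # L2 regularisation shrinks the weights on every step.
            w = [wj * (1.0 - lr * lam) for wj in w]
            if margin < 1.0:
                # Margin violation: the step is scaled by the class penalty.
                c = c_pos if yi > 0 else c_neg
                w = [wj + lr * c * yi * xj for wj, xj in zip(w, xi)]
                b += lr * c * yi
    return w, b


def predict(model, x):
    w, b = model
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


def resample_balanced(X, y, rng):
    """Sampling layer: keep the smaller class, undersample the larger one."""
    pos = [i for i, t in enumerate(y) if t > 0]
    neg = [i for i, t in enumerate(y) if t < 0]
    small, large = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    chosen = small + rng.sample(large, len(small))
    return [X[i] for i in chosen], [y[i] for i in chosen]


def train_ensemble(X, y, n_models=5, seed=0):
    """Training layer: one base SVM per balanced resample."""
    rng = random.Random(seed)
    models = []
    for k in range(n_models):
        Xs, ys = resample_balanced(X, y, rng)
        models.append(train_weighted_linear_svm(Xs, ys, seed=seed + k))
    return models


def ensemble_predict(models, x):
    """Decision layer: majority vote (use an odd number of models)."""
    return 1 if sum(predict(m, x) for m in models) > 0 else -1


def make_toy_data(n_neg=200, n_pos=20, seed=1):
    """Synthetic 2-D data: majority class near (0, 0), minority near (3, 3)."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)] for _ in range(n_neg)]
    X += [[rng.gauss(3.0, 0.5), rng.gauss(3.0, 0.5)] for _ in range(n_pos)]
    return X, [-1] * n_neg + [1] * n_pos


if __name__ == "__main__":
    X, y = make_toy_data()
    # Penalise errors on the minority class in proportion to the imbalance.
    single = train_weighted_linear_svm(X, y, c_pos=y.count(-1) / y.count(1))
    models = train_ensemble(X, y)
    print(predict(single, [3.0, 3.0]), ensemble_predict(models, [0.0, 0.0]))
```

Setting `c_pos` to the majority-to-minority ratio is one common heuristic for making minority-class errors cost more; the penalty factor actually used by PFKSVM may be defined differently.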



