An Algorithm of Incremental Feature Selection Based on Category Distribution (cited by: 1)
Abstract: When the number of samples per class is imbalanced, the feature distribution is imbalanced as well. Features that occur frequently in the majority classes rarely or never occur in the minority classes, so the classifier is swamped by the majority classes and the recognition rate for the minority classes is very low. To address this, the paper uses the class distribution of the data to propose CVIFS (Coefficient Variance-based Incremental Feature Selection), an incremental feature selection algorithm based on the coefficient of variation. CVIFS selects the most discriminative features in order to improve minority-class recognition, and it uses interval estimation to detect concept drift. Experiments on the Moving Hyperplane dataset show that, on skewed data streams with concept drift, the proposed algorithm achieves a lower Balanced Error Rate (BER) than Information Gain.
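The two quantities named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's exact scoring formula, so the feature score below is an assumption: it takes the coefficient of variation (standard deviation divided by mean) of a feature's per-class occurrence rates, so a feature whose rate differs strongly between classes scores higher. The function names and sample data are hypothetical. BER is computed in its usual sense, the mean of the per-class error rates.

```python
from statistics import mean, pstdev


def coefficient_of_variation(rates):
    """Return std/mean of per-class occurrence rates (assumed scoring rule).

    A feature that never occurs (mean rate 0) gets a score of 0.0.
    """
    m = mean(rates)
    return pstdev(rates) / m if m > 0 else 0.0


def balanced_error_rate(y_true, y_pred):
    """Return the mean of the per-class error rates."""
    per_class_errors = []
    for c in sorted(set(y_true)):
        # Indices of the samples whose true label is class c.
        idx = [i for i, y in enumerate(y_true) if y == c]
        wrong = sum(1 for i in idx if y_pred[i] != c)
        per_class_errors.append(wrong / len(idx))
    return mean(per_class_errors)


# Hypothetical per-class occurrence rates for two features
# (majority class first, minority class second).
uniform_feature = [0.5, 0.5]   # same rate in both classes
skewed_feature = [0.9, 0.1]    # concentrated in one class

# Hypothetical skewed stream: 8 majority samples, 2 minority samples,
# and a classifier that always predicts the majority class.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 10

if __name__ == "__main__":
    print(coefficient_of_variation(uniform_feature))
    print(coefficient_of_variation(skewed_feature))
    print(balanced_error_rate(y_true, y_pred))
```

In this toy stream the always-majority classifier is right on 8 of 10 samples, yet every minority sample is misclassified, which is why a class-averaged measure such as BER is more informative than plain accuracy on skewed data.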
Source: Journal of Suzhou University (《宿州学院学报》), 2014, No. 11, pp. 75-78 (4 pages)
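Funding: Anhui Provincial Natural Science Research Project of Higher Education Institutions, "Research on Key Issues of Interactive Trust Management for Information Services in Cloud Computing Environments" (KJ2013Z281); Huaibei Normal University Youth Research Project, "Research on Incremental Feature Selection Algorithms Based on Category Distribution" (2014xq012); Huaibei Normal University Youth Natural Science Research Project, "Construction of an Interactive Trust Model and Evaluation of Trusted Entities for Cloud Services" (700693)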
Keywords: concept drifting; imbalanced distribution; coefficient variance; information gain