Abstract
This paper proposes a comprehensive feature selection algorithm based on category discrimination degree and correlation analysis. The algorithm uses category discrimination degree to extract feature words with strong class-discriminating ability, which reduces the sparsity of the feature space, and applies correlation analysis to measure both the relevance between features and categories and the redundancy among features. In this way it selects feature words that are representative of their categories and not redundant with one another. Experimental results show that the proposed algorithm effectively improves classifier performance.
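The abstract does not give the paper's formulas, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes a two-stage order (discrimination filter, then relevance/redundancy selection), uses the spread of per-class document frequency as a stand-in for the category discrimination degree, and uses mutual information as a stand-in for the relevance and redundancy measures. The names `discrimination`, `mutual_information`, `select_features`, and the threshold `min_disc` are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def discrimination(docs, labels, term):
    """Stand-in score (assumption): max minus min per-class document frequency."""
    rates = []
    for c in sorted(set(labels)):
        in_class = [d for d, y in zip(docs, labels) if y == c]
        rates.append(sum(term in d for d in in_class) / len(in_class))
    return max(rates) - min(rates)


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


def select_features(docs, labels, k, min_disc=0.5):
    """Pick up to k terms: filter by discrimination, then greedily trade
    relevance to the class against redundancy with already-chosen terms."""
    docs = [set(d) for d in docs]
    vocab = sorted(set().union(*docs))

    # Stage 1 (assumed): keep only terms that separate the classes well.
    candidates = [t for t in vocab if discrimination(docs, labels, t) >= min_disc]

    presence = {t: [t in d for d in docs] for t in candidates}
    relevance = {t: mutual_information(presence[t], labels) for t in candidates}

    # Stage 2 (assumed): greedy relevance-minus-mean-redundancy selection.
    selected = []
    while candidates and len(selected) < k:
        def score(t):
            if not selected:
                return relevance[t]
            redundancy = sum(
                mutual_information(presence[t], presence[s]) for s in selected
            ) / len(selected)
            return relevance[t] - redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy corpus: "match" always co-occurs with "goal", so it is redundant;
# "team" and "win" are spread evenly over both classes.
docs = [
    ["goal", "match", "team"], ["goal", "match", "win"],
    ["goal", "match", "team"], ["team", "win"],
    ["team", "win"], ["stock", "team"],
    ["stock", "win"], ["stock", "team"],
]
labels = ["sport"] * 4 + ["finance"] * 4

selected = select_features(docs, labels, k=2)
print(selected)
```

On this toy corpus the filter drops "team" and "win", and the greedy step keeps "goal" and "stock" while skipping "match", which carries no information beyond "goal". A real implementation would substitute the paper's own definitions of discrimination degree and relevant independence degree for these stand-ins.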
Source
Computer Engineering (《计算机工程》)
CAS
CSCD
2012, No. 9, pp. 186-188, 192 (4 pages)
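Funding
National Natural Science Foundation of China (60873196)
Natural Science Foundation of Gansu Province (1010RJZA022)
Northwest Normal University Knowledge and Innovation Engineering Key Researcher Fund, Phase 3, 2010 (nwnu-kjcxgc-03-67)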
Keywords
text categorization
feature selection
correlation analysis
category discrimination degree
relevant independence degree