Abstract
Feature selection is a key technique in text classification, and the quality of the selected features determines classification performance. Building on an analysis of existing feature selection methods, this paper introduces the concept of word frequency in class (class word frequency) and constructs a document-class-word cube. Experiments show that the cube model characterizes features more comprehensively and objectively: it accounts simultaneously for intra-class dispersion (a good feature is spread evenly across the documents of a class) and inter-class concentration (a good feature is concentrated in a particular class). Features selected with class word frequency improve text classification performance.
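The abstract does not give the paper's scoring formulas. The following is a minimal sketch under stated assumptions, not the authors' method: each document is one slice of the document-class-word cube (term counts under its single class label). As a hypothetical choice, intra-class dispersion is taken to be the share of a class's documents that contain the word, inter-class concentration is the share of the word's total occurrences that fall in that class, and a word's score is their product. The names `class_word_scores` and `select_features` and the toy corpus are illustrative only.

```python
from collections import Counter, defaultdict


def class_word_scores(docs):
    """docs: list of (class_label, tokens). Returns {class: {word: score}}.

    Assumed score = intra-class dispersion * inter-class concentration
    (a hypothetical instantiation, not the paper's formula).
    """
    tf_class = defaultdict(Counter)  # word frequency in class: total count of w in class c
    df_class = defaultdict(Counter)  # number of class-c documents containing w
    n_docs = Counter()               # number of documents per class
    tf_total = Counter()             # total count of w over all classes
    for label, tokens in docs:
        counts = Counter(tokens)     # one document row of the cube
        n_docs[label] += 1
        tf_class[label].update(counts)
        df_class[label].update(counts.keys())
        tf_total.update(counts)

    scores = {}
    for c in n_docs:
        scores[c] = {}
        for w, tf in tf_class[c].items():
            dispersion = df_class[c][w] / n_docs[c]  # how evenly w covers class c
            concentration = tf / tf_total[w]         # how much of w sits in class c
            scores[c][w] = dispersion * concentration
    return scores


def select_features(docs, k):
    """Keep the k words with the highest best-class score (ties broken alphabetically)."""
    best = defaultdict(float)
    for class_scores in class_word_scores(docs).values():
        for w, s in class_scores.items():
            best[w] = max(best[w], s)
    return sorted(best, key=lambda w: (-best[w], w))[:k]


# Hypothetical toy corpus for illustration.
docs = [
    ("sports", ["ball", "team", "win"]),
    ("sports", ["team", "goal"]),
    ("tech", ["chip", "team"]),
    ("tech", ["chip", "code"]),
]
print(select_features(docs, 2))  # ['chip', 'team']
```

In this toy corpus "chip" appears in every tech document and nowhere else, so it scores 1.0; "team" covers every sports document but is shared with tech, so its sports score drops to 2/3. A multiplicative combination was chosen so that a word must do well on both criteria; other combinations are equally plausible.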
Source
Application Research of Computers (《计算机应用研究》)
CSCD
Peking University Core Journals (北大核心)
2014, No. 7, pp. 2024-2026 (3 pages)
Funding
Youth Project of Natural Science Research, Department of Education of Guizhou Province (黔教科20100095)
Keywords
feature selection
word frequency in class
cube