Abstract
To improve the performance of filter-based feature selection in text classification, an improved multivariate method based on the relative discriminant criterion (RDC) is proposed. First, the RDC measure is used to compute a relevance value for each feature. Then, to account for dependencies among multiple feature variables, the Pearson correlation is used to compute the correlation value between features. At the same time, redundant features are reduced using the concept of minimum redundancy and maximum relevance. Finally, a feature subset is selected as the input to subsequent classification. The proposed method is evaluated experimentally on three datasets. The results show that, in most cases, the proposed method outperforms other methods in precision, recall, and F-measure, with moderate complexity.
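The selection step summarized in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes per-feature relevance scores have already been computed elsewhere (for example with RDC, whose formula is not given here), and it assumes a simple greedy rule, relevance minus mean absolute Pearson correlation with the features already chosen, as one way to combine minimum redundancy with maximum relevance. The names `pearson` and `select_features` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: treat a constant feature as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def select_features(columns, relevance, k):
    """Greedily pick up to k feature indices.

    columns:   list of feature columns (one list of values per feature)
    relevance: precomputed relevance score per feature (assumed, e.g. RDC)
    k:         number of features to keep
    """
    if k <= 0 or not columns:
        return []
    # Start from the single most relevant feature.
    selected = [max(range(len(columns)), key=lambda j: relevance[j])]
    while len(selected) < min(k, len(columns)):
        best, best_score = None, -math.inf
        for j in range(len(columns)):
            if j in selected:
                continue
            # Redundancy: mean |Pearson| against features already selected.
            redundancy = sum(
                abs(pearson(columns[j], columns[s])) for s in selected
            ) / len(selected)
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected


# Feature 1 duplicates feature 0, so it is skipped despite high relevance.
columns = [[1, 0, 1, 0], [1, 0, 1, 0], [1, 1, 0, 0]]
print(select_features(columns, [0.9, 0.85, 0.5], 2))  # [0, 2]
```

In this toy input the second feature has nearly the highest relevance but is perfectly correlated with the first, so the less relevant but uncorrelated third feature is chosen instead.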
Author
DONG Yuanyuan (董园园) (Qilu Normal University, Jinan 250013, China)
Source
Network New Media Technology (《网络新媒体技术》)
2020, No. 2, pp. 29-36 (8 pages)
Funding
Shandong Provincial Social Science Planning Research Project (17CTYJ03).
Keywords
text classification
relative discriminant criterion
multivariate
correlation value
redundant features
feature subset