Abstract
Feature selection is one of the core steps in a text classification system. However, most existing feature selection methods are serial and run slowly on massive Chinese text data sets, so using parallel strategies to speed up feature selection has become a research hotspot. This paper presents the detailed design of a Parallel Genetic Algorithm (PGA) for feature selection. The algorithm uses a genetic algorithm to search the feature space and evaluates feature subsets in parallel: the fitness of the individuals in the population is computed simultaneously on multiple computing nodes, so that representative feature subsets are obtained more quickly. Experimental results show that the method is effective.
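The idea of searching with a genetic algorithm while computing the fitness of all individuals concurrently can be illustrated with a minimal Python sketch. It is not the paper's implementation: the toy `fitness` function (summed per-feature relevance minus a size penalty), the names `evaluate_population` and `select_features`, and every parameter value are assumptions chosen for illustration, and a standard-library executor stands in for the computing nodes.

```python
import random
from concurrent.futures import Executor
from functools import partial


def fitness(mask, relevance, penalty=0.1):
    """Toy fitness (an assumption): relevance of the selected features minus a size penalty."""
    return sum(r for bit, r in zip(mask, relevance) if bit) - penalty * sum(mask)


def evaluate_population(population, relevance, executor: Executor):
    """Score all individuals concurrently; Executor.map keeps the input order."""
    return list(executor.map(partial(fitness, relevance=relevance), population))


def crossover(a, b, rng):
    """Single-point crossover of two bit masks (needs at least 2 features)."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(mask, rng, rate=0.02):
    """Flip each bit independently with probability `rate`; returns a new list."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in mask]


def select_features(relevance, executor, pop_size=20, generations=30, seed=0):
    """Run the GA and return (best_mask, best_score) seen in any evaluated generation."""
    rng = random.Random(seed)
    n = len(relevance)
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best_mask, best_score = None, float("-inf")
    for _ in range(generations):
        # The only parallel step: fitness of the whole population at once.
        scores = evaluate_population(population, relevance, executor)
        ranked = sorted(zip(scores, population), key=lambda p: p[0], reverse=True)
        if ranked[0][0] > best_score:
            best_score, best_mask = ranked[0]
        # Keep the better half as parents and refill with mutated offspring.
        parents = [mask for _, mask in ranked[: pop_size // 2]]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return best_mask, best_score
```

Calling `select_features(relevance, executor)` inside a `with ThreadPoolExecutor(max_workers=4) as executor:` block returns the best bit mask and its score. Because the executor is passed in, a process pool or a cluster-backed executor could replace the thread pool without changing the search loop; only the fitness evaluation is distributed, as in the approach the abstract describes.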
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
PKU Core Journal (北大核心)
2009, No. 22, pp. 107-110, 217 (5 pages in total)
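Funding
Sichuan Province Science and Technology Plan Project (No. 2008GZ0003)
Sichuan Provincial Department of Science and Technology Key Technologies R&D Project (No. 07GG006-014)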
Keywords
text categorization
feature selection
Genetic Algorithm (GA)
parallel strategy