Abstract
Feature selection algorithms for clustering struggle to balance efficiency and effectiveness, and they adapt poorly to high-dimensional data. To address this, the paper proposes ENFSA, a weighted feature selection algorithm based on neighborhood analysis. ENFSA first builds a candidate feature set using information entropy (information gain), which reduces the number of candidate dimensions for weighted feature selection. It then uses neighborhood analysis to assess the redundancy and relevance of the features and updates the feature subset and the weight vector according to the assessment. The assessment and update steps repeat until the feature weight vector stabilizes. Experimental results on 10 typical datasets show that, compared with traditional feature selection algorithms, ENFSA reduces features efficiently, clearly improves clustering quality, and still performs well on datasets with high feature dimensionality.
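The abstract gives no formulas, so the following is only a minimal sketch of two of the steps it names: entropy-based candidate filtering and iteration until the weight vector stabilizes. Everything specific here is an assumption rather than the paper's method: the function names, equal-width binning, ranking features by highest entropy, and the convergence tolerance. The neighborhood-based redundancy and relevance assessment is left as a caller-supplied `update` function because the abstract does not define it.

```python
import math
from collections import Counter


def feature_entropy(values, bins=4):
    """Shannon entropy (in bits) of one feature after equal-width binning (assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant feature carries no information
    width = (hi - lo) / bins
    # Clamp the index so the maximum value falls into the last bin.
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def candidate_features(rows, keep, bins=4):
    """Indices of the `keep` highest-entropy features (assumed ranking rule)."""
    n_features = len(rows[0])
    scores = [feature_entropy([r[j] for r in rows], bins) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:keep])


def iterate_weights(weights, update, tol=1e-6, max_iter=100):
    """Apply `update` repeatedly until no weight changes by `tol` or more.

    `update` is a hypothetical stand-in for the neighborhood-based
    assessment step, which the abstract does not specify.
    """
    for _ in range(max_iter):
        new = update(weights)
        if max(abs(a - b) for a, b in zip(new, weights)) < tol:
            return new
        weights = new
    return weights
```

A caller would first reduce the feature set with `candidate_features(rows, keep=k)` and then pass its own assessment step to `iterate_weights`. Keeping the assessment pluggable keeps the stopping rule separate from the part of the algorithm that the abstract leaves unspecified.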
Source
Application Research of Computers (《计算机应用研究》)
Indexed in: CSCD, PKU Core Journals
2015, Issue 12, pp. 3596-3599 (4 pages)
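Funding
National High Technology Research and Development Program of China ("863" Program) (2012AA012704)
Zhengzhou Science and Technology Leading Talent Project (131PLJRC644)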
Keywords
weighted feature selection
clustering
information entropy
information gain
neighborhood analysis
feature weight vector