Feature selection and text clustering analysis based on harmony search
Abstract: To address the heavy feature redundancy and noise in text data, a text feature selection algorithm based on the harmony search mechanism is proposed. Term frequency-inverse document frequency (TF-IDF) serves as the objective function for evaluating feature terms. Starting from the initial document set, the algorithm iteratively searches for the optimal feature subset using the three rules by which harmony search updates a new feature-selection solution: memory consideration, pitch adjustment, and random selection. K-means text clustering is then performed on the optimal feature subset. Simulation experiments on four typical document datasets show that the algorithm effectively reduces the text feature dimension and achieves higher clustering accuracy.
Authors: WANG Yong-gang; LI Jing; WANG Wen-hui; CAO Chuan-jian; WANG Xiao-yan (College of General Education, Qingdao Huanghai University, Qingdao 266427, China; School of Data Science, Qingdao Huanghai University, Qingdao 266427, China; Teaching Department, Qingdao Huanghai University, Qingdao 266427, China; School of Intelligent Manufacturing, Qingdao Huanghai University, Qingdao 266427, China)
Source: Computer Engineering and Design (《计算机工程与设计》), Peking University Core Journal, 2022, No. 2, pp. 472-478 (7 pages)
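Funding: Team fund project of the Shandong Provincial Higher Education Youth Innovation Talent Introduction and Cultivation Program (201901).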
Keywords: feature selection; text clustering; harmony search mechanism; K-means text clustering; feature subset
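
The procedure outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record gives no formulas, so the binary encoding (one bit per term), the objective (summed TF-IDF weight of the kept terms minus a fixed per-term penalty), the bit-flip form of pitch adjustment, and all names and parameter values (`tfidf_scores`, `fitness`, `harmony_search`, `hms`, `hmcr`, `par`) are assumptions chosen for illustration.

```python
import math
import random
from collections import Counter


def tfidf_scores(docs):
    """Return {term: TF-IDF weight summed over all documents}."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = Counter()
    for doc in docs:
        for term, count in Counter(doc).items():
            tf = count / len(doc)
            idf = math.log(n_docs / df[term])
            scores[term] += tf * idf
    return dict(scores)


def fitness(mask, weights, penalty):
    # Assumed objective: reward the TF-IDF weight of kept terms and
    # charge a fixed cost per kept term to discourage large subsets.
    return sum(w - penalty for bit, w in zip(mask, weights) if bit)


def harmony_search(weights, hms=10, hmcr=0.9, par=0.1, iterations=2000, seed=0):
    """Search for a 0/1 mask over terms; 1 means the term is kept."""
    rng = random.Random(seed)
    n = len(weights)
    penalty = sum(weights) / n  # assumed cost: the mean term weight
    # Harmony memory: hms random candidate masks.
    memory = [[rng.randint(0, 1) for _ in range(n)] for _ in range(hms)]
    for _ in range(iterations):
        new = []
        for j in range(n):
            if rng.random() < hmcr:
                bit = rng.choice(memory)[j]  # memory consideration
                if rng.random() < par:
                    bit = 1 - bit            # pitch adjustment (bit flip)
            else:
                bit = rng.randint(0, 1)      # random selection
            new.append(bit)
        # Replace the worst stored harmony if the new one is better.
        worst = min(range(hms), key=lambda i: fitness(memory[i], weights, penalty))
        if fitness(new, weights, penalty) > fitness(memory[worst], weights, penalty):
            memory[worst] = new
    return max(memory, key=lambda m: fitness(m, weights, penalty))


# Toy usage on three tokenized documents (hypothetical data).
docs = [
    ["harmony", "search", "text", "feature"],
    ["text", "clustering", "feature", "kmeans"],
    ["harmony", "memory", "pitch", "text"],
]
scores = tfidf_scores(docs)
terms = sorted(scores)
mask = harmony_search([scores[t] for t in terms])
selected = [t for t, bit in zip(terms, mask) if bit]
print(selected)
```

In a fuller pipeline, the selected terms would define the columns of the document-term matrix passed to a K-means implementation, which corresponds to the clustering step the abstract describes.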