Abstract
Understanding the research hotspots and trends of text mining in China is important for tracking how the field's content is evolving and for promoting further research. This study takes 1155 research articles on text mining topics, published from 1998 to 2017 and indexed in the CNKI database, as its sample. It uses the word frequency matrix of the article keywords as the data and performs cluster analysis with SPSS. Chi-square statistics are then used to extract highly associated keywords for interpreting the clustering results. Based on the clustering results, the text mining literature is divided at the macro level into 13 categories, from which the research hotspots and trends of text mining in China are identified. The results show that (i) basic research on text mining, text big data preprocessing, and the application areas of text mining are hot topics; (ii) the volume of applied research literature on association rules, text clustering, and text classification is relatively small; and (iii) text topic analysis, text big data preprocessing, and web text mining may become new research trends.
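The chi-square step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the study used SPSS, and this record does not give the exact formula or thresholds. The function names (`chi_square`, `keyword_cluster_scores`), the toy keyword sets, and the cluster labels are all hypothetical. The sketch assumes each article is represented by its set of keywords and already has a cluster label, and it scores each keyword with the standard 2x2 chi-square statistic.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic of the 2x2 table [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A zero margin means the keyword cannot discriminate between groups.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def keyword_cluster_scores(doc_keywords, labels, cluster):
    """Score keywords that are over-represented in one cluster (hypothetical helper)."""
    inside = [kws for kws, lab in zip(doc_keywords, labels) if lab == cluster]
    outside = [kws for kws, lab in zip(doc_keywords, labels) if lab != cluster]
    vocabulary = set().union(*doc_keywords)
    scores = {}
    for term in vocabulary:
        a = sum(term in kws for kws in inside)    # in cluster, has keyword
        b = sum(term in kws for kws in outside)   # outside cluster, has keyword
        c = len(inside) - a                       # in cluster, lacks keyword
        d = len(outside) - b                      # outside cluster, lacks keyword
        # Chi-square ignores direction, so keep only positive associations.
        if a * d > b * c:
            scores[term] = chi_square(a, b, c, d)
    return scores


# Toy data (assumed, not from the study): four articles in two clusters.
doc_keywords = [
    {"text mining", "text classification"},
    {"text mining", "text classification"},
    {"text mining", "association rules"},
    {"text mining", "association rules"},
]
labels = [0, 0, 1, 1]

print(keyword_cluster_scores(doc_keywords, labels, cluster=0))
```

In this toy case a keyword shared by every article, such as "text mining", gets no score because it does not distinguish any cluster, which is why a statistic of this kind can help label clusters with their characteristic keywords.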
Authors
谭章禄
彭胜男
王兆刚
Tan Zhanglu; Peng Shengnan; Wang Zhaogang (School of Management, China University of Mining and Technology (Beijing), Beijing 100083)
Source
《情报学报》
CSSCI
CSCD
PKU Core (北大核心)
2019, Issue 6, pp. 578-585 (8 pages)
Journal of the China Society for Scientific and Technical Information
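Funding
National Natural Science Foundation of China project "Research on a Visual Management Model and Graphic Element System for Coal Mine Safety Based on Data Mining" (61471362)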
Keywords
text mining
cluster analysis
research hotspot
trend