A Comparative Study on the Effects of Different Feature Granularity on Microblog Short Text Classification (cited by: 10)
Abstract [Purpose/significance] With the rapid development of the Internet industry, a wide range of social media applications has emerged, and with them the volume of information in the form of colloquial short texts has grown rapidly. Mining the key content from these resources and classifying it automatically has become one of the important topics in text mining. [Method/process] Taking microblogs as an example, this paper uses two feature granularities, words and characters, and applies Information Gain, Information Gain Ratio, Word2vec, and feature frequency to reduce the feature dimension, focusing on the characteristics and roles of the two feature types in colloquial short-text classification. [Result/conclusion] The experimental results show that, even after selection and extraction, word features still classify microblog text less well than character features. Choosing character features may therefore be a more practical and effective approach to colloquial short-text classification.
Authors LIU Xiao-min (刘小敏); WANG Hao (王昊); LI Xin-lei (李心蕾); DENG San-hong (邓三鸿) (School of Information Management, Nanjing University, Nanjing 210023, China; Jiangsu Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210023, China)
Source Information Science (《情报科学》), CSSCI, Peking University Core Journal, 2018, Issue 12, pp. 126-133 (8 pages)
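Funding Supported by the National Natural Science Foundation of China (71503121), the Nanjing University "Zhongying Young Scholars" (仲英青年学者) program, and other sources.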
Keywords feature granularity; short text; colloquial text; feature reduction
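
To make the word-versus-character comparison in the abstract concrete, the following is a minimal sketch of information-gain feature selection at both granularities. It is not the authors' pipeline: the abstract does not describe their corpus, word segmenter, or classifier, so the toy texts and labels are invented, words are assumed to be pre-segmented with spaces, and every function name here is hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def char_features(text):
    """Character granularity: each non-space character is a feature."""
    return {ch for ch in text if not ch.isspace()}


def word_features(text):
    """Word granularity: assumes the text is already segmented by spaces."""
    return set(text.split())


def information_gain(docs, labels, feature):
    """Information gain of the binary split 'feature present / absent'."""
    present = [y for d, y in zip(docs, labels) if feature in d]
    absent = [y for d, y in zip(docs, labels) if feature not in d]
    total = len(labels)
    conditional = sum(len(part) / total * entropy(part)
                      for part in (present, absent) if part)
    return entropy(labels) - conditional


def top_k_features(texts, labels, extract, k):
    """Rank the vocabulary by information gain and keep the k best features."""
    docs = [extract(t) for t in texts]
    vocab = set().union(*docs)
    ranked = sorted(vocab,
                    key=lambda f: (-information_gain(docs, labels, f), f))
    return ranked[:k]


# Invented toy data (not from the paper): four pre-segmented posts, two classes.
texts = ["今天 天气 真好", "天气 不错 出去 玩", "股票 大涨", "股票 又 跌 了"]
labels = ["life", "life", "finance", "finance"]

if __name__ == "__main__":
    print("word features:", top_k_features(texts, labels, word_features, 2))
    print("char features:", top_k_features(texts, labels, char_features, 4))
```

Character features need no segmentation step, which is one plausible reason they hold up on colloquial text where segmenters make more errors; the sketch only illustrates the selection step and makes no claim about classification accuracy.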
