Abstract
This paper studies sentiment tendency (polarity) classification of short Chinese texts, which are characterized by short length, complex structure and many variant word forms, with the aim of improving both the accuracy and the efficiency of tendency classification. Taking the HowNet sentiment lexicon as a basis, we propose an algorithm for discovering new microblog words and use it to construct a microblog sentiment lexicon. After a text has undergone sentence splitting, word segmentation, part-of-speech tagging and sentiment processing, we construct an automaton to compute the sentiment tendency of the short text. To evaluate the method objectively, we run comparative experiments against a HowNet-based classification method and an SVM-based classification method. The experimental results show that the proposed method performs comparably to SVM on general text and clearly outperforms it on short texts. The proposed method also has a marked advantage in efficiency.
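The abstract does not specify the automaton's states or transitions. As a rough illustration only, the following Python sketch shows one way a finite-state scorer could run over already-segmented tokens, combining negation words and degree adverbs with lexicon polarities sentence by sentence. The lexicon entries, the state set, the 0.5 damping factor and all function names are hypothetical assumptions, not the authors' method; segmentation and tagging are assumed to happen upstream.

```python
from enum import Enum, auto

# Hypothetical toy lexicon (assumption). The paper builds its lexicon from
# HowNet plus discovered microblog words; these entries are illustrative only.
POLARITY = {"好": 1.0, "喜欢": 1.0, "给力": 1.0, "差": -1.0, "讨厌": -1.0, "坑爹": -1.0}
NEGATIONS = {"不", "没", "没有"}
DEGREE = {"很": 1.5, "非常": 2.0, "有点": 0.5}
SENTENCE_END = {"。", "！", "？", "!", "?", "；"}


class State(Enum):
    START = auto()    # no modifier pending
    NEG = auto()      # negation read, e.g. 不
    DEG = auto()      # degree adverb read, e.g. 很
    NEG_DEG = auto()  # negation then degree, e.g. 不很 ("not very")
    DEG_NEG = auto()  # degree then negation, e.g. 很不 ("very not")


def sentence_score(tokens):
    """Score one pre-segmented sentence with a small (hypothetical) automaton."""
    state, degree, score = State.START, 1.0, 0.0
    for tok in tokens:
        if tok in NEGATIONS:
            if state is State.START:
                state = State.NEG
            elif state is State.NEG:
                state = State.START  # double negation cancels out
            elif state is State.DEG:
                state = State.DEG_NEG
        elif tok in DEGREE:
            degree = DEGREE[tok]
            if state is State.START:
                state = State.DEG
            elif state is State.NEG:
                state = State.NEG_DEG
        elif tok in POLARITY:
            p = POLARITY[tok]
            if state is State.START:
                score += p
            elif state is State.NEG:
                score -= p
            elif state is State.DEG:
                score += degree * p
            elif state is State.NEG_DEG:
                score -= 0.5 * p  # "not very good": weakened reversal (assumed factor)
            else:  # State.DEG_NEG
                score -= degree * p  # "very not good": intensified reversal
            # a sentiment word consumes any pending modifiers
            state, degree = State.START, 1.0
        # all other tokens leave the automaton state unchanged
    return score


def split_sentences(tokens):
    """Split a token stream into sentences at sentence-final punctuation."""
    sentence = []
    for tok in tokens:
        if tok in SENTENCE_END:
            if sentence:
                yield sentence
            sentence = []
        else:
            sentence.append(tok)
    if sentence:
        yield sentence


def classify(tokens):
    """Sum sentence scores and map the total to a polarity label."""
    total = sum(sentence_score(s) for s in split_sentences(tokens))
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"
```

For example, `classify(["我", "非常", "喜欢", "这个", "手机", "。", "但是", "电池", "有点", "差"])` scores the first sentence +2.0 and the second -0.5, so the text is labeled "positive". Explicit states are used instead of a single sign flag so that word order matters: 很不好 ("very bad") scores -1.5, while 不很好 ("not very good") scores only -0.5.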
Source
Computer Applications and Software (《计算机应用与软件》)
CSCD
Peking University Core Journal (北大核心)
2012, No. 10, pp. 89-93 (5 pages)
Funding
National Natural Science Foundation of China (Grant No. 61170112)