Abstract
This paper proposes a text classification method based on a hidden Markov model (HMM) fused with semantics, in which the semantics of feature words are integrated into the HMM classifier as prior knowledge. Feature words are selected by information gain, and their semantics are extracted with word2vec. Each class is mapped to its own HMM, whose state transition process represents the generation process of texts in that class. A text to be classified is compared for similarity against each class model, and the class with the maximum output probability is chosen. The method thus takes into account not only prior knowledge of feature words, word frequency, and document counts, but also the semantics of the feature words. In experimental evaluation, it achieved better classification results than the original HMM model and a Naive Bayes classifier.
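As an illustration of the information gain step named in the abstract, the following is a minimal sketch of scoring terms by information gain over a labelled corpus. It assumes documents are represented as sets of words and uses the standard presence/absence definition, IG(t) = H(C) - P(t)H(C|t) - P(not t)H(C|not t). The function names, the toy corpus, and the tie-breaking rule are all hypothetical; the abstract does not specify the authors' exact formulation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(docs, labels, term):
    """IG(term) = H(C) - P(t) H(C|t) - P(not t) H(C|not t).

    docs is a list of word sets; labels holds the class of each document.
    """
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (len(with_term) / n) * entropy(with_term) \
        + (len(without_term) / n) * entropy(without_term)
    return entropy(labels) - conditional


def top_features(docs, labels, k):
    """Return the k terms with the highest information gain.

    Ties are broken alphabetically (an arbitrary choice for this sketch).
    """
    vocab = set().union(*docs)
    ranked = sorted(vocab, key=lambda t: (-information_gain(docs, labels, t), t))
    return ranked[:k]


# Toy corpus (hypothetical): two classes, two documents each.
docs = [
    {"goal", "match"},
    {"goal", "team"},
    {"stock", "market"},
    {"stock", "team"},
]
labels = ["sports", "sports", "finance", "finance"]

selected = top_features(docs, labels, 2)
```

In this toy corpus, "goal" and "stock" each separate the two classes perfectly, while "team" appears once in each class and carries no information about the label. In a pipeline like the one the abstract describes, the selected terms would then be the ones looked up in a word2vec model and used to build the per-class HMMs.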
Source
《计算机应用与软件》
2017, No. 7, pp. 303-307 (5 pages in total)
Computer Applications and Software