Abstract
Feature and sentiment classification of Chinese product reviews has become an important research topic in Web data mining in recent years. This paper proposes a complete set of annotation rules for product named entities, feature words, sentiment words, and their boundaries, and designs a multi-level hybrid tag scheme. It then proposes a two-level hierarchical hidden Markov model (HHMM) structure that fuses the characteristics of word-form tagging and part-of-speech (POS) tagging. Two algorithms, HHMM-1 based on word-form tagging and HHMM-2 based on POS tagging, are introduced to tag complex phrases automatically. Experiments show that the two levels of the HHMM complement each other and that both recall and F-score improve considerably.
Source
《四川大学学报(工程科学版)》
EI
CAS
CSCD
PKU Core Journal
2013, Issue 2, pp. 94-102 (9 pages)
Journal of Sichuan University (Engineering Science Edition)
Funding
Sichuan Province Science and Technology Support Program (030405301054)
Sichuan University Research Start-up Fund for Young Teachers (2011SCU11012)
Keywords
Web data mining
feature and sentiment classification
tagging specification
two-level HHMM