Abstract
[Objective] This study proposed a method based on a confidence ranking model to extract product features and user opinions from Chinese online reviews. [Methods] Building on an improved HITS algorithm, we constructed a confidence ranking model that jointly considers the association relations and the semantic relations among candidate feature and opinion words, and used it to extract and then filter the feature and opinion words. [Results] Compared with the baseline model, our method achieved higher precision and recall when extracting product features and opinions from a Chinese corpus. [Limitations] The method extracts only explicit product features; it does not identify or extract implicit features. [Conclusions] The mutual reinforcement and semantic relations between feature words and opinion words can be used to extract product features and opinions effectively, and sentiment polarity filtering substantially improves the precision of opinion-word extraction.
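The mutual reinforcement idea in the abstract (opinion words gain confidence from the feature words they modify, and feature words gain confidence from the opinion words that describe them) follows the hub/authority iteration of HITS. The sketch below shows only the plain, unimproved HITS update on a bipartite feature-opinion graph, assuming edge weights are association strengths such as sentence-level co-occurrence counts. The function name `hits_feature_opinion`, its input format, and the toy words are hypothetical; the paper's semantic-relation terms and sentiment polarity filtering are not modeled here.

```python
from math import sqrt


def hits_feature_opinion(edges, max_iter=100, tol=1e-9):
    """Plain HITS-style mutual reinforcement on a bipartite graph.

    edges: dict mapping (feature, opinion) -> non-negative association
    weight (hypothetical input, e.g. sentence-level co-occurrence counts).
    Returns (feature_scores, opinion_scores), each L2-normalised.
    This is NOT the paper's improved model; it is the textbook iteration.
    """
    if not edges:
        return {}, {}
    features = {f for f, _ in edges}
    opinions = {o for _, o in edges}
    f_score = dict.fromkeys(features, 1.0)
    o_score = dict.fromkeys(opinions, 1.0)

    def normalise(scores):
        # Scale to unit L2 norm so scores stay bounded across iterations.
        norm = sqrt(sum(v * v for v in scores.values())) or 1.0
        return {k: v / norm for k, v in scores.items()}

    for _ in range(max_iter):
        # An opinion word is confident if it modifies confident feature words.
        new_o = dict.fromkeys(opinions, 0.0)
        for (f, o), w in edges.items():
            new_o[o] += w * f_score[f]
        new_o = normalise(new_o)
        # A feature word is confident if confident opinion words describe it.
        new_f = dict.fromkeys(features, 0.0)
        for (f, o), w in edges.items():
            new_f[f] += w * new_o[o]
        new_f = normalise(new_f)
        # Stop once feature scores no longer change noticeably.
        delta = max(abs(new_f[k] - f_score[k]) for k in features)
        f_score, o_score = new_f, new_o
        if delta < tol:
            break
    return f_score, o_score


if __name__ == "__main__":
    # Toy co-occurrence weights, made up for illustration only.
    edges = {
        ("screen", "clear"): 3, ("screen", "nice"): 1,
        ("battery", "durable"): 2, ("battery", "nice"): 1,
        ("price", "nice"): 1,
    }
    feats, ops = hits_feature_opinion(edges)
    print(sorted(feats, key=feats.get, reverse=True))  # ['screen', 'battery', 'price']
    print(sorted(ops, key=ops.get, reverse=True))      # ['clear', 'nice', 'durable']
```

In this toy graph "screen" ranks first because it co-occurs most strongly with "clear", which in turn ranks first among opinions: each side's confidence feeds the other's. A confidence ranking model of the kind described above would then keep only candidates whose scores pass some threshold; how the paper sets that threshold is not stated in this abstract.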
Source
New Technology of Library and Information Service (《现代图书情报技术》)
CSSCI
2016, No. 2, pp. 16-24 (9 pages)
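Funding
This work is one of the research outcomes of two National Natural Science Foundation of China projects: "Sentiment Analysis of Users' Online Reviews Based on Fuzzy Ontology in the Chinese Context" (Grant No. 70971099) and "The Impact of Online Reviews on Merchant Performance: A Sentiment Analysis Perspective" (Grant No. 71371144).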
Keywords
Confidence ranking
HITS
Association relation
Semantic relation
Mutual reinforcement
Feature and opinion extraction