Abstract
Traditional semantic annotation methods describe the relations between words, or between sentence components, poorly. To address this, this paper proposes a semantic relation annotation algorithm for unstructured text based on ontology and dependency syntax. The algorithm works one sentence at a time. It combines part-of-speech (POS) tags, a semantic dictionary, and other linguistic features to identify the semantic relations of the words in a sentence. It then uses the dependency relations between words to compose their meanings, which yields the lexical semantic relation annotation. An evaluation algorithm is also designed to measure the correctness of the annotation results. It combines features observed during annotation, such as semantic matching degree (similarity) and semantic richness, and in effect serves as the method's own confidence score. Experimental results show that the algorithm achieves high accuracy, particularly on large-scale corpora.
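The abstract gives no implementation details, so the following is only a minimal Python sketch of the general idea, built on loudly stated assumptions. The dependency parse is supplied by hand. `SEMANTIC_DICT` and `RELATION_RULES` are hypothetical toy stand-ins for the semantic dictionary and the ontology. The confidence score is an assumed equal-weight mix of two terms: a matching degree (the share of content words found in the dictionary) and a richness term (the share of known relation types actually observed). None of these names, rules, or formulas come from the paper.

```python
from dataclasses import dataclass

# HYPOTHETICAL toy semantic dictionary (word -> ontology concept); not the paper's lexicon.
SEMANTIC_DICT = {
    "professor": "Person",
    "teaches": "Teach",
    "course": "Course",
    "database": "Topic",
}

# HYPOTHETICAL ontology rules: (head concept, dependency label, dependent concept) -> relation.
RELATION_RULES = {
    ("Teach", "nsubj", "Person"): "agent",
    ("Teach", "dobj", "Course"): "patient",
    ("Course", "compound", "Topic"): "hasTopic",
}

CONTENT_POS = {"NOUN", "VERB"}  # POS filter: only content words carry concepts here


@dataclass
class Token:
    word: str
    pos: str   # coarse POS tag
    head: int  # index of the head token, -1 for the root
    dep: str   # dependency label linking this token to its head


def annotate(tokens):
    """Return (head, relation, dependent) triples for dependency arcs that match a rule."""
    relations = []
    for tok in tokens:
        if tok.head < 0 or tok.pos not in CONTENT_POS:
            continue
        head = tokens[tok.head]
        if head.pos not in CONTENT_POS:
            continue
        # Compose meaning along the arc: both concepts plus the arc label select a relation.
        key = (SEMANTIC_DICT.get(head.word), tok.dep, SEMANTIC_DICT.get(tok.word))
        relation = RELATION_RULES.get(key)
        if relation is not None:
            relations.append((head.word, relation, tok.word))
    return relations


def confidence(tokens, relations, alpha=0.5):
    """ASSUMED score: alpha * matching degree + (1 - alpha) * richness, both in [0, 1]."""
    content = [t.word for t in tokens if t.pos in CONTENT_POS]
    matching = (sum(w in SEMANTIC_DICT for w in content) / len(content)) if content else 0.0
    known_types = set(RELATION_RULES.values())
    richness = len({r for _, r, _ in relations}) / len(known_types)
    return alpha * matching + (1 - alpha) * richness


# "The professor teaches a database course", parsed by hand.
SENTENCE = [
    Token("The", "DET", 1, "det"),
    Token("professor", "NOUN", 2, "nsubj"),
    Token("teaches", "VERB", -1, "root"),
    Token("a", "DET", 5, "det"),
    Token("database", "NOUN", 5, "compound"),
    Token("course", "NOUN", 2, "dobj"),
]

if __name__ == "__main__":
    rels = annotate(SENTENCE)
    print(rels)
    print(confidence(SENTENCE, rels))
```

For the hand-parsed sentence this finds the three triples agent, hasTopic, and patient, and the score is 1.0 because every content word is in the toy dictionary. A real system would take the parse from a dependency parser and the concepts from an actual ontology, and the paper's evaluation may weigh its features quite differently.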
Source
Journal of Chinese Information Processing (《中文信息学报》), 2015, No. 3, pp. 58-64 (7 pages)
Indexed in: CSCD; Peking University Core Journals (北大核心)
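Funding
National Natural Science Foundation of China (60875081)
Key Research Project of the Henan Province Science and Technology Development Plan (132102210264)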
Keywords
semantic annotation
ontology
unstructured text
dependency syntax