
Semantic Annotation of Chinese Web Pages: From Sentences to RDF Representations (cited by 31)
Abstract: The Semantic Web aims to turn the World Wide Web into a Web of data, in which machines can process annotations and the relations between resources, and implicit information can be derived by using ontologies and shared vocabularies. Realizing this vision requires automatic semantic annotation. This paper proposes a method for the semantic annotation of Chinese Web pages that is guided by a domain ontology. Combining statistical methods with natural language processing techniques, the method takes the sentences of a document as its processing units and maps each sentence to an RDF representation in two phases, identification and grouping. The main technical contributions are as follows. First, a domain lexicon built by statistical methods, rather than a general-purpose linguistic ontology, serves as external domain knowledge, which reduces dependence on such ontologies. Second, an explicit property-type tagging algorithm recognizes both the instances and the properties (words expressing relations) in a sentence and tags them with their types, which facilitates subsequent relation extraction. Third, after a dependency tree (or dependency forest) is built for each sentence, the identified instances and properties are grouped into RDF statements according to the dependency relationships among the Chinese words. Experimental results show that this method is significantly more effective than a semantic annotation method based on the subject-verb-object grammatical relationship.
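The two-phase pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several pieces are assumptions: the abstract does not state which statistic builds the domain lexicon, so `domain_terms` uses an add-one-smoothed relative-frequency ratio as a stand-in. The tag list, the `ex:` ontology names, and the grouping rule (nearest instance to the left of a property word in its dependency subtree becomes the subject, nearest instance to the right becomes the object) are all hypothetical. The dependency heads are assumed to come from an external Chinese parser.

```python
from collections import Counter


def domain_terms(domain_tokens, general_tokens, ratio=2.0, min_count=2):
    """Assumed stand-in for the paper's statistic: keep a word if its relative
    frequency in a domain corpus is at least `ratio` times its add-one-smoothed
    relative frequency in a general corpus."""
    dom, gen = Counter(domain_tokens), Counter(general_tokens)
    n_dom, n_gen = len(domain_tokens), len(general_tokens)
    denom = n_gen + len(gen) + 1  # add-one smoothing, one extra slot for unseen words
    terms = set()
    for word, count in dom.items():
        if count < min_count:
            continue  # too rare in the domain corpus to trust
        p_dom = count / n_dom
        p_gen = (gen[word] + 1) / denom
        if p_dom / p_gen >= ratio:
            terms.add(word)
    return terms


def tag(words, tag_list):
    """Identification phase: look each word up in the domain tag list and
    return {token index: (kind, ontology name)} for the tagged tokens."""
    return {i: tag_list[w] for i, w in enumerate(words) if w in tag_list}


def kind_of(j, tags):
    """Return "instance", "property", or None for token j."""
    return tags[j][0] if j in tags else None


def subtree(i, heads, tags):
    """Indices dominated by token i, without descending into another
    property token (that property forms its own statement)."""
    out = []
    for j, h in enumerate(heads):
        if h == i:
            out.append(j)
            if kind_of(j, tags) != "property":
                out.extend(subtree(j, heads, tags))
    return out


def group(words, heads, tags):
    """Grouping phase: emit an rdf:type statement per instance, then, for each
    property token, link the nearest instance to its left and the nearest
    instance to its right within its dependency subtree (hypothetical rule)."""
    triples = [(words[i], "rdf:type", name)
               for i, (kind, name) in tags.items() if kind == "instance"]
    for p, (kind, prop) in tags.items():
        if kind != "property":
            continue
        inst = [j for j in subtree(p, heads, tags) if kind_of(j, tags) == "instance"]
        left = [j for j in inst if j < p]
        right = [j for j in inst if j > p]
        if left and right:
            triples.append((words[max(left)], prop, words[min(right)]))
    return triples


# Hypothetical segmented sentence 张三 任职 于 吉林大学 ("Zhang San works at
# Jilin University"); heads[i] is the index of token i's governor, -1 = root,
# assumed to come from an external dependency parser.
words = ["张三", "任职", "于", "吉林大学"]
heads = [1, -1, 1, 2]
tag_list = {  # hypothetical domain tag list with ex: ontology names
    "张三": ("instance", "ex:Person"),
    "吉林大学": ("instance", "ex:University"),
    "任职": ("property", "ex:worksFor"),
}
triples = group(words, heads, tag(words, tag_list))
# triples == [("张三", "rdf:type", "ex:Person"),
#             ("吉林大学", "rdf:type", "ex:University"),
#             ("张三", "ex:worksFor", "吉林大学")]
```

In this toy sentence the object 吉林大学 depends on the preposition 于 rather than directly on the verb 任职, so a rule that looks only for a direct object of the verb would miss it, while searching the property word's dependency subtree recovers it. This is only an illustration of why grouping by dependency relations can reach beyond a strict subject-verb-object pattern; it does not reproduce the paper's experiments.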
Source: Journal of Computer Research and Development (《计算机研究与发展》), 2008, No. 7, pp. 1221-1231 (11 pages). Indexed in EI, CSCD, and the Peking University list of core journals.
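Funding: Major Program of the National Natural Science Foundation of China (60496321); Science and Technology Development Program of Jilin Province (20070533).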
Keywords: natural language processing; dependency relationship; type tagging; relation extraction; ontology
