
Topic Extracting Method of Chinese Web Documents Based on Domain Ontology (基于领域本体的中文Web文本主题特征抽取方法) Cited by: 5
Abstract: To process Chinese Web documents quickly, effectively, and automatically, this paper proposes a topic feature extraction method based on a domain ontology. Taking the characteristics of Web documents into account, it first presents a semi-automatic method for constructing a domain dictionary. Documents are segmented using the domain dictionary, and the resulting terms are mapped to topics so that each document vector is represented by concepts of the domain ontology; this effectively reduces the dimensionality of the text feature vector and improves the quality of topic extraction. Topic feature weights are computed from the position and frequency of the text information, and the weights of the topic concepts are then adjusted and ranked based on the structure of the domain ontology. An example verifies the effectiveness of the method.
Source: Information Studies: Theory & Application (《情报理论与实践》), CSSCI, PKU Core Journal, 2008, No. 2, pp. 286-288, 285 (4 pages in total)
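Funding: Jiangsu Province University Natural Science Basic Research Project (project No. KJD520151); one of the research outcomes of a National Defense Technology Foundation project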
Keywords: topic extracting; domain ontology; text mining
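
The pipeline summarized in the abstract (dictionary-based segmentation, mapping terms to ontology concepts, position- and frequency-based weighting, and adjustment by ontology structure) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not the paper's actual method: the dictionary entries, the concept names, the `POSITION_WEIGHT` values, the `PARENT_SHARE` factor, and the use of forward maximum matching for segmentation are all hypothetical, since the abstract does not give these details.

```python
from collections import defaultdict

# Hypothetical domain dictionary: term -> ontology concept (illustrative entries only).
TERM_TO_CONCEPT = {
    "本体": "Ontology",
    "领域本体": "DomainOntology",
    "文本挖掘": "TextMining",
    "主题抽取": "TopicExtraction",
}
# Hypothetical ontology structure: concept -> parent concept.
PARENT = {
    "DomainOntology": "Ontology",
    "TopicExtraction": "TextMining",
}
# Assumed position weights; the abstract does not state the real values.
POSITION_WEIGHT = {"title": 3.0, "body": 1.0}
# Assumed fraction of a concept's direct weight that is passed to its parent.
PARENT_SHARE = 0.5


def segment(text, dictionary, max_len=4):
    """Forward maximum matching that keeps only substrings found in the dictionary."""
    terms, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in dictionary:
                terms.append(candidate)
                i += size
                break
        else:
            i += 1  # no dictionary term starts here; skip one character
    return terms


def topic_weights(fields):
    """Rank ontology concepts for a document given as {position name: text}."""
    direct = defaultdict(float)
    for position, text in fields.items():
        for term in segment(text, TERM_TO_CONCEPT):
            # Each occurrence adds the weight of the position it appears in,
            # so both frequency and position contribute.
            direct[TERM_TO_CONCEPT[term]] += POSITION_WEIGHT[position]
    # Adjust by ontology structure: a concept passes part of its direct weight to its parent.
    adjusted = defaultdict(float, direct)
    for concept, weight in direct.items():
        parent = PARENT.get(concept)
        if parent is not None:
            adjusted[parent] += PARENT_SHARE * weight
    # Sort by descending weight, breaking ties by concept name.
    return sorted(adjusted.items(), key=lambda kv: (-kv[1], kv[0]))
```

Representing a document by a handful of ontology concepts rather than by every segmented word is what reduces the dimensionality of the feature vector; calling `topic_weights({"title": ..., "body": ...})` returns the concepts ranked by their adjusted weight.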