
Automatic Multi-Word Term Extraction Based on C-value with Domain Category Information (cited by: 7)

Exploiting Domain Interdependence for Multi-Word Terms Extraction
Abstract: Multi-word term extraction from text is an important research topic in natural language processing. This paper proposes a Multi-Class C-value method, which uses the distribution of multi-word terms across different domains to improve domain-specific multi-word term extraction. In experiments on data from three domains (automobile, technology, and travel), the precision of the top 100 extracted multi-word terms is 12, 12, and 13 percentage points higher, respectively, than that of the classical C-value method. The experimental results confirm the effectiveness of the method.
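For orientation, the classical C-value baseline mentioned in the abstract can be sketched as below. This is a simplified version of the standard C-value definition by Frantzi and Ananiadou, not the Multi-Class C-value proposed in the paper, whose formula is not given in this record. The function names (`contains`, `c_value`, `precision_at_k`) and the toy frequencies are illustrative assumptions. `precision_at_k` shows how a top-k precision figure such as the top-100 results could be computed.

```python
import math


def contains(longer, shorter):
    """True if `shorter` is a contiguous, strictly shorter sub-sequence of `longer`."""
    n, m = len(longer), len(shorter)
    return m < n and any(longer[i:i + m] == shorter for i in range(n - m + 1))


def c_value(freq):
    """Simplified classical C-value.

    freq maps each candidate term (a tuple of words) to its corpus frequency.
    A candidate nested inside longer candidates has the mean frequency of
    those longer candidates subtracted from its own frequency.
    """
    scores = {}
    for a, f_a in freq.items():
        nesting = [f_b for b, f_b in freq.items() if contains(b, a)]
        adjusted = f_a - sum(nesting) / len(nesting) if nesting else f_a
        # log2 of the term length favours longer terms; single words score 0.
        scores[a] = math.log2(len(a)) * adjusted
    return scores


def precision_at_k(ranked, gold, k):
    """Fraction of the top-k ranked candidates that appear in the gold set."""
    top = ranked[:k]
    if not top:
        return 0.0
    return sum(1 for term in top if term in gold) / len(top)


# Toy frequencies, invented for illustration only.
freq = {("soft", "contact", "lens"): 3, ("contact", "lens"): 5}
scores = c_value(freq)
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy input, `("contact", "lens")` is nested in one longer candidate, so its frequency of 5 is reduced by 3 before being weighted by the length term.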
Source: Journal of Chinese Information Processing (《中文信息学报》), CSCD, Peking University Core Journal, 2010, No. 1, pp. 94-98 (5 pages)
Funding: National Natural Science Foundation of China (60873091); Natural Science Foundation of Liaoning Province (20072032); Shenyang Science and Technology Plan Project (1081235-1-00)
Keywords: computer application; Chinese information processing; multi-word term extraction; Multi-Class C-value; domain information

References (9)

  • 1Sophia Ananiadou. Towards a Methodology for Automatic Term Recognition[D]. University of Manchester Institute of Science and Technology, 1988.
  • 2Sophia Ananiadou. A methodology for automatic term recognition[C]//Proceedings of the 15th International Conference on Computational Linguistics. Morristown, NJ, USA: Association for Computational Linguistics, 1994: 1034-1038.
  • 3Didier Bourigault. Surface grammatical analysis for the extraction of terminological noun phrases[C]//Proceedings of the 14th International Conference on Computational Linguistics. Morristown, NJ, USA: Association for Computational Linguistics, 1992: 977-981.
  • 4Ido Dagan, Ken Church. Termight: Identifying and translating technical terminology[C]//Proceedings of the 7th Conference of the European Chapter of the Association for Computational Linguistics. Morristown, NJ, USA: Association for Computational Linguistics, 1994: 34-40.
  • 5Beatrice Daille, Eric Gaussier, Jean-Marc Lange. Towards automatic extraction of monolingual and bilingual terminology[C]//Proceedings of the 15th International Conference on Computational Linguistics. Morristown, NJ, USA: Association for Computational Linguistics, 1994: 515-521.
  • 6John S. Justeson, Slava M. Katz. Technical terminology: some linguistic properties and an algorithm for identification in text[J]. Natural Language Engineering, 1995, 1(1): 9-27.
  • 7Chantal Enguehard, Laurent Pantera. Automatic natural acquisition of a terminology[J]. Journal of Quantitative Linguistics, 1994, 2(1): 27-32.
  • 8KT Frantzi, S Ananiadou. The C-value/NC-value domain independent method for multi-word term extraction[J]. Journal of Natural Language Processing, 1999, 6(3): 145-179.
  • 9Zhu Jingbo, Chen Wenliang. Text classification based on domain knowledge[J]. Journal of Northeastern University (Natural Science), 2005, 26(8): 733-735.
