Automatic Multi-Word Term Extraction Based on C-value with Domain Category Information (基于领域类别信息C-value的多词串自动抽取)
Cited by: 7

Exploiting Domain Interdependence for Multi-Word Terms Extraction

Abstract: Automatic multi-word term extraction from text is an important research topic in natural language processing. This paper proposes a Multi-Class C-value method, which uses the distribution of multi-word terms across different domains to improve domain-specific multi-word term extraction. In experiments on data from three domains (automobile, technology and travel), precision on the top 100 extracted multi-word terms is 12, 12 and 13 percentage points higher, respectively, than with the classical C-value method. The experimental results confirm the effectiveness of the method.
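The abstract compares against the classical C-value baseline and reports precision at top 100. The sketch below illustrates only that baseline, using the standard C-value definition of Frantzi and Ananiadou, together with a precision-at-k helper. It is not the paper's Multi-Class C-value, whose formula the abstract does not give. The function names `c_value` and `precision_at_k` and the input layout (candidates as word tuples mapped to frequencies) are assumptions made for illustration.

```python
import math
from collections import defaultdict


def c_value(freq):
    """Classical C-value score for each candidate term.

    freq maps a candidate (a tuple of words) to its corpus frequency.
    This is the baseline measure only, not the Multi-Class variant.
    """
    # For every candidate, collect the longer candidates that contain it.
    containers = defaultdict(list)
    for longer in freq:
        n = len(longer)
        seen = set()
        for size in range(1, n):
            for start in range(n - size + 1):
                sub = longer[start:start + size]
                if sub in freq and sub not in seen:
                    seen.add(sub)
                    containers[sub].append(longer)

    scores = {}
    for term, f in freq.items():
        nested_in = containers.get(term, [])
        if nested_in:
            # Discount by the mean frequency of the longer terms containing it.
            f = f - sum(freq[b] for b in nested_in) / len(nested_in)
        # log2 of the length in words; single words therefore score 0.
        scores[term] = math.log2(len(term)) * f
    return scores


def precision_at_k(ranked, gold, k):
    """Fraction of the first k ranked candidates that are in the gold set."""
    return sum(1 for term in ranked[:k] if term in gold) / k
```

Ranking candidates by descending score and passing the ranked list with k=100 to `precision_at_k` gives the kind of top-100 precision figure the abstract reports, assuming a manually judged gold set of correct terms is available.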

Authors: Li Chao (李超), Wang Huizhen (王会珍), Zhu Muhua (朱慕华), Zhang Li (张俐), Zhu Jingbo (朱靖波)

Affiliation: Natural Language Processing Laboratory, Northeastern University (东北大学自然语言处理实验室)

Source: Journal of Chinese Information Processing (《中文信息学报》), indexed in CSCD and the Peking University Core Journals list, 2010, No. 1, pp. 94-98 (5 pages)

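Funding: National Natural Science Foundation of China (60873091); Natural Science Foundation of Liaoning Province (20072032); Shenyang Science and Technology Plan Project (1081235-1-00)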

Keywords: computer application; Chinese information processing; multi-word terms extraction; Multi-Class C-value; domain information

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]
