Author Name Disambiguation with Network Embedding (cited by: 10)
Abstract: [Objective] This paper aims to eliminate author name ambiguity in bibliographic systems and thereby resolve the incorrect aggregation of documents that it causes. [Methods] We build an author network, a document network, and an author-document network from structured bibliographic data, and fuse several network embedding methods to obtain representations of document nodes. Using these representations as features in an unsupervised setting, we apply hierarchical agglomerative clustering to partition the documents by their real authors. [Results] In empirical studies on three datasets (ArnetMiner, CiteSeerX, and DBLP), the method still performs well when the networks are sparse, and its Macro-F1 improves on the second-best model by up to 6%. [Limitations] Only author name disambiguation in English-language settings was studied. [Conclusions] Methods based on network embedding can effectively resolve author name ambiguity, and the experimental results are significant for improving research collaboration recommendation, citation recommendation, and research on knowledge networks.
Authors: Yu Chuanming; Zhong Yunci; Lin Aochen; An Lu (School of Information and Safety Engineering, Zhongnan University of Economics and Law, Wuhan 430073, China; School of Information Management, Wuhan University, Wuhan 430072, China)
Source: Data Analysis and Knowledge Discovery (《数据分析与知识发现》), indexed in CSSCI, CSCD, and the Peking University Core Journals list; 2020, No. 2, pp. 48-59 (12 pages)
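Funding: This work is one of the research outcomes of the Ministry of Education Humanities and Social Sciences Research General Project "Opinion Summarization Based on Deep Representation and Alignment in Multilingual Contexts" (Grant No. 19YJC870029) and the National Natural Science Foundation of China General Program project "Opinion Retrieval Based on Domain Knowledge Acquisition and Alignment in the Big Data Environment" (Grant No. 71373286).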
Keywords: Network Embedding; Heterogeneous Network; Author Name Disambiguation; Unsupervised Learning
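The pipeline summarized in the abstract (build networks from bibliographic records, represent each document as a vector, then group documents by hierarchical agglomerative clustering) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy records in `papers`, the use of normalized shared-co-author counts as a stand-in for learned network embeddings, the average-linkage rule, the cosine distance, and the stopping `threshold` of 0.6. Only a single document network is built, whereas the paper fuses three networks.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy records for one ambiguous author name.
# Each paper maps to its co-authors, excluding the ambiguous name itself.
papers = {
    "p1": {"A. Smith", "B. Jones"},
    "p2": {"A. Smith", "C. Lee"},
    "p3": {"B. Jones", "C. Lee"},
    "p4": {"D. Kumar", "E. Rossi"},
    "p5": {"E. Rossi", "F. Chen"},
}

def document_vectors(papers):
    """Stand-in for learned embeddings: each document's row in a document
    network whose edge weight is the number of shared co-authors,
    L2-normalized so that a dot product equals cosine similarity."""
    ids = sorted(papers)
    vectors = {}
    for i in ids:
        row = [float(len(papers[i] & papers[j])) for j in ids]
        norm = sqrt(sum(x * x for x in row)) or 1.0
        vectors[i] = [x / norm for x in row]
    return vectors

def cosine_distance(u, v):
    # Inputs are unit vectors, so the dot product is the cosine similarity.
    return 1.0 - sum(a * b for a, b in zip(u, v))

def agglomerative_clusters(vectors, threshold=0.6):
    """Average-linkage agglomerative clustering. Merging stops once the
    closest pair of clusters is farther apart than `threshold`."""
    clusters = [[d] for d in sorted(vectors)]

    def avg_dist(c1, c2):
        total = sum(cosine_distance(vectors[a], vectors[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        (i, j), best = min(
            (((i, j), avg_dist(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]  # merge j into i (i < j)
        del clusters[j]
    return sorted(sorted(c) for c in clusters)

result = agglomerative_clusters(document_vectors(papers))
print(result)  # [['p1', 'p2', 'p3'], ['p4', 'p5']]
```

In this toy input the two groups of papers share no co-authors, so each resulting cluster can be read as one real author. In practice the vectors would come from a trained network embedding model, and the stopping rule would need tuning on labeled data.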