Abstract
In foreign-language journal databases, a single abbreviated name often stands for several different authors, which seriously degrades the precision of author searches. This study combines rules with a classification algorithm: rules are used to label the training data for the classifier, so that a supervised algorithm can be applied where no manually labeled data exist, enabling precise author retrieval. The approach suits name disambiguation tasks in which the target author's identity is already known, such as publication verification. Compared with general-purpose disambiguation methods, it combines the advantage of unsupervised algorithms (no manual annotation required) with those of supervised algorithms (high efficiency and straightforward mapping of records to author entities). Results in practice show that the method achieves relatively high accuracy.
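The idea of labeling training data by rule and then training a Naive Bayes classifier on it can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the sample records, the field names (`affil`, `tokens`), the single affiliation-based rule in `rule_label`, and the choice of a multinomial model with Laplace smoothing. The record does not state which rules or features the paper actually uses.

```python
import math
from collections import Counter

# Hypothetical records that all share one abbreviated author name.
# Field names and values are illustrative assumptions, not data from the paper.
RECORDS = [
    {"id": 1, "affil": "shanghai jiao tong univ", "tokens": ["coauthor:chen_h", "kw:graphene", "kw:battery"]},
    {"id": 2, "affil": "shanghai jiao tong univ", "tokens": ["coauthor:chen_h", "kw:battery", "kw:anode"]},
    {"id": 3, "affil": "univ toronto", "tokens": ["coauthor:smith_j", "kw:genomics", "kw:cancer"]},
    {"id": 4, "affil": "univ toronto", "tokens": ["coauthor:smith_j", "kw:cancer", "kw:rna"]},
    {"id": 5, "affil": "", "tokens": ["coauthor:chen_h", "kw:anode", "kw:graphene"]},
    {"id": 6, "affil": "", "tokens": ["kw:genomics", "kw:rna"]},
]

KNOWN_AFFIL = "shanghai jiao tong univ"  # assumed known institution of the target author


def rule_label(record):
    """Assumed rule: matching affiliation -> 1, other affiliation -> 0, missing -> None."""
    if not record["affil"]:
        return None
    return 1 if record["affil"] == KNOWN_AFFIL else 0


def train_naive_bayes(labeled):
    """Collect class and token counts for a multinomial Naive Bayes model."""
    class_counts = Counter(label for _, label in labeled)
    token_counts = {c: Counter() for c in class_counts}
    for tokens, label in labeled:
        token_counts[label].update(tokens)
    vocab = {t for counts in token_counts.values() for t in counts}
    return class_counts, token_counts, vocab


def predict(model, tokens):
    """Return the class with the highest log posterior (Laplace smoothing)."""
    class_counts, token_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for c, n in class_counts.items():
        score = math.log(n / total)
        denom = sum(token_counts[c].values()) + len(vocab)
        for t in tokens:
            if t in vocab:  # tokens never seen in training are ignored
                score += math.log((token_counts[c][t] + 1) / denom)
        if score > best_score:
            best_label, best_score = c, score
    return best_label


# Step 1: rules label the records they can decide; no manual annotation is involved.
labeled = [(r["tokens"], rule_label(r)) for r in RECORDS if rule_label(r) is not None]
# Step 2: the supervised classifier is trained on the rule-labeled records.
model = train_naive_bayes(labeled)
# Step 3: records the rules could not decide are classified by the model.
predictions = {r["id"]: predict(model, r["tokens"]) for r in RECORDS if rule_label(r) is None}
print(predictions)
```

In this sketch a label of 1 means "attributed to the known author" and 0 means "a different author with the same abbreviated name". Because the classes correspond directly to the known author and to everyone else, each prediction maps to an author entity without a separate cluster-matching step, which is the property the abstract attributes to supervised methods.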
Author
Fan Wuyou (范午攸), Shanghai Jiao Tong University Library
Source
Library Journal (《图书馆杂志》)
CSSCI
Peking University Core Journals (北大核心)
2018, Issue 12, pp. 56-63 (8 pages)
Keywords
Author name disambiguation
Data annotation
Classification algorithm
Naive Bayes