Abstract
Motivated by the need to compile publication statistics for authors and affiliations, this paper analyzes the specific causes of redundant records when author and affiliation publications are retrieved through cross-database searching. Drawing on web page de-duplication techniques, it designs four cross-database duplicate removal methods for literature records: Chinese Cross-Database ID, English Cross-Database ID, DOI, and "Title & Type". Together these address redundancy among Chinese databases, among English databases, and between Chinese and English databases, and they have been applied effectively to retrieving and counting publications by experts and by affiliations.
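As a rough illustration of two of the four methods (DOI and "Title & Type"), the sketch below treats a record as a duplicate when any of its keys has already been seen. The record fields `doi`, `title`, and `type`, and the function names, are assumptions made for this example; the abstract does not specify the authors' data model or how the Chinese and English cross-database IDs are built, so those two methods are not sketched.

```python
import unicodedata


def normalize_title(title: str) -> str:
    """Fold width and case, then keep only letters and digits."""
    folded = unicodedata.normalize("NFKC", title).casefold()
    return "".join(ch for ch in folded if ch.isalnum())


def record_keys(record: dict) -> list:
    """Return the match keys for one record (hypothetical field names)."""
    keys = []
    doi = (record.get("doi") or "").strip().lower()  # DOIs are case-insensitive
    if doi:
        keys.append(("doi", doi))
    title = normalize_title(record.get("title") or "")
    doc_type = (record.get("type") or "").strip().lower()
    if title:
        keys.append(("title_type", title, doc_type))
    return keys


def deduplicate(records: list) -> list:
    """Keep the first record seen for each DOI or title-and-type key."""
    seen = set()
    unique = []
    for record in records:
        keys = record_keys(record)
        if any(key in seen for key in keys):
            continue  # already retrieved from another database
        seen.update(keys)
        unique.append(record)
    return unique
```

Checking the title key for every record, not only for those without a DOI, lets a record that lacks a DOI in one database still match its counterpart that carries a DOI in another.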
Source
《现代图书情报技术》
CSSCI
PKU Core Journal (北大核心)
2011, No. 7, pp. 116-120 (5 pages)
New Technology of Library and Information Service
Keywords
Cross-database searching
Duplicate removal strategy
Literature information