Abstract
Identifying highly similar scientific projects is essential to ensuring fairness in project approval, and in recent years it has become one of the most active topics in science and technology management. This paper systematically reviews existing methods for identifying highly similar scientific projects, offering useful knowledge and pointers to help other researchers quickly grasp the relevant background and methods. It first defines the identification of highly similar scientific projects and describes its general implementation process. It then analyzes and summarizes common methods along the dimensions of text pre-processing, feature extraction, model construction, and similarity discrimination, and discusses their advantages and disadvantages. Finally, it outlines future development trends for these methods.
Authors
李善青
邢晓昭
杜圣梅
Li Shanqing, Xing Xiaozhao, Du Shengmei (Institute of Scientific and Technical Information of China, Beijing 100038, China)
Source
《科技管理研究》
CSSCI
Peking University Core Journal
2018, No. 6, pp. 197-201 (5 pages)
Science and Technology Management Research
Funding
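National Natural Science Foundation of China project "Research on the Application of Big Data Mining to the Identification of Highly Similar Scientific Projects" (71303223)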
Keywords
identification of highly similar scientific projects
text pre-processing
feature extraction
model construction
similarity discrimination