
Identification of Potential High-Quality Articles in Science and Nature Based on a Random Forest Model (cited by: 2)
Abstract [Purpose/Significance] To advance the identification of potential "high-quality" literature and its application to the identification, dissemination, and use of scientific and technical literature. [Method/Process] The sample consists of articles published in the top international journals Science and Nature, together with their citation distribution data. For every article, features such as first-citation time (first response time), abstract length, total citation count, funding support, and article length are computed to build a feature matrix of "high-quality" articles. A model for identifying potential "high-quality" articles is then trained on this feature matrix with the random forest algorithm and applied to identification. [Results/Conclusions] Combining the feature matrix with the random forest model identifies potential "high-quality" articles in Science and Nature well: the mean classification accuracy exceeds 80%, and the accuracy for Nature is about 2% higher than for Science. Models that use information gain perform slightly better than models that use Gini impurity. In addition, "high-quality" articles in both journals receive their first citation extremely quickly, within the year of publication. [Innovation/Limitations] Combining a "high-quality" literature feature matrix with a machine learning model works well for identifying and recommending potential "high-quality" articles in high-impact journals. In future work, however, the model still needs to be extended to, and tested on, massive literature collections.
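The abstract compares two split criteria, information gain and Gini impurity, without showing how they differ. The sketch below is a minimal illustration, not the authors' implementation: it scores one candidate split on a made-up toy sample. The feature values, labels, threshold, and function names (`gini`, `entropy`, `split_gain`) are all assumptions introduced here for illustration.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def split_gain(values, labels, threshold, impurity):
    """Impurity reduction from splitting on `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(labels) - weighted


# Hypothetical toy data (NOT from the paper): years from publication to
# first citation, and a label where 1 marks a "high-quality" article.
first_citation_years = [0, 0, 0, 1, 1, 2, 3, 4]
is_high_quality = [1, 1, 1, 1, 0, 0, 0, 0]

# Score the same candidate split ("cited in the publication year") both ways.
info_gain = split_gain(first_citation_years, is_high_quality, 0, entropy)
gini_gain = split_gain(first_citation_years, is_high_quality, 0, gini)
print(f"information gain: {info_gain:.4f}, Gini gain: {gini_gain:.4f}")
```

A random forest evaluates many such candidate splits across many trees and picks the best one at each node. Libraries such as scikit-learn expose the choice as the `criterion` parameter of `RandomForestClassifier` (`"gini"` or `"entropy"`), so the two variants can be compared by changing that single setting.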
Authors: HU Ze-wen (胡泽文); REN Ping (任萍); ZHOU Xi-ji (周西姬) (School of Management Science and Engineering, Nanjing University of Information Science & Technology, Nanjing 210044, China)
Source: Information Science (《情报科学》), CSSCI, Peking University Core Journal, 2022, No. 4, pp. 90-95, 106 (7 pages)
Funding: National Social Science Fund of China project "Research on Methods and Applications for Identifying Potential 'High-Quality' Articles in Massive Scientific and Technical Literature" (20CTQ031).
Keywords: random forest; identification model; potential "high-quality" articles; highly cited; first citation; scientometrics
