Abstract
[Purpose/Significance] Software plays an important role in modern scientific research. Efficiently identifying software entities in academic literature is of great significance for understanding the academic value of software, promoting the sustainable development of software, and fostering the balanced development of the academic ecosystem. [Method/Process] This study first defined software entities. It then constructed a domain corpus for software entity recognition using a program-assisted annotation scheme based on a small knowledge base. On this basis, it proposed an improved SciBERT-BiLSTM-CRF-wordMixup model and evaluated its recognition performance. [Result/Conclusion] The experimental results show that the proposed SciBERT-BiLSTM-CRF-wordMixup model performs best on the software entity recognition task, with an overall F1 score of 87.5%. This indicates that the model can effectively recognize software and software-related information entities in the text of academic papers.
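The abstract does not say how the small knowledge base assists annotation. A common pattern is dictionary matching that pre-tags candidate entities for human annotators to correct. The sketch below illustrates that idea under stated assumptions: greedy longest-match lookup over whitespace tokens, case-sensitive exact matching, and BIO tags. The knowledge-base entries, the `SOFTWARE`/`DEVELOPER` labels, and the `pre_annotate` function are hypothetical illustrations, not the authors' scheme.

```python
from typing import Dict, List, Tuple

# Hypothetical toy knowledge base: token tuple of a surface form -> entity type.
KNOWLEDGE_BASE: Dict[Tuple[str, ...], str] = {
    ("Microsoft", "Excel"): "SOFTWARE",
    ("Excel",): "SOFTWARE",
    ("SPSS",): "SOFTWARE",
    ("IBM",): "DEVELOPER",
}


def pre_annotate(tokens: List[str], kb: Dict[Tuple[str, ...], str]) -> List[str]:
    """Assign candidate BIO tags by greedy longest match of token n-grams against kb.

    Matching is exact and case-sensitive; unmatched tokens get "O".
    """
    tags = ["O"] * len(tokens)
    max_len = max((len(key) for key in kb), default=0)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first so multi-token names win.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            label = kb.get(tuple(tokens[i:i + n]))
            if label is not None:
                tags[i] = f"B-{label}"
                for j in range(i + 1, i + n):
                    tags[j] = f"I-{label}"
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return tags


if __name__ == "__main__":
    sentence = "Data were analysed with IBM SPSS and Microsoft Excel ."
    for token, tag in zip(sentence.split(), pre_annotate(sentence.split(), KNOWLEDGE_BASE)):
        print(token, tag)
```

Longest match makes "Microsoft Excel" come out as a single `SOFTWARE` span instead of tagging only "Excel". In a program-assisted workflow, output like this would only be a starting point. Annotators would still need to fix false positives, such as ambiguous short names, and add entities the knowledge base does not cover.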
Authors
Pan Xuelian (潘雪莲), Qian Yufei (钱雨菲), Wang Xianyu (王宪雨)
School of Information Management, Nanjing University, Nanjing 210023, China
Source
Journal of Modern Information (《现代情报》)
CSSCI
Peking University Core Journal (北大核心)
2024, No. 10, pp. 75-85 (11 pages)
Funding
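Ministry of Education Humanities and Social Sciences Research Youth Fund Project, "Research on the Mechanism by Which Academic Mobility Affects Researchers' Knowledge Production from the Perspective of Knowledge Recombination" (Project No. 22YJC870011);
National Natural Science Foundation of China Youth Project, "Research on Software Entity Extraction and Academic Influence Based on Full-Text Data" (Project No. 71704077).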