Abstract
[Purpose/significance] Citation content analysis helps reveal the deeper semantic meaning of citation relations between documents, and citation context identification, as the foundation of citation content analysis, is particularly important. [Method/process] This paper reviews existing research on citation context, summarizes the shortcomings of current citation context identification, and on that basis proposes five categories of features for citation context identification. It then conducts automatic citation context identification experiments using two approaches, text classification and sequence labeling. [Result/conclusion] The experiments show that the proposed features improve citation context identification substantially over the baseline method, and that the text-classification-based SVM outperforms the sequence-labeling-based CRF.
Source
《图书情报工作》
CSSCI
Peking University Core Journals (北大核心)
2016, Issue 17, pp. 78-87 (10 pages)
Library and Information Service
Funding
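One of the research outputs of the National Natural Science Foundation of China General Program project "Lexical-Function-Oriented Semantic Recognition of Academic Text and Knowledge Graph Construction" (Grant No. 71473183)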
Keywords
citation context
citation content analysis
support vector machine
conditional random field
implicit context