Abstract
Open access (OA) journal papers are deep Web resources that traditional search engines cannot index effectively. To address this problem, this paper proposes a paper resource discovery method for OA journal websites. First, features extracted from the homepages of OA journal websites are used to build a C4.5 decision tree that classifies the sites into two types: sites based on a catalog of volumes and issues, and sites based on a search interface. Then, a paper resource discovery algorithm is proposed for each type: one based on anchor text link analysis, the other based on the retrieval interface. Experimental results show that the proposed method can effectively discover OA journal paper resources and achieves relatively high accuracy and recall.
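As a rough illustration of what anchor text link analysis on a volume/issue catalog page might involve, the sketch below collects the links on a page and keeps those whose anchor text looks like a volume or issue label. It is not the algorithm from the paper: the regular expression `VOLUME_ISSUE`, the class `AnchorCollector`, and the function `volume_issue_links` are hypothetical names and heuristics chosen for this example, and a real site would likely need additional patterns (for instance, Chinese labels or bare year listings).

```python
import re
from html.parser import HTMLParser

# Hypothetical heuristic: anchor texts that look like volume/issue labels,
# e.g. "Vol. 12, No. 3", "Volume 7", "Issue 5".
VOLUME_ISSUE = re.compile(r"\b(vol(ume)?|issue|no)\b\.?\s*\d+", re.IGNORECASE)


class AnchorCollector(HTMLParser):
    """Collect (href, anchor text) pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> element currently open, if any
        self._text = []     # text fragments seen inside that element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Normalize whitespace in the anchor text before storing it.
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def volume_issue_links(html):
    """Return the links whose anchor text matches the volume/issue heuristic."""
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    return [(href, text) for href, text in parser.links
            if VOLUME_ISSUE.search(text)]
```

A crawler built along these lines would follow the returned links to reach the issue pages and repeat a similar filtering step there to locate individual papers; sites of the search interface type would need a different approach, such as submitting queries through the form.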
Source
《情报学报》
CSSCI
Peking University Core Journals (北大核心)
2013, No. 5, pp. 497-502 (6 pages)
Journal of the China Society for Scientific and Technical Information
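Funding
Supported by the Special Research Project on Rapid Sharing of Scientific Papers in the Network Era, Science and Technology Development Center of the Ministry of Education (2011109);
Natural Science Foundation of Hebei Province (F2011203219).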
Keywords
OA journal websites, paper resource discovery, C4.5 decision tree, catalog of journal volumes and issues, retrieval interface