
Paper Resources Discovery Method for OA Journal Websites
Abstract: Open access (OA) journal papers are deep Web resources that traditional search engines cannot index effectively. To address this problem, this paper proposes a paper resource discovery method for OA journal websites. First, features are extracted from the homepages of OA journal websites and used to build a C4.5 decision tree, which classifies the sites into two types: sites organized as a catalog of volumes and issues, and sites that expose a search interface. Then, for these two types respectively, the paper proposes a discovery algorithm based on anchor text link analysis and a discovery algorithm based on the retrieval interface. Experimental results show that the proposed method discovers OA journal paper resources effectively, with relatively high accuracy and recall.
Source: Journal of the China Society for Scientific and Technical Information (《情报学报》; indexed in CSSCI and the Peking University core journal list), 2013, No. 5, pp. 497-502 (6 pages).
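Funding: Special research project on rapid sharing of scientific papers in the network era, Science and Technology Development Center of the Ministry of Education (2011109); Natural Science Foundation of Hebei Province (F2011203219).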
Keywords: OA journal websites; paper resource discovery; C4.5 decision tree; catalog of volumes and issues; retrieval interface
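The abstract says homepage features are fed to a C4.5 decision tree, but it does not list which features are used. As a minimal sketch, assuming three hypothetical boolean homepage features (`search_form`, `issue_links`, `archive_link`) and a toy labelled set, the code below shows the C4.5 split criterion (gain ratio) and a recursive tree on discrete features. It omits pruning and thresholds for continuous attributes, so it is a simplified C4.5 for illustration, not the authors' implementation.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def gain_ratio(rows, labels, feature):
    """C4.5 split criterion: information gain divided by split information."""
    total = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    gain = entropy(labels) - sum(len(g) / total * entropy(g) for g in groups.values())
    split_info = -sum((len(g) / total) * math.log2(len(g) / total) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def build_tree(rows, labels, features):
    """Grow a tree on discrete features; leaves are class labels (no pruning)."""
    if len(set(labels)) == 1 or not features:
        return majority(labels)
    best = max(features, key=lambda f: gain_ratio(rows, labels, f))
    if gain_ratio(rows, labels, best) <= 0.0:
        return majority(labels)
    remaining = [f for f in features if f != best]
    branches = {}
    for value in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = build_tree([rows[i] for i in idx], [labels[i] for i in idx], remaining)
    return {"feature": best, "branches": branches, "default": majority(labels)}

def classify(tree, row):
    """Follow branches to a leaf; unseen feature values fall back to the node's majority label."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["feature"]], tree["default"])
    return tree

# Hypothetical homepage features and labels, for illustration only.
TRAIN_ROWS = [
    {"search_form": True,  "issue_links": True,  "archive_link": True},
    {"search_form": False, "issue_links": True,  "archive_link": True},
    {"search_form": False, "issue_links": True,  "archive_link": False},
    {"search_form": True,  "issue_links": False, "archive_link": False},
    {"search_form": True,  "issue_links": False, "archive_link": True},
]
TRAIN_LABELS = ["volume_issue_catalog"] * 3 + ["search_interface"] * 2

tree = build_tree(TRAIN_ROWS, TRAIN_LABELS, ["search_form", "issue_links", "archive_link"])
print(tree["feature"])  # prints: issue_links
```

Gain ratio divides information gain by split information, which penalizes features that fragment the training set into many small branches; this normalization is what distinguishes C4.5's criterion from ID3's plain information gain.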
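For sites organized as a catalog of volumes and issues, the abstract names anchor text link analysis but does not give its rules. A minimal sketch, assuming that issue links can be recognized by anchor text such as "Vol. 5, No. 1" or "第5卷" and paper links by text such as "PDF", "Full Text" or "全文" (full text). The regular expressions, the two-level homepage to issue page to paper structure, the `fetch` callable, and the `journal.example.org` pages are all hypothetical. Only the Python standard library is used, and pages are served from an in-memory dict so the example runs offline.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical anchor-text rules; the paper's actual rules are not given in the abstract.
ISSUE_PATTERN = re.compile(r"\b(?:vol(?:ume)?\.?\s*\d+|issue\s*\d+|no\.\s*\d+)|第\s*\d+\s*[卷期]", re.IGNORECASE)
PAPER_PATTERN = re.compile(r"\b(?:pdf|full\s*text)\b|全文", re.IGNORECASE)

class AnchorCollector(HTMLParser):
    """Collects (href, anchor text) pairs for every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())  # collapse whitespace
            self.links.append((self._href, text))
            self._href = None

def anchors(html, base_url):
    """Return (absolute URL, anchor text) pairs found in an HTML page."""
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    return [(urljoin(base_url, href), text) for href, text in parser.links]

def discover_papers(home_url, fetch):
    """Homepage -> issue pages (matched by anchor text) -> paper links (matched by anchor text)."""
    issue_urls = [url for url, text in anchors(fetch(home_url), home_url) if ISSUE_PATTERN.search(text)]
    papers, seen = [], set()
    for issue_url in issue_urls:
        for url, text in anchors(fetch(issue_url), issue_url):
            if PAPER_PATTERN.search(text) and url not in seen:
                seen.add(url)
                papers.append(url)
    return papers

# In-memory pages for a hypothetical journal site, so the sketch runs offline.
PAGES = {
    "http://journal.example.org/": '<a href="/about">About</a>'
        '<a href="issues/v5n1.html">Vol. 5, No. 1</a><a href="issues/v5n2.html">Vol. 5, No. 2</a>',
    "http://journal.example.org/issues/v5n1.html": '<a href="../papers/1.pdf">PDF</a><a href="../papers/1.html">Abstract</a>',
    "http://journal.example.org/issues/v5n2.html": '<a href="../papers/2.pdf">Full Text</a>',
}

print(discover_papers("http://journal.example.org/", lambda url: PAGES.get(url, "")))
```

In a real crawler `fetch` would perform an HTTP GET (for example with `urllib.request.urlopen`), and the patterns would need tuning to the layout and language of the journals being crawled.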
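For sites that expose a search interface, the abstract does not say how queries are chosen. One common way to reach deep Web content through a search form, used here purely as an assumption, is iterative query-based harvesting: issue a seed keyword, collect the returned records, and feed words from newly found titles back in as further queries until no new terms remain or a query budget runs out. The `search` callable, the `PAPERS` data and the `min_len`/`max_queries` parameters below are hypothetical stand-ins for submitting a real site's search form and parsing its result list.

```python
import re
from collections import deque

def harvest(search, seed_terms, max_queries=100, min_len=4):
    """Query a search interface repeatedly, reusing words from new titles as queries.

    search: hypothetical callable(term) -> list of (paper_id, title).
    Returns a dict mapping paper_id -> title for every record reached.
    """
    found = {}
    issued = set()
    queue = deque(seed_terms)
    while queue and len(issued) < max_queries:
        term = queue.popleft()
        if term in issued:
            continue  # the same word may be queued more than once
        issued.add(term)
        for paper_id, title in search(term):
            if paper_id in found:
                continue
            found[paper_id] = title
            # Only ASCII words are extracted; Chinese titles would need word segmentation.
            for word in re.findall(r"[a-z]+", title.lower()):
                if len(word) >= min_len and word not in issued:
                    queue.append(word)
    return found

# Toy in-memory "site" standing in for a real search form (hypothetical data).
PAPERS = {
    "p1": "Deep Web crawling strategies",
    "p2": "Crawling hidden databases",
    "p3": "Decision tree classification",
}

def mock_search(term):
    return [(pid, t) for pid, t in PAPERS.items() if term in t.lower().split()]

print(sorted(harvest(mock_search, ["deep"])))  # prints ['p1', 'p2']
```

Record `p3` is never reached from the seed "deep" because none of its words occur in the other titles, which illustrates the main limitation of this approach: coverage depends on the seed terms and on vocabulary overlap between records.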