Research on the Application of Genetic Algorithms in Focused Web Information Crawling (cited by 5)

Research of a Focused Crawler Using Genetic Algorithm

Abstract: Traditional focused crawlers (topic-specific Web information-gathering systems) rely on local search algorithms, which leads to topic drift and to crawl results that are only locally optimal. Building on an in-depth study of Web topology and exploiting the web spider's online state, we propose a genetic algorithm that works on global information and dynamically combines a link's immediate reward value with its future reward value. Used together with meta-search technology, the algorithm further improves the spider's performance. Our experiments show higher recall and precision, and the algorithm addresses the problems above reasonably well.
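The abstract names the ingredients (a population-based global search, a link score mixing immediate and future reward, and meta-search) but not the concrete operators. The following Python sketch is one possible reading under stated assumptions, not the authors' algorithm: the toy graph `GRAPH`, the relevance scores `RELEVANCE`, the pool `META_SEARCH_POOL`, the weight `gamma`, and the functions `fitness`, `roulette` and `ga_crawl` are all hypothetical. Fitness adds a page's own relevance (immediate value) to `gamma` times the mean relevance of its out-links (a one-step stand-in for future value). Crossover takes the unvisited out-links of two roulette-selected parents, and mutation occasionally injects a meta-search result so the crawl can leave a locally optimal region.

```python
import random
from typing import Dict, List, Set

# Hypothetical toy Web graph (page -> out-links); not data from the paper.
GRAPH: Dict[str, List[str]] = {
    "seed": ["a", "b", "c"],
    "a": ["d", "e"],
    "b": ["e", "f"],
    "c": ["g"],
    "d": ["h"],
    "e": ["h", "i"],
    "f": [],
    "g": ["j"],
    "h": [],
    "i": ["j"],
    "j": [],
}

# Hypothetical topic relevance in [0, 1]; a real crawler would use a classifier.
RELEVANCE: Dict[str, float] = {
    "seed": 0.9, "a": 0.8, "b": 0.2, "c": 0.1, "d": 0.7, "e": 0.6,
    "f": 0.1, "g": 0.05, "h": 0.9, "i": 0.3, "j": 0.95,
}

# Hypothetical stand-in for URLs returned by a meta-search engine.
META_SEARCH_POOL: List[str] = ["j", "i"]


def fitness(page: str, gamma: float = 0.5) -> float:
    """Immediate value (own relevance) plus gamma times an assumed
    one-step future value (mean relevance of the page's out-links)."""
    links = GRAPH[page]
    future = sum(RELEVANCE[p] for p in links) / len(links) if links else 0.0
    return RELEVANCE[page] + gamma * future


def roulette(population: List[str], rng: random.Random) -> str:
    """Fitness-proportional (roulette-wheel) parent selection."""
    weights = [fitness(p) for p in population]
    return rng.choices(population, weights=weights, k=1)[0]


def ga_crawl(seeds: List[str], pop_size: int = 3, generations: int = 5,
             p_mutation: float = 0.2, seed: int = 0) -> Set[str]:
    """Return the set of pages 'fetched' by a toy GA-driven crawl."""
    rng = random.Random(seed)
    population = list(seeds)
    visited: Set[str] = set(seeds)
    for _ in range(generations):
        # Crossover: offspring are unvisited out-links of two selected parents.
        mom, dad = roulette(population, rng), roulette(population, rng)
        offspring = [p for p in sorted(set(GRAPH[mom]) | set(GRAPH[dad]))
                     if p not in visited]
        # Mutation: occasionally inject a meta-search hit, which lets the
        # crawl jump to pages not linked from the current population.
        if rng.random() < p_mutation:
            injected = rng.choice(META_SEARCH_POOL)
            if injected not in visited and injected not in offspring:
                offspring.append(injected)
        visited.update(offspring)  # stand-in for fetching the pages
        # Elitist replacement: the fittest pages become the next population.
        candidates = sorted(set(population) | set(offspring))
        population = sorted(candidates, key=fitness, reverse=True)[:pop_size]
    return visited


if __name__ == "__main__":
    crawled = ga_crawl(["seed"])
    print(sorted(crawled, key=fitness, reverse=True))
```

Running the module prints the crawled pages ordered by fitness; the exact set depends on the random seed. Elitist replacement keeps the strongest pages as parents, while the mutation step supplies the non-local jumps that a purely best-first crawler lacks. In a real crawler the out-link relevance used for the future value would have to be estimated (for example from anchor text) before those pages are fetched.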

Authors: Tang Zhi, Wang Chengliang

Affiliations: College of Computer Science, Chongqing University; School of Software, Chongqing University

Source: Computer Science (计算机科学), indexed in CSCD and the Peking University Core Journals list; 2006, issue 7, pp. 71-74 (4 pages)

Keywords: Genetic algorithm, Web spider, Web community, Information retrieval

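Chinese Library Classification: TP393 [Automation and Computer Technology / Computer Application Technology]; TP301.6 [Automation and Computer Technology / Computer System Architecture]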
