
Knowledge retrieval based on text clustering and distributed Lucene (cited by 10)
Abstract: Traditional centralized indexes perform poorly and inefficiently when processing large-scale unstructured knowledge. To address this, the authors propose a retrieval algorithm based on text clustering. The algorithm uses text clustering to improve the existing index partitioning scheme. It then infers the query intent from the distance between the query and each cluster, which narrows the search range. Experimental results show that the proposed scheme effectively relieves the load of building indexes over large-scale data and of retrieving from them. It substantially improves distributed retrieval performance while maintaining high accuracy and recall.
Source: Journal of Computer Applications, 2013, No. 1, pp. 186-188 (3 pages). Indexed in CSCD and the Peking University Core Journals list.
Keywords: unstructured knowledge; distributed index; text clustering; full-text search; parallel retrieval
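The abstract describes the scheme only at a high level. The corpus is clustered, each cluster gets its own index partition, and a query is routed to the closest cluster(s) and searched only there. The following is a minimal Python sketch of that routing idea, not the authors' implementation. It assumes whitespace tokenization, TF-IDF term weights, and k-means with cosine similarity serving as the inverse of the query-to-cluster distance. In-memory lists of document IDs stand in for per-cluster Lucene indexes. All function names (`build_index`, `search`, and so on) and parameters such as `top_clusters` are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: whitespace tokenization; Chinese text would need a word segmenter.
    return text.lower().split()

def tfidf(tokens, idf):
    """Sparse TF-IDF vector (term -> weight), ignoring terms missing from idf."""
    tokens = [t for t in tokens if t in idf]
    counts = Counter(tokens)
    return {t: c / len(tokens) * idf[t] for t, c in counts.items()}

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroid(members):
    acc = {}
    for v in members:
        for t, w in v.items():
            acc[t] = acc.get(t, 0.0) + w
    return {t: w / len(members) for t, w in acc.items()}

def kmeans(vecs, k, iters=10):
    # Deterministic farthest-first seeding: start from document 0, then keep
    # adding the document least similar to every centroid chosen so far.
    chosen = [0]
    while len(chosen) < k:
        rest = [i for i in range(len(vecs)) if i not in chosen]
        chosen.append(min(rest, key=lambda i: max(cosine(vecs[i], vecs[j]) for j in chosen)))
    centers = [dict(vecs[i]) for i in chosen]
    assign = []
    for _ in range(iters):
        # Assignment step: each document joins its most similar centroid.
        assign = [max(range(k), key=lambda c: cosine(v, centers[c])) for v in vecs]
        # Update step: recompute centroids (an emptied cluster keeps its old one).
        for c in range(k):
            members = [v for v, a in zip(vecs, assign) if a == c]
            if members:
                centers[c] = centroid(members)
    return assign, centers

def build_index(docs, k):
    """Cluster the corpus and place each cluster's documents in its own shard."""
    tokenized = [tokenize(d) for d in docs]
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log(len(docs) / n) + 1.0 for t, n in df.items()}
    vecs = [tfidf(toks, idf) for toks in tokenized]
    assign, centers = kmeans(vecs, k)
    # Hypothetical stand-in: one list of doc ids per cluster, where a real
    # system would hold one (distributed) Lucene index per cluster.
    shards = [[d for d, a in enumerate(assign) if a == c] for c in range(k)]
    return vecs, idf, centers, shards

def search(query, index, top_clusters=1, top_n=3):
    vecs, idf, centers, shards = index
    q = tfidf(tokenize(query), idf)
    # "Query intent": rank clusters by the query's similarity to each centroid
    # and scan only the closest ones, which narrows the search range.
    ranked = sorted(range(len(centers)), key=lambda c: cosine(q, centers[c]), reverse=True)
    hits = [(cosine(q, vecs[d]), d) for c in ranked[:top_clusters] for d in shards[c]]
    hits.sort(reverse=True)
    return [d for score, d in hits[:top_n] if score > 0]

docs = [
    "lucene index shard search",
    "distributed lucene index hadoop",
    "kmeans clustering text documents",
    "fuzzy clustering text documents",
]
index = build_index(docs, k=2)
print(search("lucene index", index))     # only the Lucene-related shard is scanned
print(search("clustering text", index))  # only the clustering-related shard is scanned
```

In this sketch, raising `top_clusters` trades speed for recall. Scanning more shards makes it less likely that a relevant document assigned to a neighboring cluster is missed.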