Abstract
The rapid development of the Internet has substantially changed the environment in which information retrieval takes place, and the ranking algorithm of an Internet search engine directly determines the user experience of retrieval in this new environment. This paper proposes an improved ranking algorithm that combines the PageRank algorithm, classification techniques, and document TF-IDF (Term Frequency-Inverse Document Frequency) values. The keywords of a user's query are pre-classified to predict the text category they most likely belong to. Based on this prediction, data from similar categories are retrieved from the Solr index first, so that texts related to the query's topic appear near the top of the results. Experimental results show that the algorithm achieves relatively fast query response time and relatively high precision.
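The abstract names the ingredients (query pre-classification, PageRank, TF-IDF, category-first retrieval) but not the exact way they are combined. The following is a minimal sketch under stated assumptions, not the authors' method: a multinomial naive Bayes classifier stands in for the unspecified classification technique, an in-memory dict stands in for the Solr index, documents in the predicted category are placed first, and within each group documents are ordered by a linear mix of TF-IDF relevance and PageRank with a hypothetical weight `alpha`. All function names are illustrative.

```python
import math
from collections import Counter


def pagerank(links, damping=0.85, iters=50):
    """Power-iteration PageRank; links maps page -> list of pages it links to."""
    pages = set(links) | {q for outs in links.values() for q in outs}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            outs = links.get(p, [])
            targets = outs if outs else list(pages)  # dangling page: spread evenly
            share = damping * rank[p] / len(targets)
            for q in targets:
                new[q] += share
        rank = new
    return rank


def tf_idf(query_terms, doc_terms, all_docs):
    """Sum of TF-IDF weights of the query terms in one document (smoothed IDF)."""
    n_docs = len(all_docs)
    counts = Counter(doc_terms)
    score = 0.0
    for t in query_terms:
        tf = counts[t] / len(doc_terms) if doc_terms else 0.0
        df = sum(1 for d in all_docs.values() if t in d)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        score += tf * idf
    return score


def classify_query(query_terms, labeled_docs):
    """Assumed stand-in classifier: multinomial naive Bayes with Laplace smoothing."""
    class_terms, class_docs = {}, Counter()
    for terms, label in labeled_docs:
        class_terms.setdefault(label, Counter()).update(terms)
        class_docs[label] += 1
    vocab = {t for c in class_terms.values() for t in c}
    total = sum(class_docs.values())
    best, best_lp = None, -math.inf
    for label, counts in class_terms.items():
        lp = math.log(class_docs[label] / total)  # log prior
        denom = sum(counts.values()) + len(vocab)
        for t in query_terms:
            lp += math.log((counts[t] + 1) / denom)  # smoothed log likelihood
        if lp > best_lp:
            best, best_lp = label, lp
    return best


def rank_documents(query_terms, docs, doc_class, links, labeled, alpha=0.7):
    """Predicted-class documents first; within each group, sort by a weighted
    mix of TF-IDF and PageRank (alpha is a hypothetical weight)."""
    predicted = classify_query(query_terms, labeled)
    pr = pagerank(links)

    def key(doc_id):
        relevance = tf_idf(query_terms, docs[doc_id], docs)
        combined = alpha * relevance + (1 - alpha) * pr.get(doc_id, 0.0)
        return (doc_class[doc_id] != predicted, -combined)

    return predicted, sorted(docs, key=key)


# Toy data (hypothetical): tokenized documents, their categories, and links.
docs = {
    "d1": ["solr", "search", "index"],
    "d2": ["football", "match", "score"],
    "d3": ["search", "engine", "ranking", "search"],
}
doc_class = {"d1": "tech", "d2": "sports", "d3": "tech"}
links = {"d1": ["d3"], "d2": ["d3"], "d3": ["d1"]}
labeled = [(terms, doc_class[d]) for d, terms in docs.items()]

predicted, order = rank_documents(["search", "ranking"], docs, doc_class, links, labeled)
print(predicted, order)  # prints: tech ['d3', 'd1', 'd2']
```

Sorting on the tuple `(not in predicted class, -combined score)` is one simple way to express "category-matched documents first"; a real deployment would more likely express the category preference as a boost inside the Solr query rather than re-sorting in application code.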
Source
Computer Technology and Development (计算机技术与发展)
2015, No. 7, pp. 49-53 (5 pages)
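Funding
National Natural Science Foundation of China (61373139)
Natural Science Foundation of Jiangsu Province (BK2012833)
Natural Science Foundation of the Higher Education Institutions of Jiangsu Province (12KJB520011)
Nanjing University of Posts and Telecommunications Scientific Research Foundation (NY213160)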