Abstract
[Objective] To classify search queries by topic through query expansion. [Methods] Expansion text for each query is obtained through pseudo-relevance feedback, and text features are extracted from it. The features are combined using the proposed partial matching rules and a vector space compression algorithm, and queries are then classified with both cosine similarity and an SVM model. [Results] In the experiments, precision, recall, F-measure, and overall accuracy reached 90.34%, 89.34%, 89.67%, and 89.24%, respectively. [Limitations] Because query expansion relies on the results returned by a search engine, online processing efficiency is low. [Conclusions] The proposed method is effective for query topic classification. The machine learning approach outperforms cosine similarity, and the method is significant for improving search engine quality.
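The cosine-similarity step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the partial matching rules or the vector space compression algorithm, so both are omitted here. The function names (`term_vector`, `cosine`, `classify`), the toy snippets, and the topic profiles are all hypothetical, and tokens are assumed to be whitespace-separated (Chinese text would first need a word segmenter).

```python
import math
from collections import Counter


def term_vector(text):
    """Build a raw term-frequency vector from whitespace-separated tokens."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def classify(expansion_snippets, topic_profiles):
    """Assign the topic whose profile is most similar to the expansion text.

    expansion_snippets: texts standing in for top-ranked search results
                        (the pseudo-relevance feedback for one query).
    topic_profiles:     hypothetical mapping of topic name -> representative text.
    """
    query_vec = term_vector(" ".join(expansion_snippets))
    scores = {
        topic: cosine(query_vec, term_vector(profile))
        for topic, profile in topic_profiles.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical toy data, for illustration only.
snippets = ["apple releases new phone", "phone camera review and price"]
profiles = {
    "technology": "phone computer software camera chip",
    "sports": "football match team score league",
}
label, scores = classify(snippets, profiles)
print(label, scores)
```

In the SVM variant described in the abstract, the same kind of feature vectors would be passed to a trained classifier instead of being compared against fixed topic profiles.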
Source
New Technology of Library and Information Service (《现代图书情报技术》)
CSSCI
2015, No. 4, pp. 10-17 (8 pages)
Funding
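This work is one of the research outcomes of the following projects:
National Natural Science Foundation of China project "Research on Ontology-Based Automatic Patent Indexing" (No. 61271304)
National Key Technology R&D Program project "Development of an Intelligent Video Surveillance System Based on Automatic Tracking and Acquisition of Key Targets" (No. 2013BAK02B02)
Key Project of the Beijing Municipal Education Commission Science and Technology Development Plan and Class B Key Project of the Beijing Natural Science Foundation "Research on Domain-Oriented Precise Search Methods for Multimodal Internet Information" (No. KZ201311232037)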
Keywords
Query topic classification
Pseudo-relevance feedback
Query expansion
Vector space compression algorithm