Abstract
With the rapid development of microblogging, microblog retrieval has become one of the active research areas of recent years. Based on the TREC Microblog dataset, this paper first analyzes microblog documents and microblog queries, and finds that microblog retrieval differs from traditional text retrieval in two ways. First, microblog documents have many characteristics of their own that web pages lack. Second, microblog queries are time-sensitive: ranking must take time into account in addition to the semantic similarity of the text. Methods of this kind are collectively referred to as time-aware retrieval techniques. Because of these two differences, existing information retrieval techniques cannot be applied directly to microblog search and do not meet its needs. The paper then surveys recent work on both aspects. It first describes the various features of microblogs and the retrieval methods built on them. Then, following the traditional information retrieval process, it introduces ranking models that use temporal information for text representation, as a document prior, and for query expansion. Finally, it summarizes existing work and discusses directions for future research.
Source
《中文信息学报》
CSCD
PKU Core Journal (北大核心)
2015, No. 2, pp. 10-23 (14 pages)
Journal of Chinese Information Processing
Funding
Science and Technology Support Program (2012BAH46B02)
Keywords
microblog search
temporal information
microblog feature
text representation
document prior
query expansion