Automatic Identification of News Intent Based on Analyzing Query Features
Original title: 基于查询特征分析的新闻意图自动识别
Cited by: 2
Abstract: Sample queries were selected from the Sogou query log and manually labeled. Analysis of the labeled news queries yields three new features for identifying news intent: the query expression, the distribution of the query over time, and the clicked results. A decision tree classifier built on these three features then identifies news intent in queries automatically. The results show that: (1) news queries mainly aim to obtain information on a specific topic or entertainment information, and their topics are mostly entertainment, politics, sports, and the economy; (2) compared with non-news queries, news queries are more likely to contain named entities, fluctuate more in their distribution over time, and show higher similarity among clicked results; (3) the method identifies news intent reasonably well, with macro-averaged precision, recall, and F-score of 0.76, 0.73, and 0.74, respectively.
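Source: Library and Information Service (《图书情报工作》), indexed in CSSCI and the Peking University Core Journals list, 2014, No. 20, pp. 82-90 (9 pages)
Funding: One of the research outputs of the National Natural Science Foundation of China General Program project "Research on Modeling and Framework Implementation of General Entity Retrieval Based on Language Models" (Grant No. 71173164) and the National Social Science Fund of China Youth Project "Research on a Dynamic Regulation Mechanism for Emergency Management of Online Public Opinion Events Based on Scenario Analysis" (Grant No. 13CGL132)
Keywords: query intent; news queries; news intent; query classification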
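
The macro-averaged precision, recall, and F-score reported in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function name `macro_scores`, the class labels, and the toy predictions are hypothetical, and the F-score is assumed here to be the mean of per-class F1 values, a convention the abstract does not specify.

```python
from collections import Counter


def macro_scores(y_true, y_pred):
    """Return macro-averaged (precision, recall, F1) over all observed classes.

    Assumption: macro F1 is the unweighted mean of per-class F1 scores.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for c in labels:
        p = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        r = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical toy data: 3 news queries and 5 non-news queries.
y_true = ["news", "news", "news", "other", "other", "other", "other", "other"]
y_pred = ["news", "news", "other", "other", "other", "other", "other", "news"]

precision, recall, f1 = macro_scores(y_true, y_pred)
print(f"macro P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

Macro-averaging weights the news and non-news classes equally, so a classifier cannot score well simply by favoring the larger class.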