
Online New Event Detection Based on News Elements (cited by: 2)
Abstract: The main task of online new event detection (ONED) is to identify previously unknown events in a stream of news reports arriving in chronological order. We propose an automatic ONED method based on news elements. First, we build a news-element-based representation model for reports and events that covers the place, people and content of a report; using multiple elements makes it possible to tell similar events apart. Second, we provide a similarity algorithm for the features of each element: a toponym similarity algorithm based on a geographical ontology tree computes place similarity, and a Wikipedia-based semantic similarity method computes the similarity between report contents. Third, to measure the importance of each element, the element weights are obtained by training an SVM model. Finally, building on the single-pass clustering algorithm, the event representation vector is updated continuously during clustering to prevent drift of the event centre, and a sliding time window is used to reduce the time spent processing large numbers of inactive events. Experimental results show that the method effectively reduces the system's miss rate and false-alarm rate and improves event detection performance.
Source: Computer Applications and Software (《计算机应用与软件》), CSCD, Peking University Core Journal, 2013, No. 12, pp. 100-104, 176 (6 pages in total)
Funding: National Science and Technology Support Program of China (2009BAH46B03)
Keywords: New event detection; Single-pass; Geographical ontology; Semantic similarity
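
The single-pass procedure summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Report`, `Event`, `WEIGHTS` and `detect` are hypothetical, Jaccard overlap stands in for the geographical-ontology toponym similarity, cosine similarity over term weights stands in for the Wikipedia-based semantic similarity, and the fixed weights stand in for the SVM-trained ones. The threshold and window values are arbitrary.

```python
import math
from dataclasses import dataclass


@dataclass
class Report:
    time: int       # arrival time (hypothetical unit, e.g. days)
    places: set     # place names mentioned in the report
    people: set     # person names mentioned in the report
    content: dict   # term -> weight


@dataclass
class Event:
    places: set
    people: set
    content: dict
    last_update: int
    size: int = 1


# Placeholder weights; the paper derives element weights by training an SVM.
WEIGHTS = {"place": 0.3, "people": 0.3, "content": 0.4}


def jaccard(a, b):
    """Set overlap, a stand-in for the ontology-tree toponym similarity."""
    return len(a & b) / len(a | b) if (a or b) else 0.0


def cosine(u, v):
    """Cosine of two sparse vectors, a stand-in for the Wikipedia-based measure."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def similarity(report, event):
    """Weighted sum of the per-element similarities."""
    return (WEIGHTS["place"] * jaccard(report.places, event.places)
            + WEIGHTS["people"] * jaccard(report.people, event.people)
            + WEIGHTS["content"] * cosine(report.content, event.content))


def detect(reports, threshold=0.5, window=3):
    """Return one flag per report (in time order): True if it starts a new event."""
    events, flags = [], []
    for r in sorted(reports, key=lambda x: x.time):
        # Sliding time window: compare only against recently updated events.
        active = [e for e in events if r.time - e.last_update <= window]
        best = max(active, key=lambda e: similarity(r, e), default=None)
        if best is None or similarity(r, best) < threshold:
            events.append(Event(set(r.places), set(r.people),
                                dict(r.content), r.time))
            flags.append(True)
        else:
            # Update the event representation with a running mean of the
            # content vector so the event centre follows its reports.
            n = best.size
            terms = set(best.content) | set(r.content)
            best.content = {t: (best.content.get(t, 0.0) * n
                                + r.content.get(t, 0.0)) / (n + 1)
                            for t in terms}
            best.places |= r.places
            best.people |= r.people
            best.size = n + 1
            best.last_update = r.time
            flags.append(False)
    return flags
```

Calling `detect` on a list of `Report` objects returns one Boolean per report in time order. Events that received no report within `window` time units are skipped during comparison, so a later report on an old topic is flagged as new, which is the trade-off the sliding window accepts in exchange for lower processing time.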

