Online New Event Detection Based on News Elements

Abstract: The main task of online new event detection (ONED) is to identify previously unknown events in news reports that arrive in chronological order. We propose an automatic ONED method based on news elements. First, the method builds a news-element-based representation model for reports and events that covers the place, people, and content of a report; using multi-dimensional elements makes it possible to distinguish similar events. Second, it provides a similarity algorithm for the feature of each element: a toponym similarity algorithm based on a geographical ontology tree computes place similarity, and a Wikipedia-based semantic similarity method computes the similarity between report contents. Third, to balance the importance of the elements, the weight of each element is learned by training an SVM model. Finally, building on the single-pass clustering algorithm, the method continually updates each event's representation vector during clustering to prevent drift of the event centre, and it uses a sliding time window to reduce the time cost of processing a large number of inactive events. Experimental results show that the method effectively reduces the system's miss probability and false-alarm probability and improves event detection performance.
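
The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy place ontology, the Wu-Palmer-style place score, Jaccard overlap for people, the fixed element weights, the threshold, and the window size are all hypothetical. Plain cosine similarity over term counts stands in for the Wikipedia-based semantic similarity, and the running-mean centroid update is only one simple choice, since the abstract does not give the exact update rule. In the described method the element weights would come from SVM training rather than being hard-coded.

```python
import math
from collections import Counter
from dataclasses import dataclass

# Hypothetical toy geographical ontology: child place -> parent place.
PARENT = {"China": "World", "USA": "World", "Shanghai": "China",
          "Beijing": "China", "Pudong": "Shanghai", "New York": "USA"}

# Hypothetical element weights; the described method learns them with an SVM.
WEIGHTS = {"place": 0.3, "person": 0.3, "content": 0.4}

def path_to_root(place):
    """Nodes from `place` up to the ontology root (an unknown place is its own root)."""
    path = [place]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def place_sim(a, b):
    """Assumed Wu-Palmer-style score: 2*depth(lca) / (depth(a) + depth(b)), root depth 1."""
    pa, pb = path_to_root(a), path_to_root(b)
    lca = next((n for n in pa if n in pb), None)  # lowest common ancestor
    if lca is None:
        return 0.0
    return 2 * (len(pa) - pa.index(lca)) / (len(pa) + len(pb))

def cosine(u, v):
    """Cosine of sparse term-weight dicts (stand-in for Wikipedia-based similarity)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

@dataclass
class Event:
    eid: int
    places: set
    persons: set
    centroid: dict  # term -> mean count over the event's stories
    size: int
    last_time: float

def combined_sim(story, ev):
    """Weighted sum of place, person (Jaccard) and content similarities."""
    p = max((place_sim(a, b) for a in story["places"] for b in ev.places), default=0.0)
    union = story["persons"] | ev.persons
    q = len(story["persons"] & ev.persons) / len(union) if union else 0.0
    c = cosine(story["terms"], ev.centroid)
    return WEIGHTS["place"] * p + WEIGHTS["person"] * q + WEIGHTS["content"] * c

def absorb(ev, story):
    """Fold a story into an event: running-mean centroid plus merged element sets."""
    n = ev.size
    for t in set(ev.centroid) | set(story["terms"]):
        ev.centroid[t] = (ev.centroid.get(t, 0.0) * n + story["terms"].get(t, 0)) / (n + 1)
    ev.places |= story["places"]
    ev.persons |= story["persons"]
    ev.size = n + 1
    ev.last_time = story["time"]

def detect(stories, threshold=0.5, window=3.0):
    """Single-pass ONED; returns (event_id, is_new) for each story in arrival order."""
    active, labels, next_id = [], [], 0
    for s in stories:
        # Sliding time window: stop comparing against events idle for longer than `window`.
        active = [e for e in active if s["time"] - e.last_time <= window]
        scored = [(combined_sim(s, e), e) for e in active]
        best_sim, best = max(scored, key=lambda x: x[0], default=(0.0, None))
        if best is not None and best_sim >= threshold:
            absorb(best, s)
            labels.append((best.eid, False))
        else:  # no sufficiently similar active event: report a new event
            active.append(Event(next_id, set(s["places"]), set(s["persons"]),
                                dict(s["terms"]), 1, s["time"]))
            labels.append((next_id, True))
            next_id += 1
    return labels

def make_story(time, places, persons, text):
    return {"time": time, "places": set(places), "persons": set(persons),
            "terms": Counter(text.split())}

# Toy chronological stream, invented for illustration.
STORIES = [
    make_story(0, ["Pudong"], ["Zhang"], "fire factory pudong fire"),
    make_story(1, ["Shanghai"], ["Zhang"], "fire factory rescue"),
    make_story(2, ["New York"], ["Smith"], "election vote"),
    make_story(10, ["Pudong"], ["Zhang"], "fire factory"),
]

if __name__ == "__main__":
    print(detect(STORIES))  # [(0, True), (0, False), (1, True), (2, True)]
```

With these toy inputs the second story joins the first event (nearby place, same person, overlapping content), while the fourth story starts a new event even though its content is similar, because the earlier event has already left the 3-unit window. The window size therefore trades processing time against the risk of splitting a long-running event.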

Authors: 李营那, 阮彤, 顾春华

Affiliation: Department of Computer Science and Engineering, East China University of Science and Technology

Source: Computer Applications and Software (《计算机应用与软件》), indexed in CSCD and the Peking University core journal list, 2013, No. 12, pp. 100-104 and 176 (6 pages)

Funding: National Key Technology R&D Program of China (2009BAH46B03)

Keywords: new event detection; single-pass; geographical ontology; semantic similarity

CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]
