Abstract
The main task of online new event detection (ONED) is to identify previously unknown events in news reports that arrive in chronological order. We propose an automatic ONED method based on news elements. First, we build a news-element-based representation model for reports and events that covers the place, people and content of a news report; using multi-dimensional elements has the advantage of being able to distinguish similar events. Second, we provide a similarity algorithm for the feature of each element: a toponym similarity algorithm based on a geographical ontology tree computes place similarity, and a Wikipedia-based semantic similarity method computes the similarity between report contents. Third, to measure the importance of each element, the element weights are learned by training an SVM model. Finally, building on the single-pass clustering algorithm, the event representation vector is updated continuously during clustering to prevent the event centre from drifting, and a sliding time window is used to reduce the time cost of processing large numbers of inactive events. Experimental results show that the method effectively reduces the miss rate and the false-alarm rate of the system and improves event detection performance.
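The detection loop summarised above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: Jaccard overlap stands in for both the geographical-ontology toponym similarity and person matching, bag-of-words cosine stands in for the Wikipedia-based semantic similarity, the element weights in `WEIGHTS` are fixed hypothetical values rather than SVM-trained ones, and the event update is a simple running mean of content vectors. The names `Report`, `Event`, `absorb`, `single_pass`, `threshold` and `window` are illustrative only.

```python
from dataclasses import dataclass, field
from math import sqrt

# Minimal sketch (assumptions, not the paper's code): Jaccard overlap replaces the
# ontology-tree toponym similarity, bag-of-words cosine replaces the Wikipedia-based
# semantic similarity, and the element weights are hypothetical, not SVM-trained.

@dataclass
class Report:
    time: float                # arrival time in days
    places: set[str]
    people: set[str]
    content: dict[str, float]  # term -> weight

@dataclass
class Event:
    last_time: float           # time of the most recent report assigned to the event
    places: set[str]
    people: set[str]
    content: dict[str, float]  # centroid of member content vectors
    size: int = 1
    members: list[int] = field(default_factory=list)

WEIGHTS = {"place": 0.3, "people": 0.3, "content": 0.4}  # hypothetical values

def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = sqrt(sum(w * w for w in u.values()))
    nv = sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def similarity(r: Report, e: Event) -> float:
    # Weighted sum of the per-element similarities.
    return (WEIGHTS["place"] * jaccard(r.places, e.places)
            + WEIGHTS["people"] * jaccard(r.people, e.people)
            + WEIGHTS["content"] * cosine(r.content, e.content))

def absorb(e: Event, r: Report) -> None:
    # Update the event representation: running mean of content vectors plus
    # union of places and people (one simple update rule, assumed here).
    n = e.size
    terms = set(e.content) | set(r.content)
    e.content = {t: (e.content.get(t, 0.0) * n + r.content.get(t, 0.0)) / (n + 1)
                 for t in terms}
    e.places |= r.places
    e.people |= r.people
    e.size = n + 1
    e.last_time = r.time

def single_pass(reports, threshold=0.5, window=7.0):
    """Return the events and, per report, (event index, is_new_event)."""
    events: list[Event] = []
    labels: list[tuple[int, bool]] = []
    for i, r in enumerate(reports):
        # Sliding window: only events active within `window` days are compared.
        active = [k for k, e in enumerate(events) if r.time - e.last_time <= window]
        best = max(active, key=lambda k: similarity(r, events[k]), default=None)
        if best is not None and similarity(r, events[best]) >= threshold:
            absorb(events[best], r)
            events[best].members.append(i)
            labels.append((best, False))
        else:
            events.append(Event(r.time, set(r.places), set(r.people),
                                dict(r.content), 1, [i]))
            labels.append((len(events) - 1, True))
    return events, labels

reports = [
    Report(0.0, {"Beijing"}, {"Li"}, {"earthquake": 1.0, "rescue": 0.5}),
    Report(1.0, {"Beijing"}, {"Li"}, {"earthquake": 0.8, "aftershock": 0.6}),
    Report(2.0, {"Paris"}, {"Dupont"}, {"election": 1.0}),
    Report(20.0, {"Beijing"}, {"Li"}, {"earthquake": 1.0, "rescue": 0.5}),
]
events, labels = single_pass(reports)
print([is_new for _, is_new in labels])  # [True, False, True, True]
```

In this toy run the second report joins the first event (combined similarity about 0.89), the Paris report opens a new event, and the last report is flagged as new because the only matching event fell outside the 7-day window. That last case shows the trade-off of a sliding window: ignoring inactive events saves comparisons, at the risk of re-reporting an event that resurfaces after a long gap.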
Source
Computer Applications and Software, 2013, No. 12, pp. 100-104, 176 (6 pages)
Indexed in: CSCD; Peking University Chinese Core Journals
Funding
National Key Technology R&D Program of China (2009BAH46B03)