
Application and Comparison of Topic Models in Social Media-Based Disaster Classification (times cited: 14)

Application and Comparison of Topic Model in Identifying Latent Topics from Disaster-Related Tweets
Abstract: The regions along the "Belt and Road" have a high incidence of natural disasters, and most of the countries there are developing countries with underdeveloped economies and weak disaster resilience. When a disaster occurs, mining and analyzing the related Twitter data supports emergency rescue, disaster assessment, and disaster reduction and prevention, and provides important support for China's international rescue and relief work. Without an empirical corpus, topic models can rapidly aggregate information that is valuable for disaster relief and assessment from massive volumes of disaster-related tweets. In this paper, the Biterm Topic Model (BTM) and Latent Dirichlet Allocation (LDA) were used to perform fine-grained topic clustering of tweets related to Typhoon Haiyan (2013). We analyzed the accuracy of the two models and tested their ability to distinguish similar disaster topics. Based on the tweets in the "demand-related" topic class, we then used place-name matching to analyze the spatial distribution of the degree of demand for supplies, medical care, and similar needs in the Philippines during Typhoon Haiyan. The results show that: (1) in distinguishing short texts with similar topics, the overall accuracy of BTM was 0.598, while that of LDA was only 0.321, indicating that BTM is more accurate than LDA in identifying topics in Typhoon Haiyan disaster tweets; (2) BTM identifies relatively fine-grained disaster topics such as "disaster location-related" and "blessing-related" well; (3) preliminary verification shows that the spatial distribution of demand for supplies, medical care, and similar needs derived from "demand-related" topic texts is basically consistent with actual needs.

From 1990 to 2010, the occurrence of natural disasters increased in countries along the "One Belt and One Road", most of which are developing countries with underdeveloped economies and weak disaster resistance. When disasters happen, people in those countries tweet about them in real time, and these tweets contain important information for emergency rescue, disaster assessment, and disaster reduction and prevention. Therefore, mining and analyzing relevant tweets can provide powerful support for China's international rescue and relief work. However, Twitter data are fragmented and unstructured, and the topics that tweets cover are numerous and miscellaneous, so rapidly extracting relevant information from tweets is a research challenge. Without an empirical corpus, topic models can rapidly aggregate information that is valuable for disaster relief and assessment from a large number of disaster-related tweets. In this paper, the BTM and LDA models, both widely used in natural language processing, were adopted to cluster tweets related to Typhoon Haiyan into fine-grained topics. We then verified and compared the accuracy of the two models and tested their ability to distinguish similar disaster topics. In addition, based on the "demand-related" tweets obtained from topic categorization, we used place-name matching to analyze the spatial distribution of the degree of demand for materials and medical care in the Philippines during Typhoon Haiyan. The results show that: (1) in classifying Typhoon Haiyan-related tweets into fine-grained topics, the overall accuracy of BTM was 0.598, while that of LDA was only 0.321, indicating that BTM outperforms LDA; (2) the F1-measure values of BTM for "disaster location-related" and "blessing-related" tweets were 0.8 and 0.78, respectively, indicating that BTM identifies tweets on these two topics well; (3) after preliminary verification, the spatial distribution of material and medical needs generated from "demand-related" tweets was basically consistent with actual demand. Our findings can help quickly obtain first-hand disaster information from Twitter when China lacks data on disasters occurring in the "One Belt and One Road" region, and thus provide data support for China's international rescue work. Moreover, our methodology can also be applied to studying domestic microblogs during disasters.
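The abstract compares the two models by overall accuracy and per-topic F1-measure. As a minimal sketch (not the authors' code), the Python below computes both metrics. It assumes that each tweet already carries a manually labelled topic and that every cluster produced by BTM or LDA has been mapped to one of those labelled topics. The function names and the toy labels are hypothetical and are not data from the study.

```python
from collections import Counter


def overall_accuracy(true_labels, pred_labels):
    """Share of tweets whose predicted topic matches the manual label."""
    if len(true_labels) != len(pred_labels) or not true_labels:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(true_labels, pred_labels))
    return correct / len(true_labels)


def f1_per_topic(true_labels, pred_labels):
    """One-vs-rest F1 = 2PR / (P + R) for every topic that appears."""
    if len(true_labels) != len(pred_labels):
        raise ValueError("label lists must be of equal length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(true_labels, pred_labels):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # tweet assigned to topic p but belongs elsewhere
            fn[t] += 1  # tweet of true topic t was missed
    scores = {}
    for k in set(true_labels) | set(pred_labels):
        precision = tp[k] / (tp[k] + fp[k]) if tp[k] + fp[k] else 0.0
        recall = tp[k] / (tp[k] + fn[k]) if tp[k] + fn[k] else 0.0
        denom = precision + recall
        scores[k] = 2 * precision * recall / denom if denom else 0.0
    return scores


if __name__ == "__main__":
    # Hypothetical toy labels, NOT data from the study. Assumes model
    # clusters were already mapped to these manually labelled topics.
    truth = ["location", "location", "blessing", "demand", "demand"]
    pred = ["location", "blessing", "blessing", "demand", "location"]
    print(overall_accuracy(truth, pred))  # 0.6
    f1 = f1_per_topic(truth, pred)
    for topic in sorted(f1):
        print(topic, round(f1[topic], 3))
    # blessing 0.667, demand 0.667, location 0.5
```

Per-topic F1 is the harmonic mean of precision and recall, so a topic scores highly only when the model both captures most of that topic's tweets and rarely assigns tweets from other topics to it. This is why F1 can single out well-separated topics such as "disaster location-related" even when overall accuracy is moderate.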
Authors: SU Kai (苏凯), CHENG Changxiu (程昌秀), Nikita Murzintcev, ZHANG Ting (张婷) (Center for Geodata and Analysis, Faculty of Geographical Science, Beijing Normal University, Beijing 100875, China; Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China)
Source: Journal of Geo-information Science (《地球信息科学学报》; indexed in CSCD and the Peking University Core Journals list), 2019, No. 8, pp. 1152-1160 (9 pages)
Funding: National Key Research and Development Program of China (2017YFB0504102); Fundamental Research Funds for the Central Universities
Keywords: topic model; BTM; LDA; tweet; topic categorization; natural hazard; emergency management

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部