
Microblog Online Event Analysis Based on Incremental Topic Model

Cited by: 5
Abstract: Existing event analysis algorithms do not make full use of the structural information in the social network of microblogs. To address this, the paper proposes a microblog online event analysis algorithm based on an incremental topic model. The algorithm designs an incremental process that preserves the existing training information, and uses an adaptive asymmetric learning mechanism to integrate the content and user relationships of new microblogs. Experimental results show that the algorithm builds the model in a short time and yields a more balanced and comprehensive improvement in online event detection in near real-time scenarios.
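Authors: Ma Huifang (马慧芳), Wang Bo (王博)
Source: Computer Engineering (《计算机工程》), indexed in CAS and CSCD, 2013, No. 3, pp. 191-196 (6 pages)
Funding: National Natural Science Foundation of China (61163039); Northwest Normal University Young Teachers' Research Capability Enhancement Program, Backbone Fund (NWNU-LKQN-10-1)
Keywords: user relationship; Topic Detection and Tracking (TDT); topic model; adaptive; incremental probability; incremental algorithm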

