
Microblog Topic Mining Based on Hashtag

Times cited: 10
Abstract: With the development of the Internet, microblogs have become a major platform through which people obtain information. To mine valuable topic information from massive volumes of microblogs, this paper exploits three features of microblog posts: conversation tags, retweet tags, and hashtags. It uses them to divide microblogs into three types: user-interest microblogs, user-interaction microblogs, and hashtag-related microblogs. It then proposes a hashtag topic model, the Hashtag Conversation Author Topic Model (HC-ATM), which builds on the Author Topic Model (ATM). Gibbs sampling is used for inference in the model and yields the topic structure of the microblogs. Experiments on a Twitter dataset show that HC-ATM outperforms both ATM and the MicroBlog Latent Dirichlet Allocation (MB-LDA) model. It achieves lower topic perplexity and larger topic divergence as measured by KL divergence. It also effectively mines the topic distributions of the different microblog types.
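The abstract divides microblogs into three types using conversation, retweet, and hashtag markers. The paper's exact assignment rules are not given here, so the Python sketch below assumes simple Twitter surface conventions:

- A `#tag` marks a hashtag-related microblog.
- A leading `@user` (conversation) or an `RT @user` (retweet) marks a user-interaction microblog.
- Anything else counts as a user-interest microblog.

The names `MicroblogType`, `classify`, and `extract_hashtags`, the regular expressions, and the priority given to hashtags are all hypothetical.

```python
import re
from enum import Enum


class MicroblogType(Enum):
    USER_INTEREST = "user interest"
    USER_INTERACTION = "user interaction"
    HASHTAG = "hashtag"


# Hypothetical Twitter-style surface patterns (assumptions, not the paper's rules).
HASHTAG_RE = re.compile(r"#\w+")
RETWEET_RE = re.compile(r"\bRT @\w+")   # retweet marker, e.g. "RT @bob: ..."
REPLY_RE = re.compile(r"^@\w+")         # conversation: post addressed to a user


def classify(text: str) -> MicroblogType:
    """Assign a post to one of three types; hashtags take priority (assumed)."""
    if HASHTAG_RE.search(text):
        return MicroblogType.HASHTAG
    if RETWEET_RE.search(text) or REPLY_RE.search(text):
        return MicroblogType.USER_INTERACTION
    return MicroblogType.USER_INTEREST


def extract_hashtags(text: str) -> list[str]:
    """Return the post's hashtags, lower-cased so '#NLP' and '#nlp' coincide."""
    return [tag.lower() for tag in HASHTAG_RE.findall(text)]
```

The priority order matters when a post carries several markers. For example, this sketch labels a retweet that contains a hashtag as a hashtag-related microblog. That is a design choice of the sketch, not something the abstract specifies.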
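HC-ATM builds on the Author Topic Model and is inferred with Gibbs sampling. The HC-ATM update equations are not reproduced in the abstract, so the sketch below implements only the standard collapsed Gibbs sampler for the base ATM of Steyvers et al. Each token w in a document with author set A_d draws a joint (author a, topic t) assignment with probability proportional to (n_wt + beta) / (n_t + V*beta) * (n_at + alpha) / (n_a + T*alpha), where the counts exclude the token being resampled. The function name `atm_gibbs`, its parameters, and the default hyperparameter values are illustrative assumptions.

```python
import random


def atm_gibbs(docs, authors, num_authors, vocab_size, num_topics,
              alpha=0.5, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampler for the standard Author-Topic Model (not HC-ATM).

    docs:    list of documents, each a list of word ids in [0, vocab_size)
    authors: one non-empty list of author ids per document
    Returns (theta, phi): author-topic and topic-word distributions.
    """
    rng = random.Random(seed)
    T, V = num_topics, vocab_size
    n_wt = [[0] * T for _ in range(V)]            # word-topic counts
    n_t = [0] * T                                 # tokens assigned to each topic
    n_at = [[0] * T for _ in range(num_authors)]  # author-topic counts
    n_a = [0] * num_authors                       # tokens assigned to each author

    def update(w, a, t, delta):
        n_wt[w][t] += delta
        n_t[t] += delta
        n_at[a][t] += delta
        n_a[a] += delta

    # Random initial (author, topic) assignment for every token.
    assign = []
    for doc, auths in zip(docs, authors):
        doc_assign = []
        for w in doc:
            a, t = rng.choice(auths), rng.randrange(T)
            update(w, a, t, +1)
            doc_assign.append((a, t))
        assign.append(doc_assign)

    for _ in range(iters):
        for d, (doc, auths) in enumerate(zip(docs, authors)):
            pairs = [(a, t) for a in auths for t in range(T)]
            for i, w in enumerate(doc):
                a, t = assign[d][i]
                update(w, a, t, -1)  # remove the current token from all counts
                # Unnormalised joint conditional over (author, topic) pairs.
                weights = [
                    (n_wt[w][t2] + beta) / (n_t[t2] + V * beta)
                    * (n_at[a2][t2] + alpha) / (n_a[a2] + T * alpha)
                    for a2, t2 in pairs
                ]
                a, t = rng.choices(pairs, weights=weights)[0]
                update(w, a, t, +1)
                assign[d][i] = (a, t)

    theta = [[(n_at[a][t] + alpha) / (n_a[a] + T * alpha) for t in range(T)]
             for a in range(num_authors)]
    phi = [[(n_wt[w][t] + beta) / (n_t[t] + V * beta) for w in range(V)]
           for t in range(T)]
    return theta, phi
```

Here is an example call: `atm_gibbs([[0, 1, 0, 1], [2, 3, 2, 3], [0, 1, 2, 3]], [[0], [1], [0, 1]], num_authors=3, vocab_size=4, num_topics=2, iters=50)`. An author who never appears in `authors` keeps the uniform prior over topics. HC-ATM would presumably change how the author set of each token is formed across the three microblog types, but those details are not stated in the abstract.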
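The abstract compares models by topic perplexity (lower is better) and by topic divergence measured with KL divergence (larger means more distinct topics). The exact evaluation protocol is not given here, and this sketch makes two assumptions. First, it uses the usual ATM likelihood, in which the probability of word w in a document with author set A_d is (1/|A_d|) * sum over a in A_d and topics t of theta[a][t] * phi[t][w]. Perplexity is then exp(-sum log p(w) / N) over N tokens. Second, it measures divergence as the mean KL divergence over all ordered pairs of distinct topics. `theta` and `phi` are author-topic and topic-word probability tables; all function names are hypothetical.

```python
import math
from itertools import permutations


def atm_perplexity(docs, authors, theta, phi):
    """Perplexity exp(-sum log p(w) / N) of docs under an author-topic model."""
    log_lik, n_tokens = 0.0, 0
    num_topics = len(phi)
    for doc, auths in zip(docs, authors):
        for w in doc:
            # Each listed author is assumed equally likely to have written w.
            p = sum(theta[a][t] * phi[t][w]
                    for a in auths for t in range(num_topics)) / len(auths)
            log_lik += math.log(p)
            n_tokens += 1
    return math.exp(-log_lik / n_tokens)


def kl(p, q):
    """KL(p || q) for discrete distributions; q must be > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mean_topic_kl(phi):
    """Average KL divergence over all ordered pairs of distinct topics."""
    pairs = list(permutations(range(len(phi)), 2))
    return sum(kl(phi[i], phi[j]) for i, j in pairs) / len(pairs)
```

A model that spreads probability uniformly over a vocabulary of V words has perplexity exactly V, which makes a convenient sanity check. Smoothed topic-word estimates (beta > 0) keep every `phi` entry positive, so `kl` never divides by zero.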
Source: Computer Engineering, 2015, Issue 4, pp. 30-35 (6 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.
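Funding: National Natural Science Foundation of China (61033010, 61272065); Natural Science Foundation of Guangdong Province (S2011020001182, S2012010009311); Science and Technology Planning Project of Guangdong Province (2011B040200007, 2012A010701013).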
Keywords: topic mining; microblog; social network; hashtag topic model; Author Topic Model (ATM)

References (18; the first 10 are listed below)

  • [1] Yan Xiaohui, Guo Jiafeng, Lan Yanyan, et al. A Biterm Topic Model for Short Texts[C]//Proceedings of the 22nd International Conference Companion on World Wide Web. Rio de Janeiro, Brazil: IW3C2 Press, 2013: 1445-1456.
  • [2] Blei D M, Ng A Y, Jordan M I. Latent Dirichlet Allocation[J]. Journal of Machine Learning Research, 2003, 3(1): 993-1022.
  • [3] Zhang Chenyi, Sun Jianling, Ding Yiqun. Topic Mining for Microblog Based on MB-LDA Model[J]. Journal of Computer Research and Development, 2011, 48(10): 1795-1802.
  • [4] Zhao Xin, Jiang Jing, He Jing, et al. Comparing Twitter and Traditional Media Using Topic Models[C]//Proceedings of the 33rd European Conference on IR Research. Berlin, Germany: Springer-Verlag, 2011: 338-349.
  • [5] Hong Liangjie, Dom B, Gurumurthy S, et al. A Time-dependent Topic Model for Multiple Text Streams[C]//Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, USA: ACM Press, 2011: 832-840.
  • [6] Deerwester S C, Dumais S T, Landauer T K, et al. Indexing by Latent Semantic Analysis[J]. Journal of the American Society for Information Science, 1990, 41(6): 391-407.
  • [7] Griffiths T L, Steyvers M. Finding Scientific Topics[J]. Proceedings of the National Academy of Sciences of the United States of America, 2004, 101(S1): 5228-5235.
  • [8] Minka T P, Lafferty J. Expectation-propagation for the Generative Aspect Model[C]//Proceedings of the 18th Conference on Uncertainty in Artificial Intelligence. Boston, USA: AUAI Press, 2002: 352-359.
  • [9] Blei D M, Lafferty J D. Correlated Topic Models[C]//Proceedings of NIPS'05. Cambridge, USA: MIT Press, 2005: 147-155.
  • [10] Steyvers M, Smyth P, Griffiths T. Probabilistic Author-Topic Models for Information Discovery[C]//Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, USA: ACM Press, 2004: 306-315.

