Microblog Topic Mining Model Based on Latent Semantic Analysis (基于潜在语义分析的微博主题挖掘模型研究)

Cited by: 31
Abstract: To address the shortcomings of current topic mining methods for microblog platforms, and to account for the sparse, multidimensional, and massive nature of microblog data, this paper proposes preprocessing the data according to these characteristics and then applying LDA (Latent Dirichlet Allocation), a latent semantic analysis model based on prior probabilities, to mine microblog topics. Building on the LDA model, it designs an incremental text clustering algorithm to identify topic structure, so that users can better understand the topics and how they are organized. Experiments on a real microblog dataset show that the model can effectively mine topics and identify topic structure.
Source: Library and Information Service (《图书情报工作》), CSSCI, PKU Core (北大核心), 2012, No. 24, pp. 114-119 (6 pages)
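Funding: One of the research outputs of the National Natural Science Foundation of China project "Research on Integrated Retrieval and Semantic Analysis Methods for Social Media" (No. 71273194) and the Major Project of the Key Research Institutes of Humanities and Social Sciences, Ministry of Education, "Decision-Oriented Integration of Enterprise Information Resources" (No. 2009JJD870002).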
Keywords: microblog; short text; topic mining; LDA model; incremental clustering
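
The abstract describes clustering texts incrementally on top of LDA output but does not give the algorithm itself. The following is only a minimal sketch of one common single-pass scheme, assuming each microblog post is already represented by its LDA document-topic vector. The function names `cosine` and `incremental_cluster`, the cosine similarity measure, the running-mean centroid update, and the threshold value are all illustrative assumptions, not the paper's method.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def incremental_cluster(doc_topic_vectors, threshold=0.8):
    """Single-pass clustering (hypothetical helper, not from the paper).

    Each vector is compared with the existing cluster centroids. It joins
    the most similar cluster if the similarity reaches `threshold`;
    otherwise it starts a new cluster.
    """
    centroids = []  # one centroid (mean vector) per cluster
    members = []    # document indices per cluster
    for idx, vec in enumerate(doc_topic_vectors):
        best, best_sim = -1, -1.0
        for c, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = c, sim
        if best >= 0 and best_sim >= threshold:
            members[best].append(idx)
            n = len(members[best])
            # Update the centroid as a running mean of its members.
            centroids[best] = [
                (m * (n - 1) + x) / n for m, x in zip(centroids[best], vec)
            ]
        else:
            centroids.append(list(vec))
            members.append([idx])
    return members, centroids


# Toy document-topic distributions (made-up values, three topics).
theta = [
    [0.90, 0.05, 0.05],
    [0.85, 0.10, 0.05],
    [0.05, 0.90, 0.05],
    [0.10, 0.80, 0.10],
    [0.05, 0.05, 0.90],
]
clusters, _ = incremental_cluster(theta, threshold=0.8)
print(clusters)
```

Because each post is processed once and only centroids are kept, new posts can be assigned as they arrive without re-clustering earlier ones, which is the usual motivation for an incremental scheme on large microblog streams.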