
Joint training hot topic recognition method based on LDA2Vec (cited by: 3)
Abstract: Traditional topic models do not make full use of the semantic relationships between words or of their context, which leads to poor topic coherence and interpretability. To address this, a hot topic recognition method based on joint training of the LDA2Vec topic model, NS-LDA2Vec, is proposed. The method extends the Skip-gram model: the initialized document vector and pivot word vector are jointly trained to obtain a context vector, which is then used to predict the context words of the pivot word. Topic information is thereby embedded into both the word and document representations, and training minimizes the sum of the negative sampling loss and the Dirichlet likelihood term, yielding a more interpretable text representation. The results show that the proposed method achieves an F1 value of up to 0.898 and that, on the hot topic classification task, topic relevance improves by about 9% over the traditional LDA topic model, effectively improving topic recognition performance.
Authors: XUE Tao; GUO Ying; HU Weihua (School of Computer Science, Xi'an Polytechnic University, Xi'an 710048, China; School of Humanities and Social Science, Xi'an Polytechnic University, Xi'an 710048, China)
Source: Journal of Xi'an Polytechnic University (CAS), 2021, No. 4, pp. 95-101 (7 pages)
Funding: National Social Science Fund of China (18XYY010).
Keywords: LDA2Vec; document vector; word vector; topic model; hot topic recognition
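
The objective summarized in the abstract (a context vector formed from a pivot word vector and a document vector, trained by minimizing a negative sampling loss plus a Dirichlet term) can be illustrated with a minimal NumPy sketch. This is not the authors' NS-LDA2Vec implementation. It assumes the generic lda2vec formulation, in which the document vector is a softmax-weighted mixture of topic vectors and the Dirichlet term is the negative log-likelihood of the topic proportions up to a constant. All function names, parameter names, and default values (`alpha`, `strength`, the dimensions) are hypothetical.

```python
import numpy as np

def softmax(x):
    """Map unnormalized document weights to topic proportions."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    return -np.logaddexp(0.0, -x)

def context_vector(pivot_vec, doc_weights, topic_matrix):
    """Context vector = pivot word vector + document vector.

    Assumption: the document vector is a mixture of topic vectors
    weighted by softmax(doc_weights), as in generic lda2vec.
    """
    proportions = softmax(doc_weights)        # shape (n_topics,)
    doc_vec = proportions @ topic_matrix      # shape (dim,)
    return pivot_vec + doc_vec, proportions

def negative_sampling_loss(context, target_vec, negative_vecs):
    """Skip-gram negative sampling loss for one (pivot, target) pair."""
    pos = log_sigmoid(context @ target_vec)
    neg = log_sigmoid(-(negative_vecs @ context)).sum()
    return -(pos + neg)

def dirichlet_term(proportions, alpha=0.5, strength=1.0):
    """Negative Dirichlet log-likelihood of the proportions (up to a constant).

    Assumption: alpha < 1, so minimizing this term favors sparse mixtures.
    """
    return -strength * np.sum((alpha - 1.0) * np.log(proportions))

def joint_loss(pivot_vec, doc_weights, topic_matrix, target_vec,
               negative_vecs, alpha=0.5, strength=1.0):
    """Sum of the negative sampling loss and the Dirichlet term."""
    context, proportions = context_vector(pivot_vec, doc_weights, topic_matrix)
    return (negative_sampling_loss(context, target_vec, negative_vecs)
            + dirichlet_term(proportions, alpha, strength))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim, n_topics, n_neg = 8, 4, 5            # illustrative sizes only
    loss = joint_loss(
        pivot_vec=rng.normal(size=dim),
        doc_weights=rng.normal(size=n_topics),
        topic_matrix=rng.normal(size=(n_topics, dim)),
        target_vec=rng.normal(size=dim),
        negative_vecs=rng.normal(size=(n_neg, dim)),
    )
    print(f"joint loss for one training pair: {loss:.4f}")
```

In an actual training loop, the word vectors, topic vectors, and document weights would all be updated by gradient descent on this sum. The sketch only evaluates the loss for a single pair so that the two terms are easy to inspect.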