Abstract
This paper proposes a method for automatically inferring LDA topic evolution based on seminal (seed) documents. Seed documents are first selected and used to guide the modeling of the documents in the following time slice. Topics in consecutive time slices are then linked according to the semantic (topic) distributions of the seed documents, which ensures that linked topics are in fact the same topic. Experiments on the NIPS paper corpus and on Chinese news reports of the National People's Congress (NPC) and the Chinese People's Political Consultative Conference (CPPCC) show that the method correctly infers the evolution of specific topics in various forms, and avoids linking related topics that have no actual evolution relationship.
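The abstract does not say how seed documents are chosen or how their semantic distributions are compared across time slices. The following is a minimal Python sketch of the linking step only, under stated assumptions: the seeds of a topic are taken to be the documents with the highest proportion of that topic in slice t, their topic distributions under the slice t+1 model are assumed to be available (because the seeds are used to guide that model), and a topic is linked to every successor topic whose average seed weight reaches a threshold. The names `select_seeds`, `link_topics`, `n_seeds` and `threshold` are hypothetical and do not come from the paper.

```python
from typing import Dict, List, Sequence

Matrix = Sequence[Sequence[float]]


def select_seeds(theta: Matrix, topic: int, n_seeds: int) -> List[int]:
    # Hypothetical rule: the seed documents of a topic are the n_seeds
    # documents whose proportion of that topic is highest.
    ranked = sorted(range(len(theta)), key=lambda d: theta[d][topic], reverse=True)
    return ranked[:n_seeds]


def link_topics(
    theta_prev: Matrix,
    theta_seeds_next: Dict[int, Sequence[float]],
    n_seeds: int = 2,
    threshold: float = 0.3,
) -> Dict[int, List[int]]:
    """Link topics of slice t to topics of slice t+1 through seed documents.

    theta_prev: document-topic distributions in slice t.
    theta_seeds_next: topic distribution of each slice-t seed document under
        the slice t+1 model (assumed to be available).
    Returns topic k in slice t -> successor topics in slice t+1
    (an empty list means the topic has no successor).
    """
    links: Dict[int, List[int]] = {}
    for k in range(len(theta_prev[0])):
        seeds = select_seeds(theta_prev, k, n_seeds)
        n_next = len(theta_seeds_next[seeds[0]])
        # Average the seeds' topic distributions under the slice t+1 model.
        mean = [
            sum(theta_seeds_next[d][j] for d in seeds) / len(seeds)
            for j in range(n_next)
        ]
        # Keep every successor topic that carries enough seed weight.
        links[k] = [j for j, w in enumerate(mean) if w >= threshold]
    return links


# Toy data (invented for illustration): 4 documents, 2 topics in slice t.
theta_t = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
# The same documents re-inferred under a 3-topic model for slice t+1.
theta_next = {
    0: [0.7, 0.2, 0.1],
    1: [0.6, 0.3, 0.1],
    2: [0.05, 0.45, 0.5],
    3: [0.05, 0.5, 0.45],
}

print(link_topics(theta_t, theta_next))  # {0: [0], 1: [1, 2]}
```

In this toy run, topic 0 continues as topic 0, while topic 1 splits into topics 1 and 2. Under the same rule, two earlier topics mapped to one successor would represent a merge, and an empty list would represent a topic that disappears. This is one way the "various forms" of evolution could surface; the paper's own criteria may differ.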
Source
New Technology of Library and Information Service (《现代图书情报技术》)
CSSCI
Peking University Core Journal (北大核心)
2011, No. 7, pp. 104-109 (6 pages)
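Funding
This paper is one of the research outcomes of the National Natural Science Foundation of China project "Detection of News Topic Threads and Themes" (Project No. 60873134).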