Abstract
Latent Dirichlet Allocation (LDA) can identify latent topic information in large-scale document collections, but it performs poorly on short texts such as microblog posts. To address this, this paper proposes an LDA-based microblog user model. Microblog posts are partitioned by user, and all posts published by a user are merged into a single document that represents that user. The standard three-layer document-topic-word structure of LDA thus becomes a user-topic-word user model, which is then applied to user recommendation. Experimental results on a real microblog dataset show that the method outperforms the traditional vector space model (VSM) approach for user recommendation. With an appropriate number of topics, its accuracy improves by nearly 10%.
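The user-topic-word idea can be illustrated with a minimal sketch. It assumes several things the abstract does not specify: posts are already word-segmented token lists, LDA is fitted with collapsed Gibbs sampling, the hyperparameters (`alpha`, `beta`, `iterations`) are placeholders, and candidate users are ranked by cosine similarity of their topic distributions. The function names and the toy data are hypothetical and are not the authors' implementation.

```python
import math
import random
from collections import defaultdict

def build_user_documents(posts):
    """Merge every post of a user into one token list: the 'user document'."""
    docs = defaultdict(list)
    for user, tokens in posts:
        docs[user].extend(tokens)
    return dict(docs)

def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA (assumed inference method).

    Returns one topic distribution (theta) per document, i.e. per user.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    w_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * num_topics for _ in docs]          # user-topic counts
    nkw = [[0] * V for _ in range(num_topics)]      # topic-word counts
    nk = [0] * num_topics                           # tokens per topic
    z = []                                          # topic of each token
    for d, doc in enumerate(docs):                  # random initial assignment
        zd = []
        for w in doc:
            k = rng.randrange(num_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w_id[w]] += 1
            nk[k] += 1
        z.append(zd)
    topics = range(num_topics)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = w_id[w], z[d][i]
                ndk[d][k] -= 1                      # remove current assignment
                nkw[k][v] -= 1
                nk[k] -= 1
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta)
                           / (nk[t] + V * beta) for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k                         # record resampled topic
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    return [[(ndk[d][t] + alpha) / (len(doc) + num_topics * alpha)
             for t in topics] for d, doc in enumerate(docs)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def recommend(target, users, theta, top_n=2):
    """Rank other users by topic-distribution similarity (assumed measure)."""
    idx = users.index(target)
    scores = [(cosine(theta[idx], theta[j]), u)
              for j, u in enumerate(users) if u != target]
    scores.sort(reverse=True)
    return [u for _, u in scores[:top_n]]

# Hypothetical toy data: (user, segmented post) pairs.
posts = [
    ("alice", ["football", "goal", "match"]),
    ("alice", ["match", "league"]),
    ("bob", ["goal", "league", "football"]),
    ("carol", ["concert", "guitar", "album"]),
    ("dave", ["album", "concert", "song"]),
]
user_docs = build_user_documents(posts)
users = sorted(user_docs)
theta = lda_gibbs([user_docs[u] for u in users], num_topics=2)
print(recommend("alice", users, theta, top_n=1))
```

Merging posts per user gives LDA longer documents than single microblog posts provide. The same `theta` vectors could also be compared with other measures, such as Jensen-Shannon divergence, if cosine similarity proves unsuitable.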
Source
Computer Engineering (《计算机工程》)
CAS
CSCD
2014, No. 5, pp. 1-6, 11 (7 pages)
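Funding
National Key Technology R&D Program of China (2013BAH21B00)
Beijing Natural Science Foundation (4123091)
Funding Project for Academic Human Resources Development in Institutions of Higher Learning under the Jurisdiction of Beijing Municipality, "Young and Middle-aged Backbone Talent Training Program" (PHR20110815)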