Labeled LDA Model Based on Shared Background Topics (cited by 17)

Abstract: Latent Dirichlet Allocation (LDA) is widely used in text analysis, image recognition, and related fields. However, because LDA and most of its extensions are unsupervised learning models, they cannot be applied directly to classification tasks, especially multi-label classification. By studying the mapping between document labels and the topics of an LDA model, this paper proposes a new Labeled LDA model, Shared Background Topics Labeled LDA (SBTL-LDA). In SBTL-LDA, each label has not only several exclusive local topics but also several shared background (global) topics, which makes it possible to analyze the dependencies between the topics of different labels. Because a document's labels are mapped to a combination of local topics and shared topics, SBTL-LDA improves the accuracy of document label prediction. SBTL-LDA can also be viewed as a semi-supervised clustering model: when clustering documents, it exploits their label information to improve clustering quality. Experiments show that SBTL-LDA effectively addresses the similarity and dependence between topics in the PLDA model, has good multi-label discrimination ability, and clusters documents better than the LDA and PLDA models.
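The label-to-topic mapping described in the abstract can be illustrated with a minimal sketch. It assumes a Labeled-LDA-style collapsed Gibbs sampler in which every token of a document may only be assigned to the shared background topics plus the local topics of that document's labels. The abstract does not specify the paper's inference procedure or topic counts, so the sampler is a swapped-in standard technique, and the names `allowed_topics`, `gibbs_sweep`, `fit`, `n_local`, `n_background`, `alpha`, and `beta` are hypothetical.

```python
import numpy as np


def allowed_topics(labels, n_local, n_background):
    """Map a document's label ids to the topic ids it may use.

    Hypothetical layout: topics 0..n_background-1 are the shared
    background topics; label l owns the n_local local topics starting
    at n_background + l * n_local.
    """
    background = list(range(n_background))
    local = [n_background + l * n_local + k
             for l in sorted(set(labels)) for k in range(n_local)]
    return np.array(background + local, dtype=int)


def gibbs_sweep(docs, z, topic_sets, n_dk, n_kw, n_k, alpha, beta, rng):
    """One collapsed Gibbs sweep; each token may only take a topic from
    its document's allowed set (a Labeled-LDA-style restriction)."""
    V = n_kw.shape[1]
    for d, doc in enumerate(docs):
        ks = topic_sets[d]
        for i, w in enumerate(doc):
            k_old = z[d][i]
            # Remove the token's current assignment from the counts.
            n_dk[d, k_old] -= 1
            n_kw[k_old, w] -= 1
            n_k[k_old] -= 1
            # Standard LDA full conditional, evaluated only on allowed topics.
            p = (n_dk[d, ks] + alpha) * (n_kw[ks, w] + beta) / (n_k[ks] + V * beta)
            k_new = int(ks[rng.choice(len(ks), p=p / p.sum())])
            # Record the new assignment and restore the counts.
            z[d][i] = k_new
            n_dk[d, k_new] += 1
            n_kw[k_new, w] += 1
            n_k[k_new] += 1


def fit(docs, doc_labels, n_labels, V, n_local=2, n_background=1,
        alpha=0.1, beta=0.01, iters=50, seed=0):
    """Fit the sketch on tokenized docs (lists of word ids in 0..V-1)."""
    rng = np.random.default_rng(seed)
    K = n_background + n_labels * n_local
    topic_sets = [allowed_topics(ls, n_local, n_background) for ls in doc_labels]
    n_dk = np.zeros((len(docs), K))
    n_kw = np.zeros((K, V))
    n_k = np.zeros(K)
    z = []
    # Random initialization restricted to each document's allowed topics.
    for d, doc in enumerate(docs):
        zd = [int(rng.choice(topic_sets[d])) for _ in doc]
        for w, k in zip(doc, zd):
            n_dk[d, k] += 1
            n_kw[k, w] += 1
            n_k[k] += 1
        z.append(zd)
    for _ in range(iters):
        gibbs_sweep(docs, z, topic_sets, n_dk, n_kw, n_k, alpha, beta, rng)
    return z, n_dk, n_kw
```

For example, `fit([[0, 1, 2], [3, 4]], [[0], [1]], n_labels=2, V=5)` keeps document 0 on topics {0, 1, 2} and document 1 on {0, 3, 4}, so topic 0 is the only topic the two labels share. In this sketch, that shared topic is where vocabulary common to several labels can collect instead of being duplicated across label-specific topics.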
Source: Acta Electronica Sinica (电子学报), 2013, No. 9, pp. 1794-1799 (6 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
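Funding: National Natural Science Foundation of China (No. 71172219); Provincial Key Project of the Anhui Provincial Natural Science Research Program (No. KJ2011Z039, No. KJ2013A053).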
Keywords: latent Dirichlet allocation; text analysis; multi-label learning; semi-supervised clustering
Citation metrics: 49 works sharing references; 182 co-cited works; 17 citing works; 139 second-level citing works.
