
BAYESIAN POSTERIORI MODEL SELECTION FOR TEXT CLUSTERING

Cited by: 21
Abstract: This paper gives a fairly complete introduction to and summary of model selection in cluster analysis, in particular the mixture model approach, and discusses the key related techniques one by one. On this basis, the authors propose a Bayesian posteriori model selection method, which reduces the complexity of the mixture-model-based algorithm and improves precision relative to traditional model selection. The method is combined with the physical model by which a document generates its feature sequence, yielding a probabilistic model for cluster analysis. To estimate the parameters of the posteriori model, two different Bayesian estimation methods, maximum likelihood estimation and conditional expectation estimation, are compared. Hierarchical clustering algorithms based on the posteriori model are described, together with an analysis of the domain itself. In experiments on real-world text data, the model achieved highly accurate results.
Authors: Jiang Ning (姜宁), Shi Zhongzhi (史忠植)
Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list; 2002, No. 5, pp. 580-587 (8 pages)
Keywords: text clustering, Bayesian posteriori model selection, mixture model, expectation maximization, Bayesian estimation, artificial intelligence

References (13)

  • [1] H H Bock. Probabilistic models in cluster analysis. Computational Statistics & Data Analysis, 1996, 23: 5-28
  • [2] Chris Fraley, Adrian E Raftery. Model-based clustering, discriminant analysis, and density estimation. Department of Statistics, University of Washington, Tech Rep: 380, 2000
  • [3] Petri T Kontkanen, Petri J Myllymaki, Henry R Tirri. Comparing Bayesian model class selection criteria by discrete finite mixtures. In: D L Dowe, K B Korb, J J Oliver eds. Information, Statistics and Induction in Science (Proc of the ISIS'96 Conf in Melbourne, Australia, 1996). Singapore: World Scientific, 1996. 364-374
  • [4] An Introduction to Cluster Analysis for Data Mining. http://www.cs.umn.edu/classes/Spring-2000/csci5980-dm/cluster-survey.pdf
  • [5] Advanced Mathematical Statistics (in Chinese). Superstar Digital Library. http://www.ssreader.com.cn. 442-444
  • [6] Jeff A Bilmes. A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models. Computer Science Division, Department of Electrical Engineering and Computer Science, U C Berkeley, Tech Rep: TR-97-021, 1998
  • [7] R E Kass, A E Raftery. Bayes factors and model uncertainty. Department of Statistics, Carnegie-Mellon University, Tech Rep: 571, 1993
  • [8] I J Good. Weight of evidence: A brief survey. In: J M Bernardo ed. Bayesian Statistics 2. New York: Elsevier, 1985. 249-269
  • [9] Bayesian Inferential Statistics (in Chinese). Superstar Digital Library. http://www.ssreader.com.cn
  • [10] P Cheeseman, J Stutz. Bayesian Classification (AutoClass): Theory and results. In: U M Fayyad ed. Knowledge Discovery in Data Bases II. AAAI Press / The MIT Press, 1995. 153-180

Co-cited documents: 125

Citing documents: 21

Second-level citing documents: 133
