Abstract
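This paper gives a fairly comprehensive introduction to and summary of model selection in clustering analysis, in particular the mixture model approach, and discusses each of the key techniques involved. On this basis, it proposes a Bayesian posteriori model selection method and combines it with the physical model by which documents generate feature sequences, yielding a probabilistic model for clustering analysis. In tests on real text data, the model achieved very good results.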
This paper presents a comprehensive introduction to model selection for clustering analysis, in particular the mixture model, and discusses the key related techniques in turn. Building on this, the author introduces Bayesian posteriori model selection, which reduces the complexity of the mixture-model-based algorithm and improves precision relative to traditional model selection. To estimate the parameters of the posteriori model, two Bayesian estimation methods, maximum likelihood estimation and conditional expectation estimation, are compared. Hierarchical clustering algorithms based on the posteriori model are described, together with an analysis of the domain itself. Experiments on real-world text clustering achieved high accuracy.
Source
《计算机研究与发展》
EI
CSCD
PKU Core (Peking University core journal list)
2002, No. 5, pp. 580-587 (8 pages)
Journal of Computer Research and Development
Keywords
text clustering
Bayesian posteriori model selection
mixture model
expectation maximization
Bayesian estimation
artificial intelligence