Abstract
The Gaussian mixture model (GMM) is a classic probabilistic model that is widely used in unsupervised learning to determine the class distribution of unlabeled samples. As an important technique for estimating GMM parameters, the expectation-maximization (EM) algorithm determines the parameters of the component models and their mixing coefficients by computing the optimal solution of the GMM likelihood function. Solving a GMM with the EM algorithm has two drawbacks: the algorithm is prone to getting stuck in a local optimum, and the component parameters it determines are unstable, especially for multidimensional random variables. This paper therefore proposes SA-GMM, a GMM solution method based on a statistical-aware (SA) strategy. Starting from the estimation of the unknown probability density function of a given data set, the method establishes a connection between kernel density estimation (KDE) and the GMM. To keep KDE from selecting an over-smoothing bandwidth, an objective function is designed that simultaneously minimizes the empirical risk between the KDE and the GMM and the structural risk of the KDE bandwidth, and the optimal GMM parameters are then determined from this objective. Experiments on 11 standard probability distributions confirm the feasibility, rationality, and effectiveness of SA-GMM, and the results also show that SA-GMM achieves significantly better probability density function estimation than the EM-based GMM and its variant.
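The objective described in the abstract can be illustrated with a minimal one-dimensional sketch. It is not the authors' implementation: the squared-error form of the empirical risk, the `lam * h**2` bandwidth penalty, and all function names are assumptions made for illustration, and the optimizer (the keywords mention particle swarm optimization) is left out.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def kde_pdf(x, data, h):
    """Gaussian-kernel density estimate at x with bandwidth h."""
    return sum(gaussian_pdf(x, xi, h) for xi in data) / len(data)


def gmm_pdf(x, weights, means, stds):
    """Density of a univariate Gaussian mixture at x."""
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, stds))


def objective(data, h, weights, means, stds, lam=1.0):
    """Empirical risk plus structural risk (both forms are assumptions)."""
    # Empirical risk: mean squared gap between KDE and GMM at the samples.
    empirical = sum(
        (kde_pdf(x, data, h) - gmm_pdf(x, weights, means, stds)) ** 2
        for x in data
    ) / len(data)
    # Structural risk: hypothetical penalty that discourages a wide,
    # over-smoothing bandwidth.
    structural = lam * h * h
    return empirical + structural


# Synthetic bimodal data, used only to exercise the objective.
random.seed(0)
data = [random.gauss(-2.0, 0.5) for _ in range(100)]
data += [random.gauss(2.0, 0.5) for _ in range(100)]

# A mixture placed on the two modes versus one placed between them.
good = objective(data, 0.3, [0.5, 0.5], [-2.0, 2.0], [0.5, 0.5])
bad = objective(data, 0.3, [0.5, 0.5], [0.0, 0.0], [0.5, 0.5])
print(good < bad)
```

Under these assumptions, a search procedure would vary `h`, `weights`, `means`, and `stds` jointly to lower `objective`; a mixture that sits on the modes of the data scores lower than one that misses them.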
Authors
CHEN Jiaqi (陈佳琪)
HE Yulin (何玉林)
HUANG Zhexue (黄哲学)
FOURNIER-VIGER Philippe
Affiliations: College of Computer Science & Software Engineering, Shenzhen University, Shenzhen 518060, China; Guangdong Laboratory of Artificial Intelligence and Digital Economy (Shenzhen), Shenzhen 518107, China
Source
Journal of Data Acquisition and Processing (《数据采集与处理》)
CSCD
Peking University Core Journals (北大核心)
2023, Issue 3, pp. 525-538 (14 pages)
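Funding
General Program of the National Natural Science Foundation of China (61972261)
General Program of the Natural Science Foundation of Guangdong Province (2023A1515011667)
Key Basic Research Project of Shenzhen (JCYJ20220818100205012)
General Basic Research Project of Shenzhen (JCYJ20210324093609026)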
Keywords
Gaussian mixture model
probability density function estimation
statistical-aware
empirical risk
structural risk
particle swarm optimization