Abstract
Naive Bayes (NB) classification based on Parzen Window Estimation (PWE) suffers from high computational complexity and large storage requirements when applied to interval uncertain data. To address this problem, an improved classification method for interval uncertain data, named IU-PNBC, is proposed. First, the Class-Conditional Probability Density Function (CCPDF) of the interval samples is estimated by PWE. Second, an approximating function for the CCPDF is obtained by algebraic interpolation. Finally, the posterior probability of a sample is computed from the interpolated approximation and used for prediction. Artificially generated simulation data and UCI standard datasets were used to verify the reasonableness of the algorithm's assumptions and to examine the effect of the number of interpolation points on the classification accuracy of IU-PNBC. The experimental results show that the accuracy increases with the number of interpolation points and tends to stabilize once more than 15 points are used. IU-PNBC removes the dependence of the original Parzen window estimate on the training samples and effectively reduces the computational complexity. Because its running time and storage requirements are far lower than those of NB based on PWE, IU-PNBC is suitable for classifying large volumes of interval uncertain data.
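The three steps named in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the kernel, the window width, or the exact interpolation scheme, so the Gaussian kernel, the uniform distribution assumed inside each interval, the parameters `h` and `n_nodes`, and all function names are assumptions. Piecewise-linear interpolation (`np.interp`) stands in for the paper's algebraic interpolation.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def _normal_cdf(z):
    return 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))

def parzen_interval_pdf(x, lows, highs, h):
    """Parzen estimate at points x from 1-D interval samples [lows[i], highs[i]].

    Assumption: each sample is uniform on its (non-degenerate) interval and
    the kernel is Gaussian with width h, so each sample contributes the
    kernel averaged over its interval.
    """
    x = np.asarray(x, dtype=float)[:, None]
    contrib = _normal_cdf((x - lows) / h) - _normal_cdf((x - highs) / h)
    return np.mean(contrib / (highs - lows), axis=1)

def fit(X_low, X_high, y, h=0.5, n_nodes=20):
    """Steps 1 and 2: estimate each CCPDF, then keep only interpolation nodes."""
    X_low, X_high, y = np.asarray(X_low, float), np.asarray(X_high, float), np.asarray(y)
    model = {}
    for c in np.unique(y):
        mask = y == c
        tables = []
        for j in range(X_low.shape[1]):
            lo, hi = X_low[mask, j], X_high[mask, j]
            nodes = np.linspace(lo.min() - 3 * h, hi.max() + 3 * h, n_nodes)
            tables.append((nodes, parzen_interval_pdf(nodes, lo, hi, h)))
        # After this point the training samples are no longer needed.
        model[c] = (mask.mean(), tables)
    return model

def predict(model, x_low, x_high, eps=1e-12):
    """Step 3: naive Bayes log-posterior from the interpolated densities."""
    best, best_score = None, -math.inf
    for c, (prior, tables) in model.items():
        score = math.log(prior)
        for j, (nodes, values) in enumerate(tables):
            # Average the interpolated density over the query interval.
            grid = np.linspace(x_low[j], x_high[j], 5)
            dens = np.interp(grid, nodes, values, left=0.0, right=0.0).mean()
            score += math.log(dens + eps)
        if score > best_score:
            best, best_score = c, score
    return best
```

In this sketch the fitted model stores only `n_nodes` values per class and attribute, so prediction cost no longer grows with the number of training samples, which is the kind of saving the abstract attributes to the interpolation step.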
Source
Journal of Computer Applications (《计算机应用》)
CSCD
PKU Core Journal (北大核心)
2014, No. 11, pp. 3268-3272 (5 pages)
Funding
National Natural Science Foundation of China (41362015)
Natural Science Foundation of Jiangxi Province (20122BAB201045)
Keywords
interval uncertain data
algebraic interpolation
Naive Bayes (NB)
Parzen Window Estimation (PWE)
classification