
Classification method for interval uncertain data based on improved naive Bayes
Cited by: 3
Abstract: Naive Bayes (NB) based on Parzen Window Estimation (PWE) suffers from high computational complexity and large storage requirements when classifying interval uncertain data. To address this problem, an improved classification method for interval uncertain data, named IU-PNBC, is proposed. First, the Class-Conditional Probability Density Function (CCPDF) of the interval samples is estimated using a Parzen window. Second, an approximating function for the CCPDF is obtained by algebraic interpolation. Finally, the posterior probability of a sample is computed from the interpolated approximation and used for prediction. Artificially generated simulation data and UCI standard datasets were used to verify that the algorithm's assumptions are reasonable and to examine how the number of interpolation points affects the classification accuracy of IU-PNBC. The experimental results show that once the number of interpolation points exceeds 15, the classification accuracy of IU-PNBC tends to be stable, and that accuracy increases as more interpolation points are used. The algorithm removes the dependence of the original Parzen window estimate on the training samples and effectively reduces computational complexity. Because its running time and storage requirements are far lower than those of NB based on Parzen window estimation, IU-PNBC is suitable for classifying large volumes of interval uncertain data.
Source: Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2014, No. 11, pp. 3268-3272 (5 pages)
Funding: National Natural Science Foundation of China (41362015); Natural Science Foundation of Jiangxi Province (20122BAB201045)
Keywords: interval uncertain data; algebraic interpolation; Naive Bayes (NB); Parzen Window Estimation (PWE); classification
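The three steps named in the abstract (Parzen window estimate of the CCPDF, interpolation of that estimate, posterior computed from the interpolant) can be sketched as follows. This is a minimal illustration and not the authors' implementation; the abstract does not give the kernel, the interpolation scheme, or how a test interval is scored, so several choices here are assumptions: each interval sample is treated as uniform on its interval, the kernel is Gaussian with a fixed width `h`, piecewise-linear interpolation via `numpy.interp` stands in for the paper's algebraic interpolation, and a test interval is scored at its midpoint. The names `interval_parzen_density` and `InterpolatedIntervalNB` are hypothetical.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf, otypes=[float])


def _norm_cdf(z):
    """Standard normal CDF, evaluated elementwise."""
    return 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))


def interval_parzen_density(x, lows, highs, h):
    """Parzen estimate at points x from interval samples [lows[i], highs[i]].

    Assumption: each sample is uniform on its interval (highs > lows), so its
    contribution is the Gaussian kernel of width h averaged over the interval.
    """
    x = np.asarray(x, dtype=float)[:, None]        # (n_points, 1)
    lows = np.asarray(lows, dtype=float)[None, :]  # (1, n_samples)
    highs = np.asarray(highs, dtype=float)[None, :]
    contrib = (_norm_cdf((x - lows) / h) - _norm_cdf((x - highs) / h)) / (highs - lows)
    return contrib.mean(axis=1)


class InterpolatedIntervalNB:
    """Naive Bayes that keeps only interpolation nodes, not training samples."""

    def __init__(self, n_nodes=20, h=0.5):
        self.n_nodes = n_nodes  # number of interpolation points per feature
        self.h = h              # kernel width (assumed fixed)

    def fit(self, lows, highs, y):
        lows, highs, y = np.asarray(lows, float), np.asarray(highs, float), np.asarray(y)
        self.classes_ = np.unique(y)
        self.log_prior_ = np.log([np.mean(y == c) for c in self.classes_])
        n_features = lows.shape[1]
        # One shared, equally spaced node grid per feature.
        self.nodes_ = [
            np.linspace(lows[:, j].min() - 3 * self.h,
                        highs[:, j].max() + 3 * self.h, self.n_nodes)
            for j in range(n_features)
        ]
        # Step 1 (Parzen estimate), evaluated only at the nodes of step 2.
        self.values_ = np.array([
            [interval_parzen_density(self.nodes_[j], lows[y == c, j],
                                     highs[y == c, j], self.h)
             for j in range(n_features)]
            for c in self.classes_
        ])  # shape (n_classes, n_features, n_nodes)
        return self

    def predict(self, lows, highs):
        # Assumption: a test interval is represented by its midpoint.
        mid = 0.5 * (np.asarray(lows, float) + np.asarray(highs, float))
        scores = np.tile(self.log_prior_, (mid.shape[0], 1))
        for k in range(len(self.classes_)):
            for j, nodes in enumerate(self.nodes_):
                # Steps 2 and 3: interpolated density feeds the log posterior.
                dens = np.interp(mid[:, j], nodes, self.values_[k, j])
                scores[:, k] += np.log(dens + 1e-300)
        return self.classes_[np.argmax(scores, axis=1)]
```

After `fit`, prediction touches only `n_nodes` stored values per class and feature, which is one way to read the abstract's claim that the method avoids the Parzen estimator's dependence on the training samples. The node count plays the role of the interpolation point number studied in the paper's experiments.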

