
Using a Variation of Chi2 to Simplify the Case Representation Space for Decision Tree Induction (cited by: 2)
Abstract: Decision tree induction has two critical phases: simplifying the data representation space and generating the decision tree. Provided that the inconsistency rate of the training set is kept under a given threshold, reducing the number of attributes per instance and the number of distinct values per attribute keeps the decision tree method feasible and effective. This paper applies a variation of the Chi2 algorithm to discretize attribute values and select attributes, and then uses arithmetic operators to merge adjacent attributes that take two or three values. The decision tree generated on this basis achieves good classification accuracy. The experiments use a real-world data set donated by an insurance company.
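Authors: 徐计 (Xu Ji), 张桂芸 (Zhang Guiyun)
Source: Computer Engineering & Science (《计算机工程与科学》), CSCD, 2007, No. 10, pp. 47-49 (3 pages)
Funding: Natural Science Foundation of Tianjin (033610811); Key Project of the Tianjin "Tenth Five-Year" Education Science Plan (YSO17)
Keywords: decision tree; variation of Chi2; discretization; selection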
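
The abstract relies on two quantities from the original Chi2 algorithm (Liu and Setiono): the inconsistency rate of a data set and the chi-square statistic of two adjacent intervals. The record does not describe the authors' variation, so the following is only a minimal sketch of those two standard ingredients, not the method of the paper. The function names `inconsistency_rate` and `chi2_adjacent` and the toy data are hypothetical, chosen for illustration.

```python
from collections import Counter, defaultdict


def inconsistency_rate(rows, labels):
    """Hypothetical helper: share of instances that disagree with the
    majority class among instances having identical attribute values."""
    groups = defaultdict(Counter)
    for row, label in zip(rows, labels):
        groups[tuple(row)][label] += 1
    # Each attribute pattern contributes (group size - majority class count).
    inconsistent = sum(
        sum(counts.values()) - max(counts.values()) for counts in groups.values()
    )
    return inconsistent / len(labels)


def chi2_adjacent(counts_a, counts_b):
    """Hypothetical helper: chi-square statistic of two adjacent intervals,
    each given as a dict mapping class label -> number of instances."""
    classes = set(counts_a) | set(counts_b)
    total = sum(counts_a.values()) + sum(counts_b.values())
    stat = 0.0
    for interval in (counts_a, counts_b):
        row_total = sum(interval.values())
        for cls in classes:
            col_total = counts_a.get(cls, 0) + counts_b.get(cls, 0)
            expected = row_total * col_total / total
            if expected == 0:
                # ChiMerge/Chi2 convention: avoid division by zero.
                expected = 0.1
            stat += (interval.get(cls, 0) - expected) ** 2 / expected
    return stat


if __name__ == "__main__":
    # Toy data (assumed): three instances share a pattern, one disagrees.
    rows = [(1, 0), (1, 0), (1, 0), (0, 1)]
    labels = ["y", "y", "n", "n"]
    print(inconsistency_rate(rows, labels))
    # Identical class distributions versus fully separated ones.
    print(chi2_adjacent({"y": 2, "n": 2}, {"y": 2, "n": 2}))
    print(chi2_adjacent({"y": 4}, {"n": 4}))
```

In the original Chi2 procedure, a low chi-square value marks two adjacent intervals with similar class distributions as candidates for merging, and merging continues only while the inconsistency rate stays under the chosen threshold. An attribute that ends up with a single interval can then be dropped, which is how discretization doubles as attribute selection.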

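References (8)

  • 1Liu Huan, Setiono R. Chi2: Feature Selection and Discretization of Numeric Attributes[A]. Proc of the IEEE 7th Int'l Conf on Tools with Artificial Intelligence[C]. 1995.
  • 2Tay F E H. A Modified Chi2 Algorithm for Discretization[J]. IEEE Trans on Knowledge and Data Engineering, 2002, 14(3):666-670.
  • 3Breslow L A, Aha D W. Simplifying Decision Trees: A Survey[J]. Knowledge Engineering Review, 1997, 12(1):1-40.
  • 4Witten I H, Frank E. Data Mining: Practical Machine Learning Tools and Techniques (English edition, 2nd edition)[M]. Beijing: China Machine Press, 2005.
  • 5Bloedorn E, Michalski R S. The AQ17-DCI System for Data-Driven Constructive Induction and Its Application to the Analysis of World Economics[A]. Proc of the 9th Int'l Symp on Methodologies for Intelligent Systems[C]. 1996.
  • 6Sun Ximing, Zhang Xiaopeng. Implementation of a Decision Tree Algorithm Based on Information Entropy[J]. Computer and Digital Engineering, 2005, 33(11):94-95. (cited by: 11)
  • 7Qiu Chunguang, Liu Yushu. A General Algorithm Template for Automatically Generating Decision Trees[J]. Journal of Beijing Institute of Technology, 1999, 19(3):338-342. (cited by: 5)
  • 8Li Aihua, Qu Liangsheng. An Improved Decision Tree Generation Algorithm and the Creation of Conditional Decision Tables[J]. Journal of Xi'an Jiaotong University, 1999, 33(10):43-47. (cited by: 2)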
