Abstract
To address the decline in classifier generalization caused by decomposing the training set in large-scale classification, an ensemble learning algorithm based on parallel partitioning of the training set is proposed. The algorithm partitions the training set several times using multiple clusters of parallel hyperplanes; on each partitioned training set, base classifiers are trained with a modular support vector machine (SVM) network algorithm. At test time, the outputs of the base classifiers are combined by majority voting. Experiments on three large-scale classification problems show that, without increasing training or test time, the ensemble keeps the classifier's bias essentially unchanged while effectively reducing its variance, thereby mitigating the loss of generalization caused by partitioning the training set.
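The scheme described in the abstract can be sketched in code. This is a minimal illustration, not the paper's implementation: the class and function names are hypothetical, a random projection direction defines each cluster of parallel hyperplanes (the slab boundaries), and a toy nearest-centroid learner stands in for the paper's modular SVM network so the sketch stays dependency-free.

```python
import numpy as np

def partition_by_parallel_hyperplanes(X, w, n_parts):
    """Split samples into n_parts slabs bounded by hyperplanes orthogonal
    to direction w (one cluster of parallel hyperplanes). Quantile
    thresholds give roughly equal-sized slabs. (Hypothetical helper.)"""
    proj = X @ w
    edges = np.quantile(proj, np.linspace(0, 1, n_parts + 1)[1:-1])
    return np.digitize(proj, edges), edges

class CentroidClassifier:
    """Toy base learner standing in for the modular SVM network."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self
    def predict(self, X):
        d = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(-1)
        return self.classes_[d.argmin(axis=1)]

class ParallelPartitionEnsemble:
    """Train one base classifier per slab, per hyperplane cluster;
    combine the clusters' predictions by majority voting at test time."""
    def __init__(self, n_clusters=5, n_parts=3, seed=0):
        self.n_clusters, self.n_parts = n_clusters, n_parts
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        self.members_ = []
        for _ in range(self.n_clusters):
            w = self.rng.normal(size=X.shape[1])
            w /= np.linalg.norm(w)          # random unit normal direction
            slab, edges = partition_by_parallel_hyperplanes(X, w, self.n_parts)
            models = [CentroidClassifier().fit(X[slab == p], y[slab == p])
                      for p in range(self.n_parts)]
            self.members_.append((w, edges, models))
        return self

    def predict(self, X):
        votes = []
        for w, edges, models in self.members_:
            slab = np.digitize(X @ w, edges)  # route each sample to its slab
            pred = np.empty(len(X), dtype=int)
            for p in range(self.n_parts):
                mask = slab == p
                if mask.any():
                    pred[mask] = models[p].predict(X[mask])
            votes.append(pred)
        votes = np.stack(votes)               # (n_clusters, n_samples)
        # majority vote across the hyperplane clusters
        return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
```

Because each cluster of parallel hyperplanes slices the data along a different direction, each base classifier sees a different decomposition of the training set, and voting over the clusters is what the abstract credits with reducing variance.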
Source
《小型微型计算机系统》
CSCD
Peking University Core Journals (北大核心)
2009, No. 5, pp. 908-911 (4 pages)
Journal of Chinese Computer Systems
Funding
Supported by the Key Program of the National Natural Science Foundation of China (60835004)
Supported by the National High-Tech R&amp;D Program of China (863 Program) (2007AA04Z244)
Supported by the Hunan Provincial Special Fund for Postdoctoral Scientific Research (2008RS4005)
Keywords
parallel processing systems
learning systems
ensemble learning