Abstract
To reduce the influence of noisy data on classification results, a noise-processing algorithm is combined with the Gaussian random fields (GRF) algorithm, giving a GRF learning algorithm with a noise coefficient. To address classification on imbalanced sample sets, active learning is combined with a graph-based semi-supervised algorithm, giving a robust active-learning, graph-based semi-supervised classification algorithm. Using an active learning method based on sample partitioning, the new sample sets formed by combining samples from the neighborhood sample set of the positive class with samples of a specific class are ranked by overall divergence. The samples that minimize the overall divergence of the new sample set are then selected to replace all samples in the neighborhood sample set of the positive class, forming balanced classes. Experimental results on UCI standard data sets show that, compared with the standard graph-based semi-supervised algorithm, the proposed algorithm achieves higher classification accuracy and better generalization ability.
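For orientation, the standard GRF baseline that the abstract compares against can be sketched as the harmonic-function solution on a fully connected similarity graph. This is only a minimal illustration of that well-known baseline, not the paper's noise-coefficient variant or its active-learning balancing step, whose details the abstract does not give. The function name `grf_harmonic`, the RBF kernel, and the bandwidth `sigma` are assumptions made for this example.

```python
import numpy as np

def grf_harmonic(X, y_labeled, sigma=1.0):
    """Standard GRF harmonic solution (illustrative sketch, not the paper's method).

    Assumes the first len(y_labeled) rows of X are the labeled samples,
    with binary labels in {0, 1}; the remaining rows are unlabeled.
    Returns soft labels in [0, 1] for the unlabeled rows.
    """
    X = np.asarray(X, dtype=float)
    f_l = np.asarray(y_labeled, dtype=float)
    n_l = f_l.shape[0]

    # RBF edge weights on a fully connected graph (kernel choice is an assumption).
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    # Graph Laplacian L = D - W.
    L = np.diag(W.sum(axis=1)) - W

    # Harmonic solution: f_u = (D_uu - W_uu)^{-1} W_ul f_l.
    L_uu = L[n_l:, n_l:]
    W_ul = W[n_l:, :n_l]
    return np.linalg.solve(L_uu, W_ul @ f_l)

# Two labeled points (one per class) followed by two unlabeled points.
X = np.array([[0.0, 0.0], [5.0, 5.0], [0.1, 0.2], [4.9, 5.1]])
f_u = grf_harmonic(X, [0, 1], sigma=1.0)
predicted = (f_u > 0.5).astype(int)  # threshold the soft labels at 0.5
```

Each unlabeled point receives the weighted average of its neighbors' labels, so a point close to a labeled sample inherits that sample's class. A noise-aware variant would need to alter the weights or the labeled-sample constraint; how the paper does this is not stated in the abstract.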
Source
Computer Engineering and Design (《计算机工程与设计》)
Peking University Core Journal (北大核心)
2015, No. 7, pp. 1871-1875 (5 pages)
Funding
Key Project (Class B) of the Beijing Natural Science Foundation (KZ201410011014)