Abstract
This paper proposes a reconstruction-based sparse coding method that addresses two problems of the k-nearest-neighbor (kNN) algorithm in linear regression: the fixed value of k and noise in the training samples. The method reconstructs each test sample from the training samples. During reconstruction, the l_1-norm ensures that each test sample is predicted by a different number of training samples, which removes the need for a fixed k; the row sparsity induced by the l_(2,1)-norm removes noisy samples, so that noise in the dataset does not degrade the reconstruction. Experiments on UCI datasets show that the improved algorithm achieves better prediction performance than the original kNN algorithm in linear regression.
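The abstract does not state the objective function or the solver, so the following is only a minimal sketch of the l_1 part of the idea, under assumptions of our own: each test sample is reconstructed from the training samples by minimizing a squared reconstruction error plus an l_1 penalty, solved here with plain iterative soft-thresholding (ISTA). The training samples that receive nonzero weights play the role of the neighbors, so their number varies per test sample. The function names (`sparse_weights`, `predict`), the penalty weight `lam`, and the weighted-average prediction rule are hypothetical, and the l_(2,1) row-sparsity term for noise removal is omitted.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def sparse_weights(train_x, test_x, lam=0.1, n_iter=2000):
    """ISTA for min_w 0.5 * ||sum_i w[i] * train_x[i] - test_x||^2 + lam * ||w||_1.

    Assumed formulation, not taken from the paper.
    """
    n, d = len(train_x), len(test_x)
    # The squared Frobenius norm upper-bounds the Lipschitz constant of the
    # gradient, so 1 / lip is a safe step size.
    lip = sum(v * v for row in train_x for v in row) or 1.0
    w = [0.0] * n
    for _ in range(n_iter):
        # Residual of the current reconstruction of the test sample.
        resid = [sum(w[i] * train_x[i][j] for i in range(n)) - test_x[j]
                 for j in range(d)]
        # Gradient step on the squared error, then soft-thresholding.
        w = [soft_threshold(
                 w[i] - sum(train_x[i][j] * resid[j] for j in range(d)) / lip,
                 lam / lip)
             for i in range(n)]
    return w


def predict(train_x, train_y, test_x, lam=0.1):
    """Return (prediction, number of training samples actually used)."""
    w = sparse_weights(train_x, test_x, lam)
    support = [i for i, wi in enumerate(w) if wi != 0.0]
    if not support:
        # No sample selected: fall back to the mean target (an assumption).
        return sum(train_y) / len(train_y), 0
    total = sum(abs(w[i]) for i in support)
    # Weighted average of the selected targets (hypothetical prediction rule).
    return sum(abs(w[i]) * train_y[i] for i in support) / total, len(support)


train_x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [5.0, 5.0]]
train_y = [1.0, 2.0, 3.0, 10.0]
prediction, k_used = predict(train_x, train_y, [1.0, 0.1])
print(prediction, k_used)
```

In this sketch a larger `lam` selects fewer training samples, so the effective k is determined by the data and the penalty rather than fixed in advance.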
Source
Computer Applications and Software (《计算机应用与软件》)
CSCD
2016, No. 2, pp. 232-236, 241 (6 pages)
Funding
National Natural Science Foundation of China (61170131, 61263035, 61363009)
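National High Technology Research and Development Program of China (2012AA011005)
National Basic Research Program of China (2013CB329404)
Natural Science Foundation of Guangxi (2012GXNSFGA06004)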
Innovation Project of Guangxi Graduate Education (YCSZ2015095, YCSZ2015096)
Open Fund of the Guangxi Key Laboratory of Multi-source Information Mining and Security (MIMS13-08)