Abstract
Both the classical KNN algorithm and previous density-based improvements to KNN lack an effective description of the distribution of the training samples, which indirectly degrades classification results. An improved KNN algorithm based on the local density within the neighbor decision domain of the test sample is proposed. The algorithm first computes the local density of each class within the neighbor decision domain; taking class imbalance into account at the same time, it then derives a density compensation coefficient and a skew-balance factor for each class. As a result, classes with many samples and high density are weakened, while classes with few samples and low density are strengthened. Experiments on UCI data sets show that the improved algorithm raises recall and F1-measure while maintaining the classification accuracy of the classical KNN algorithm.
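The abstract only outlines the method; the exact formulas for the density compensation coefficient and the skew-balance factor are not given here. The following is a minimal Python sketch of the general idea under stated assumptions: the local-density estimate, the inverse-density compensation, the inverse-frequency balance factor, and the name density_weighted_knn_predict are illustrative choices, not the paper's definitions.

```python
# Minimal sketch of density-compensated, imbalance-aware KNN voting.
# The density and weighting formulas below are illustrative assumptions.
import numpy as np
from collections import defaultdict

def density_weighted_knn_predict(X_train, y_train, x, k=5):
    # Distances from the query point to all training samples.
    d = np.linalg.norm(X_train - x, axis=1)
    nn = np.argsort(d)[:k]              # indices of the k nearest neighbors
    radius = d[nn].max() + 1e-12        # radius of the neighbor decision domain

    classes, counts = np.unique(y_train, return_counts=True)
    freq = dict(zip(classes, counts / len(y_train)))
    votes = defaultdict(float)

    for c in classes:
        members = nn[y_train[nn] == c]
        if members.size == 0:
            continue
        # Local density of class c inside the decision domain:
        # average closeness of its neighbors to the query (assumed form).
        local_density = np.mean(1.0 - d[members] / radius)
        # Density compensation: weaken high-density classes,
        # strengthen low-density ones.
        compensation = 1.0 / (local_density + 1e-12)
        # Skew-balance factor: weaken majority classes,
        # strengthen minority classes.
        balance = 1.0 / freq[c]
        votes[c] = members.size * compensation * balance

    return max(votes, key=votes.get)
```

In this sketch, weighting each class's vote by the product of the two factors plays the role described in the abstract: reducing the influence of large, dense classes and boosting small, sparse ones.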
Source
Science Technology and Engineering (《科学技术与工程》)
Peking University Core Journal (北大核心)
2014, Issue 30, pp. 57-61 (5 pages)
Funding
Supported by the National Natural Science Foundation of China (61164010)
Keywords
KNN; local density; decision domain; class imbalance