Abstract
The KNN algorithm has two shortcomings in outlier detection on big data: it handles high-dimensional data poorly, and its time complexity is too high. To address both, this paper proposes a classification method based on the attribute overlapping rate (AOR) and improves the KNN algorithm. First, AOR-based dimensionality reduction is applied to the data, which greatly increases the number of dimensions that can be processed. Then the traditional KNN algorithm is improved by pruning, which eliminates a large number of invalid computations. Experimental results show that, on big data samples with high dimensionality and large volume, the proposed algorithm considerably improves running efficiency and accuracy.
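The pruning idea can be illustrated with a minimal sketch. The abstract does not give the paper's pruning rule or define how AOR is computed, so the code below is not the authors' method. It assumes a generic, well-known cutoff rule for KNN-distance outlier detection: a point is abandoned as soon as its running k-th nearest-neighbour distance drops to or below the weakest score among the current top outliers. The function name `knn_outliers_pruned` and its parameters are hypothetical, and the AOR dimensionality-reduction step is omitted.

```python
import heapq
import math


def knn_outliers_pruned(points, k, n_outliers):
    """Find the n_outliers points with the largest k-th nearest-neighbour distance.

    Illustrative cutoff pruning only (an assumption, not the paper's rule).
    Returns (outliers, evaluations): outliers is a list of (score, index)
    sorted by descending score; evaluations counts distance computations.
    """
    if len(points) <= k:
        raise ValueError("need more than k points")

    top = []         # min-heap of (score, index): current best outlier candidates
    cutoff = 0.0     # weakest score among the current top outliers
    evaluations = 0

    for i, p in enumerate(points):
        nearest = []  # max-heap (negated) of the k smallest distances seen so far
        pruned = False
        for j, q in enumerate(points):
            if i == j:
                continue
            d = math.dist(p, q)
            evaluations += 1
            if len(nearest) < k:
                heapq.heappush(nearest, -d)
            elif d < -nearest[0]:
                heapq.heapreplace(nearest, -d)
            # The running k-th distance can only shrink, so it is an upper
            # bound on the final score. Once it cannot beat the cutoff,
            # the remaining distance computations are wasted work.
            if (len(nearest) == k and len(top) == n_outliers
                    and -nearest[0] <= cutoff):
                pruned = True
                break
        if pruned:
            continue
        score = -nearest[0]
        if len(top) < n_outliers:
            heapq.heappush(top, (score, i))
        elif score > top[0][0]:
            heapq.heapreplace(top, (score, i))
        if len(top) == n_outliers:
            cutoff = top[0][0]

    return sorted(top, reverse=True), evaluations


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
    outliers, evaluations = knn_outliers_pruned(data, k=2, n_outliers=1)
    print(outliers, evaluations)
```

In this sketch the saving comes from the early `break`: points inside a dense cluster are discarded after only a few distance computations, while a full scan would always compute n(n-1) distances.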
Source
《计算机与现代化》 (Computer and Modernization)
2017, No. 5, pp. 67-70, 75 (5 pages in total)
Funding
Science and Technology Support Program of the Sichuan Provincial Department of Science and Technology (2013GZ0141)
Keywords
big data
KNN
dimensionality reduction
attribute overlapping rate
pruning