Abstract
Traditional clustering algorithms, when applied to high-dimensional, massive sets of electric load curves whose clusters differ widely in shape, suffer from unstable results, poor clustering quality, slow speed, and excessive memory consumption. To address these problems, an improved fast density peaks clustering algorithm is proposed. First, principal component analysis (PCA) is applied to the normalized load curves to reduce their dimensionality, which lowers the cost of computing Euclidean distances between sample vectors and speeds up subsequent steps. Next, a kd-tree is used to perform a fast k-nearest-neighbor (KNN) search on the reduced data and generate a KNN matrix. Finally, the KNN matrix replaces the distance matrix of the original algorithm as the input, and the fast density peaks algorithm clusters the load curves using KNN-based criteria for computing each sample's local density and distance. Experiments and case studies verify the practicality and effectiveness of the proposed algorithm.
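The three steps named in the abstract (PCA reduction, kd-tree KNN search, density peaks clustering on the KNN result) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the paper's KNN-based density and distance formulas, so the local density here is assumed to be the inverse of the mean distance to the k nearest neighbours, and the function names `pca_reduce` and `knn_density_peaks` are hypothetical. The input is assumed to be already normalized, with one load curve per row.

```python
import numpy as np
from scipy.spatial import cKDTree


def pca_reduce(X, n_components):
    """Project the rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def knn_density_peaks(X, k, n_clusters):
    """KNN-based density peaks sketch; assumes len(X) > k + 1."""
    n = len(X)
    # kd-tree KNN search; k + 1 because each point is normally its own nearest hit.
    dist, idx = cKDTree(X).query(X, k=k + 1)
    dist, idx = dist[:, 1:], idx[:, 1:]

    # ASSUMPTION: local density = inverse mean distance to the k nearest neighbours.
    rho = 1.0 / (dist.mean(axis=1) + 1e-12)
    order = np.argsort(-rho)          # indices sorted by descending density
    rank = np.empty(n, dtype=int)
    rank[order] = np.arange(n)        # rank 0 is the densest point

    # delta: distance to the nearest denser point; parent: that point's index.
    delta = np.full(n, np.inf)
    parent = np.full(n, -1)
    for i in range(n):
        denser = rank[idx[i]] < rank[i]
        if denser.any():
            # Neighbours are sorted by distance, so the first hit is the nearest.
            j = int(np.argmax(denser))
            delta[i], parent[i] = dist[i, j], idx[i, j]
        elif rank[i] > 0:
            # No denser point among the k neighbours: search all denser points.
            cand = order[:rank[i]]
            d = np.linalg.norm(X[cand] - X[i], axis=1)
            j = int(np.argmin(d))
            delta[i], parent[i] = d[j], cand[j]

    # The densest point keeps delta = inf, so it is always chosen as a centre.
    gamma = rho * delta
    centers = np.argsort(-gamma)[:n_clusters]

    # Every other point inherits the label of its nearest denser point.
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers
```

A call such as `labels, centers = knn_density_peaks(pca_reduce(curves, 2), k=8, n_clusters=2)` would cluster an array `curves` of shape (samples, time points). In this sketch only the n × k neighbour arrays are stored rather than a full n × n distance matrix, which is the memory saving the abstract alludes to. The fallback search for points with no denser neighbour is a simplification added here.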
Authors
CHEN Junyi (陈俊艺)
DING Jianyong (丁坚勇)
TIAN Shiming (田世明)
BU Fanpeng (卜凡鹏)
ZHU Bingxiang (朱炳翔)
HUANG Shicheng (黄事成)
ZHOU Kai (周凯)
Affiliations: School of Electrical Engineering, Wuhan University, Wuhan 430072, China; China Electric Power Research Institute, Beijing 100192, China
Source
Power System Protection and Control (《电力系统保护与控制》)
Indexed in: EI, CSCD, PKU Core
2018, Issue 20, pp. 85-93 (9 pages)
Funding
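National High Technology Research and Development Program of China (863 Program) (2015AA050203)
State Grid Corporation of China science and technology project "Deepening Research on Key Technologies for Big Data Applications in Smart Power Distribution and Utilization"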