Abstract
To detect anomalies in power big data streams, this paper combines stream data clustering with power big data. Existing stream data clustering algorithms have trouble storing all of the data and lose data when power fails, and the clustering algorithm used in the offline stage must still answer in real time. With data integrity, data security, and low time complexity in mind, the paper improves the CluStream stream data clustering algorithm and proposes a streaming K-means clustering algorithm. In the online phase, a Redis cluster buffers the stream data, and a node time-decay strategy is designed to raise the proportion of valid messages among heartbeat messages. In the offline phase, the clustering algorithm is optimized by using the optimal-distance method to choose the initial cluster centers, which reduces the number of iterations. Finally, the proposed streaming K-means clustering algorithm is applied to detect abnormal customer electricity consumption behavior; the experimental results show that the algorithm detects such abnormal behavior effectively.
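As a rough illustration of the offline stage summarized above, the sketch below runs a weighted K-means over CluStream-style micro-cluster centroids and then flags outlying readings. It rests on several assumptions that the abstract does not state: the "optimal distance" seeding is approximated by a max-min (farthest-point) heuristic, micro-cluster weights decay as 2^(-λ·age) with a hypothetical λ = 0.1, and a reading is treated as anomalous when its distance to the nearest center exceeds the mean plus z standard deviations of the fitted distances. The Redis buffering and heartbeat handling are not modeled, and every function name here is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def decay_weight(last_update: float, now: float, lam: float = 0.1) -> float:
    """Assumed exponential time decay 2^(-lam * age): micro-clusters that
    have not been updated recently contribute less to the offline stage."""
    return 2.0 ** (-lam * (now - last_update))


def max_min_init(points: List[Point], k: int) -> List[Point]:
    """Assumed stand-in for the 'optimal distance' seeding: take the first
    point, then repeatedly add the point farthest from its nearest center."""
    centers = [points[0]]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(farthest)
    return centers


def weighted_kmeans(points: List[Point], weights: List[float], k: int,
                    max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Weighted K-means over micro-cluster centroids, seeded by max_min_init."""
    centers = max_min_init(points, k)
    labels: List[int] = []
    dim = len(points[0])
    for _ in range(max_iter):
        # Assignment step: each centroid goes to its nearest center.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centers will not move
        labels = new_labels
        # Update step: each center becomes the weighted mean of its members.
        for j in range(k):
            members = [(p, w) for p, w, lab in zip(points, weights, labels) if lab == j]
            total = sum(w for _, w in members)
            if total > 0:  # keep the old center if the cluster is empty
                centers[j] = tuple(sum(p[d] * w for p, w in members) / total
                                   for d in range(dim))
    return centers, labels


def nearest_distance(p: Point, centers: List[Point]) -> float:
    """Euclidean distance from p to its closest center."""
    return math.sqrt(min(sq_dist(p, c) for c in centers))


def distance_threshold(points: List[Point], centers: List[Point], z: float = 3.0) -> float:
    """Hypothetical anomaly threshold: mean + z * std of the fitted distances."""
    dists = [nearest_distance(p, centers) for p in points]
    mean = sum(dists) / len(dists)
    std = math.sqrt(sum((d - mean) ** 2 for d in dists) / len(dists))
    return mean + z * std


def flag_anomalies(points: List[Point], centers: List[Point], threshold: float) -> List[int]:
    """Indices of points farther than `threshold` from every center."""
    return [i for i, p in enumerate(points) if nearest_distance(p, centers) > threshold]


if __name__ == "__main__":
    # Toy micro-cluster centroids (e.g. normalized load features) and their last update times.
    pts = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (10.0, 10.0), (10.2, 9.9), (9.8, 10.1)]
    weights = [decay_weight(t, now=10) for t in [9, 10, 10, 8, 10, 10]]
    centers, labels = weighted_kmeans(pts, weights, k=2)
    thr = distance_threshold(pts, centers)
    print(centers, labels, flag_anomalies([(0.1, 0.1), (5.0, 5.0)], centers, thr))
```

Seeding from mutually distant points tends to place each initial center in a different dense region, which is the intuition behind the abstract's claim that a better choice of initial centers reduces the number of K-means iterations.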
Authors
YU Xiaoqing; QI Linhai (School of Control and Computer Engineering, North China Electric Power University, Beijing 102206, China)
Source
Electric Power Information and Communication Technology, 2020, No. 3, pp. 8-14 (7 pages)
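Funding
Supported by the State Grid Corporation of China science and technology project "Research on In-Depth Analysis and Application Technology of Power Quality Big Data in Urban Power Grids" (52094018001C).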
Keywords
power big data
stream data clustering
streaming K-means clustering
abnormal customer electricity behavior