Abstract
Because power technical transformation and overhaul project data contain large amounts of duplicate and missing values, existing data cleaning methods cannot clean the dirty data effectively. This paper therefore studies a data cleaning method for such projects based on data warehouse ETL technology. The quality of dirty data from multiple data sources is evaluated first, and data mining is carried out once the data are judged to meet the expected standard. Duplicate records are then cleaned using data warehouse ETL technology, and Chebyshev's theorem is applied to handle missing values, completing the cleaning of the project data. Experimental results show that the method achieves the highest effective cleaning rate and improves data quality.
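The abstract names two cleaning steps, duplicate removal and Chebyshev-based handling of missing values, without giving details. The following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that records are plain dictionaries, that duplicates are defined by a set of key fields, and that a missing value is filled with the mean of the observations lying inside the interval [mean - k·sigma, mean + k·sigma], which by Chebyshev's inequality holds at least 1 - 1/k² of the values for any distribution. The function names and the parameter `k` are hypothetical.

```python
from statistics import mean, pstdev


def drop_duplicates(records, key_fields):
    """Keep the first record seen for each combination of key_fields."""
    seen = set()
    unique = []
    for rec in records:
        key = tuple(rec.get(field) for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def chebyshev_interval(values, k=2.0):
    """Return (low, high) = mean -/+ k * population std of the observed values.

    Chebyshev's inequality guarantees that at least 1 - 1/k**2 of the
    values fall inside this interval, whatever their distribution.
    Raises statistics.StatisticsError if every value is missing.
    """
    observed = [v for v in values if v is not None]
    mu = mean(observed)
    sigma = pstdev(observed)
    return mu - k * sigma, mu + k * sigma


def fill_missing(values, k=2.0):
    """Replace None with the mean of the observations inside the interval.

    Assumption: values outside the interval are treated as suspect and
    are excluded from the fill value, but are left unchanged in the output.
    """
    low, high = chebyshev_interval(values, k)
    trusted = [v for v in values if v is not None and low <= v <= high]
    fill = mean(trusted)
    return [fill if v is None else v for v in values]


if __name__ == "__main__":
    rows = [
        {"project_id": 1, "cost": 10.0},
        {"project_id": 1, "cost": 10.0},  # exact duplicate of the first row
        {"project_id": 2, "cost": None},  # missing value
        {"project_id": 3, "cost": 12.0},
    ]
    cleaned = drop_duplicates(rows, ["project_id", "cost"])
    costs = fill_missing([r["cost"] for r in cleaned])
    print(len(cleaned), costs)
```

A smaller `k` excludes more extreme observations from the fill value at the cost of a weaker Chebyshev guarantee; the choice of `k` here is illustrative only.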
Authors
SHEN Haitian (沈海天)
JI Huifang (嵇惠方)
YOU Rui (游睿)
TANG Liang (唐梁)
XIE Xiaofeng (谢晓锋)
Zhejiang Huayun Power Engineering Design Consulting Co., Ltd., Zhejiang 310000, China
Source
Electric Engineering (《电工技术》)
2023, No. 14, pp. 177-179 (3 pages)