Abstract
To meet the needs of water resources management, improve the quality of hydrological data, and make data review more efficient, this paper proposes an outlier detection method for water level and discharge sequence data based on the CART (Classification and Regression Tree) decision tree, with the aim of raising the level of automated outlier detection. First, a sample set is constructed that deliberately includes redundant data. Next, detection features are selected according to the characteristics of water level and discharge sequence data. The samples are then evaluated on these features, and the feature thresholds are computed by combining station characteristics with the Gini coefficient criterion, which yields the decision tree. Finally, the decision tree is used to detect outliers and assist manual data checking. Data confirmed by manual review form new samples for the next round of decision tree construction. Experimental results show that the detection accuracy and efficiency of the method meet the requirements of practical application. The method is currently in use in the national hydrological and water resources monitoring information system.
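The abstract names the ingredients (Gini-based feature thresholds, a decision tree, and manual review feeding new samples back into the next tree) but does not give the features or the exact procedure. The following is a minimal sketch under stated assumptions, not the authors' implementation: each reading is described by two hypothetical features, the absolute water-level change since the previous reading (m) and the relative discharge change, and labelled 0 (normal) or 1 (outlier). It builds a plain CART-style binary classification tree from scratch by choosing, at each node, the threshold that minimizes weighted Gini impurity. The station characteristics that the authors also use when computing thresholds are not modeled here; building one tree per station would be one possible assumption.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Row = Tuple[float, ...]


def gini(labels: Sequence[int]) -> float:
    """Gini impurity 1 - sum_k p_k^2 of a set of labels (0 = normal, 1 = outlier)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows: List[Row], labels: List[int]) -> Optional[Tuple[int, float, float]]:
    """Search every (feature, threshold) pair for the split row[f] <= t / row[f] > t
    with the lowest weighted Gini impurity; None if no split beats the parent."""
    n = len(rows)
    best = None
    for f in range(len(rows[0])):
        values = sorted({r[f] for r in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (f, t, score)
    if best is None or best[2] >= gini(labels):
        return None
    return best


@dataclass
class Node:
    label: int                       # majority class of the training rows at this node
    feature: Optional[int] = None    # None marks a leaf
    threshold: float = 0.0
    left: Optional["Node"] = None    # branch for row[feature] <= threshold
    right: Optional["Node"] = None   # branch for row[feature] > threshold


def build_tree(rows, labels, depth=0, max_depth=3, min_samples=2) -> Node:
    """Grow a CART-style binary classification tree by recursive Gini splits."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(rows) < min_samples or gini(labels) == 0.0:
        return Node(label=majority)
    split = best_split(rows, labels)
    if split is None:
        return Node(label=majority)
    f, t, _ = split

    def child(goes_left: bool) -> Node:
        sub = [(r, y) for r, y in zip(rows, labels) if (r[f] <= t) == goes_left]
        return build_tree([r for r, _ in sub], [y for _, y in sub],
                          depth + 1, max_depth, min_samples)

    return Node(majority, f, t, child(True), child(False))


def predict(node: Node, row: Row) -> int:
    """Walk from the root to a leaf; 1 flags the reading as a suspected outlier."""
    while node.feature is not None:
        node = node.left if row[node.feature] <= node.threshold else node.right
    return node.label


# Hypothetical labelled samples: (|water-level change| in m, relative discharge change).
SAMPLES: List[Row] = [(0.02, 0.01), (0.05, 0.03), (0.03, 0.02), (0.10, 0.05),
                      (0.08, 0.04), (0.90, 0.02), (1.20, 0.60), (0.70, 0.50)]
LABELS: List[int] = [0, 0, 0, 0, 0, 1, 1, 1]

if __name__ == "__main__":
    tree = build_tree(SAMPLES, LABELS)
    print(tree.feature, round(tree.threshold, 3))  # 0 0.4
    print(predict(tree, (0.30, 0.25)))             # 0: below the learned threshold
    # A reviewer confirms (0.35, 0.30) as an outlier; it joins the sample set
    # and the tree is rebuilt, mirroring the feedback loop in the abstract.
    tree = build_tree(SAMPLES + [(0.35, 0.30)], LABELS + [1])
    print(round(tree.threshold, 3), predict(tree, (0.30, 0.25)))  # 0.225 1
```

The demo illustrates the feedback loop: the reading (0.30, 0.25) passes the first tree, but once a manually confirmed outlier is added to the samples, the rebuilt tree lowers the split threshold and flags it. In practice, a library implementation such as scikit-learn's `DecisionTreeClassifier(criterion="gini")` would normally replace the hand-written threshold search.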
Authors
LI Jue (李珏)
SHEN Peng (沈鹏)
CHEN Yali (陈雅莉)
Bureau of Hydrology, Changjiang Water Resources Commission, Wuhan 430010, China; The 95791 Unit of PLA, Jiuquan 735001, China
Source
Journal of China Hydrology (《水文》)
Indexed in CSCD and the Peking University Core Journals list (北大核心)
2024, No. 1, pp. 57-62 (6 pages)