Abstract
To address the loss or absence of sensor data caused by the surrounding environment in traditional industries, this paper proposes a deep learning method based on causal analysis for multivariate data in energy systems when the data distribution is unknown, and uses its results to impute missing values. First, the samples are rebalanced. Second, a multivariate model is built on LSTM, and causal analysis is used to optimize the deep learning optimizer: influence factors that are undesired during learning are removed, the spurious correlation between the feature values and the stable deflection is weakened, and the influence of the stable deflection on the feature values is excluded by means of the placebo effect. Third, the harmful factor is subtracted from the feature value to obtain a value free of that factor, and the model is then optimized to obtain better results. The method addresses the problem of underfitting head data and overfitting tail data during machine learning. Experiments on a multivariate energy system dataset show that the method achieves higher accuracy in converging imputed missing values to the true values.
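The subtraction step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything here is an assumption made for illustration: the "stable deflection" is modeled as an exponential moving average of batch-mean feature vectors, the "harmful factor" as each feature vector's projection onto that direction, and the names `ema_update`, `remove_component`, `mu`, and `alpha` are hypothetical rather than taken from the paper.

```python
import math
import random

def ema_update(d_bar, batch, mu=0.9):
    """Update the moving-average direction with the batch-mean feature vector.
    (Assumed stand-in for the 'stable deflection'; not the paper's definition.)"""
    n = len(batch)
    mean = [sum(row[j] for row in batch) / n for j in range(len(d_bar))]
    return [mu * d + (1.0 - mu) * m for d, m in zip(d_bar, mean)]

def remove_component(x, d_bar, alpha=1.0, eps=1e-12):
    """Return x minus alpha times its projection onto the direction of d_bar.
    (The projection plays the role of the assumed 'harmful factor'.)"""
    norm = math.sqrt(sum(d * d for d in d_bar))
    if norm < eps:
        # No direction accumulated yet: leave the features unchanged.
        return list(x)
    d_hat = [d / norm for d in d_bar]
    proj = sum(xi * di for xi, di in zip(x, d_hat))
    return [xi - alpha * proj * di for xi, di in zip(x, d_hat)]

# Synthetic 4-dimensional features with a constant offset on the first axis.
random.seed(0)
batch = [[random.gauss(3.0, 1.0)] + [random.gauss(0.0, 1.0) for _ in range(3)]
         for _ in range(64)]

d_bar = [0.0] * 4
for _ in range(50):
    d_bar = ema_update(d_bar, batch)

cleaned = [remove_component(row, d_bar) for row in batch]
```

With `alpha=1.0` the cleaned vectors are orthogonal to the accumulated direction; a smaller `alpha` would remove only part of that component, which is one way such a trade-off could be tuned.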
Author
房旭
FANG Xu (School of Computer Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China)
Source
《软件导刊》
2024, No. 1, pp. 103-107 (5 pages)
Software Guide
Funding
Development Basic Research Project of the State Key Laboratory of Laser Interaction with Matter (SKLLIM2113).
Keywords
causal analysis
neural network
long-tailed distribution
missing value imputation