Abstract
Historical cigarette data used for tar index prediction are small in sample size and high in dimension, which lowers the prediction accuracy of models. To address this problem, a cigarette tar index prediction model based on an improved Wide&Deep model was proposed. First, multiple machine learning models were used to predict on the data samples, and their outputs were taken as new features. Then, these new features were input to the Wide side of the Wide&Deep model, while fusion features were constructed and input to the Deep side. On the Deep side, second-order features and an attention mechanism were introduced to build an attention feature-crossing layer, which achieves high-order combination of features and improves prediction accuracy. Experimental results show that, compared with the unimproved Wide&Deep model, the proposed model reduces the Mean Absolute Error (MAE) by 23.4% and the Root Mean Square Error (RMSE) by 21.8%; compared with an improved Wide&Deep model that extracts features with a convolutional neural network, it reduces MAE by 15.0% and RMSE by 16.4%. The proposed model thus effectively improves the accuracy of cigarette tar index prediction.
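The abstract does not give the equations of the attention feature-crossing layer, so the following is only a minimal sketch of one common way to combine second-order features with attention. It assumes an AFM-style formulation: every pair of field embeddings is multiplied element-wise, a small one-hidden-layer network scores each pair, and the softmax-normalized scores weight the sum of the pairs. The function names, the parameters `W`, `b`, `h`, and the toy numbers are all hypothetical and are not taken from the paper.

```python
import math
from itertools import combinations


def pairwise_products(embeddings):
    """Second-order features: element-wise product of every pair of field embeddings."""
    return [[a * b for a, b in zip(ei, ej)]
            for ei, ej in combinations(embeddings, 2)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_cross(embeddings, W, b, h):
    """Assumed AFM-style layer: score each pair, normalize, return the weighted sum.

    W (hidden x dim), b (hidden) and h (hidden) are the attention network's
    parameters; in a real model they would be learned, here they are fixed.
    """
    pairs = pairwise_products(embeddings)
    scores = []
    for p in pairs:
        # One hidden layer with ReLU, then projection onto h.
        hidden = [max(0.0, sum(w * x for w, x in zip(row, p)) + bias)
                  for row, bias in zip(W, b)]
        scores.append(sum(hj * aj for hj, aj in zip(h, hidden)))
    weights = softmax(scores)
    dim = len(pairs[0])
    pooled = [sum(wt * p[d] for wt, p in zip(weights, pairs)) for d in range(dim)]
    return pooled, weights


# Toy input: three feature fields, each embedded in two dimensions (made-up values).
embeddings = [[1.0, 2.0], [0.5, -1.0], [2.0, 0.0]]
W = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical attention weights
b = [0.0, 0.0]
h = [1.0, 1.0]

pooled, weights = attention_cross(embeddings, W, b, h)
print("attention weights:", weights)
print("pooled cross feature:", pooled)
```

In a full Wide&Deep model the pooled vector would be passed on to the Deep side's dense layers, and the attention parameters would be trained jointly with the rest of the network rather than fixed as they are here.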
Authors
ZHOU Tao; XIE Lihua; WANG Xiaofei
Shifang Cigarette Factory, China Tobacco Sichuan Industry Limited Liability Company, Shifang, Sichuan 618400, China
Information Center, China Tobacco Sichuan Industry Limited Liability Company, Chengdu, Sichuan 610020, China
Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu, Sichuan 610041, China
Source
Journal of Computer Applications (《计算机应用》)
CSCD
Peking University Core Journal (北大核心)
2023, Issue S01, pp. 95-99 (5 pages)
Funding
Western Young Scholars Program of the Chinese Academy of Sciences (RRJZ2021003)