Abstract
Task runtime estimation is a prerequisite for workflow scheduling in cloud data centers. However, existing runtime prediction methods for workflow tasks fail to extract categorical and numerical features effectively. In this paper, we propose a runtime prediction approach for workflow tasks based on multi-dimensional feature fusion. First, we construct a stacked residual recurrent neural network with an attention mechanism that maps categorical data from a high-dimensional sparse feature space to a low-dimensional dense one, strengthening the model's ability to parse categorical data and extract categorical features. Second, extreme gradient boosting is used to discretize and encode the numerical data; sparsifying the input vectors of the dense space improves the nonlinear representation capability of the numerical features. Third, we design a heterogeneous multi-dimensional feature fusion strategy that fuses the extracted categorical and numerical features with the original input features of each sample. Finally, a prediction model built on the resulting fused features exploits these features and the hidden knowledge they carry to predict the runtimes of cloud workflow tasks accurately. To verify the effectiveness of the proposed method, we conduct simulation experiments on a cluster dataset from a real cloud data center. The experimental results show that our approach achieves higher prediction accuracy than the existing baseline algorithms and can be applied to big-data-driven runtime prediction for cloud workflow tasks.
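The fusion idea in the abstract can be illustrated with a small pure-Python sketch. It is not the authors' implementation. The abstract does not specify the exact encoding scheme, so the sketch assumes one common form of tree-based discretization: each numerical sample is encoded by the one-hot index of the leaf it reaches in every tree. The two hand-written trees stand in for trees a trained gradient boosting model would supply, the fixed lookup table stands in for the embedding learned by the recurrent network, and all thresholds, categories, and values are hypothetical.

```python
# Toy sketch of multi-dimensional feature fusion (pure Python, no ML libraries).
# Every tree, threshold, category, and embedding value here is hypothetical.

# A toy tree is a nested tuple (feature_index, threshold, left, right);
# a leaf is an int leaf id. A trained boosting model would supply real trees.
TREES = [
    (0, 4.0, 0, (1, 16.0, 1, 2)),  # three leaves
    (1, 8.0, 0, 1),                # two leaves
]
LEAVES_PER_TREE = [3, 2]


def leaf_index(tree, x):
    """Walk one tree and return the id of the leaf that sample x reaches."""
    while not isinstance(tree, int):
        feature, threshold, left, right = tree
        tree = left if x[feature] < threshold else right
    return tree


def leaf_one_hot(x):
    """Discretize numerical features: one one-hot segment per tree."""
    encoded = []
    for tree, n_leaves in zip(TREES, LEAVES_PER_TREE):
        segment = [0.0] * n_leaves
        segment[leaf_index(tree, x)] = 1.0
        encoded.extend(segment)
    return encoded


# Stand-in embedding table: category -> low-dimensional dense vector.
EMBEDDINGS = {
    "batch": [0.2, -0.1],
    "service": [-0.4, 0.3],
}


def fuse(category, numeric):
    """Concatenate dense categorical, sparse numerical, and raw features."""
    return EMBEDDINGS[category] + leaf_one_hot(numeric) + list(numeric)


# Two hypothetical numerical features, e.g. requested CPU cores and memory.
fused = fuse("batch", [6.0, 32.0])
print(fused)
```

The fused vector has three parts: a dense categorical segment, a sparse segment derived from the trees, and the raw numerical inputs. A downstream regressor would take this vector as input. With a real model, the leaf indices could come from the library itself; XGBoost's Python `Booster.predict`, for example, accepts `pred_leaf=True`.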
Authors
李慧芳
黄姜杭
徐光浩
夏元清
LI Hui-Fang; HUANG Jiang-Hang; XU Guang-Hao; XIA Yuan-Qing (Key Laboratory of Intelligent Control and Decision of Complex Systems, Beijing Institute of Technology, Beijing 100081)
Source
《自动化学报》
EI
CAS
CSCD
Peking University Core Journals
2023, No. 1, pp. 67-78 (12 pages)
Acta Automatica Sinica
Funding
National Key Research and Development Program of China (2018YFB1003700)
National Natural Science Foundation of China (61836001)
Keywords
cloud data centers
workflows
ensemble learning
feature fusion
execution time prediction