Abstract
Fault-tolerant scheduling is an important topic in scheduling research and an effective means of improving system reliability. Many fault-tolerant scheduling algorithms have been proposed for real-time tasks on clusters, but these algorithms do not consider the QoS (quality of service) requirements of tasks. This paper proposes FTQ (fault-tolerant QoS-based scheduling), a fault-tolerant scheduling algorithm for real-time tasks with QoS requirements on heterogeneous clusters. FTQ adopts the primary/backup (PB) technique and jointly considers the timing constraints of tasks, their QoS requirements, system reliability, and system resource utilization. It adaptively adjusts the QoS levels of tasks and the execution modes of backup copies according to the system load, which improves system flexibility, reliability, schedulability, and resource utilization. System reliability is analyzed quantitatively and incorporated into the scheduling algorithm, which improves the reliability of the system. During scheduling, FTQ starts primary copies as early as possible and backup copies as late as possible, so that a backup copy either runs in passive execution mode or overlaps with its primary copy as little as possible, which improves resource utilization. In addition, FTQ employs backup-copy overlapping and analyzes the latest start time of backup copies and its constraints, which raises the scheduling success rate of tasks. FTQ is compared with NOFTQ and DYFARS in extensive simulation experiments. The results show that FTQ outperforms the other methods and achieves better scheduling quality.
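The early-primary, late-backup idea in the abstract can be illustrated with a minimal sketch. It is not the FTQ algorithm itself: the function name `place_copies`, the `Placement` record, and the rule that the latest backup start equals the deadline minus the backup's execution time are simplifying assumptions made here for illustration. The paper's actual constraints on the latest start time are not reproduced in this record.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Placement:
    primary_start: float
    primary_finish: float
    backup_start: float
    backup_finish: float
    mode: str       # "passive" or "overlapping" (labels chosen for this sketch)
    overlap: float  # time during which both copies would be running


def place_copies(earliest_start: float, deadline: float,
                 primary_exec: float, backup_exec: float) -> Optional[Placement]:
    """Place the primary as early and the backup as late as possible.

    Execution times may differ because the two copies run on different
    nodes of a heterogeneous cluster. Returns None if either copy cannot
    finish by the deadline.
    """
    primary_start = earliest_start
    primary_finish = primary_start + primary_exec
    # Assumption: the latest backup start is the deadline minus the
    # backup's execution time on its own node.
    backup_start = deadline - backup_exec
    if primary_finish > deadline or backup_start < earliest_start:
        return None
    # The backup is passive when it starts only after the primary finishes;
    # otherwise part of it must run concurrently with the primary.
    overlap = max(0.0, primary_finish - backup_start)
    mode = "passive" if overlap == 0.0 else "overlapping"
    return Placement(primary_start, primary_finish,
                     backup_start, deadline, mode, overlap)
```

With a deadline of 10, a primary of length 4 and a backup of length 5 leave a gap between the copies, so the backup can stay passive. A primary of length 6 and a backup of length 7 force the two copies to overlap for 3 time units.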
Source
《软件学报》
EI
CSCD
PKU Core Journal
2011, No. 7, pp. 1440-1456 (17 pages)
Journal of Software
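Funding
National Security Major Basic Research Program of China (973) (6136101)
National High-Tech Research and Development Program of China (863) (2008AA7070412)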
Keywords
heterogeneous cluster
real-time
scheduling
fault tolerance
heuristic