Abstract
This paper analyzes how the checkpoint mechanism is implemented in the Condor cluster job system and identifies its inefficient use of compute and storage resources during checkpoint operations. It proposes an optimization based on incremental checkpointing, in which each checkpoint operation saves only the data that has changed since the previous checkpoint. This substantially reduces both the space and time overhead of checkpointing. The feasibility of the improved scheme is verified with a matrix computation example.
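The core idea of incremental checkpointing described in the abstract can be illustrated with a minimal Python sketch. It is not Condor's implementation or the paper's scheme: the page size, the use of SHA-256 digests to detect changes, and the function names `page_digests`, `incremental_checkpoint`, and `restore` are all assumptions made for illustration, and the state buffer is assumed to keep a fixed length between checkpoints.

```python
import hashlib

PAGE_SIZE = 4096  # assumed page granularity, chosen only for illustration


def page_digests(data: bytes, page_size: int = PAGE_SIZE) -> list:
    """Return one SHA-256 digest per fixed-size page of the state buffer."""
    return [hashlib.sha256(data[i:i + page_size]).digest()
            for i in range(0, len(data), page_size)]


def incremental_checkpoint(data: bytes, prev_digests: list,
                           page_size: int = PAGE_SIZE):
    """Return (delta, digests); delta maps page index -> bytes of changed pages.

    With an empty prev_digests every page counts as changed, so the first
    call produces a full checkpoint.
    """
    digests = page_digests(data, page_size)
    delta = {}
    for idx, digest in enumerate(digests):
        if idx >= len(prev_digests) or prev_digests[idx] != digest:
            delta[idx] = data[idx * page_size:(idx + 1) * page_size]
    return delta, digests


def restore(base: bytes, deltas: list, page_size: int = PAGE_SIZE) -> bytes:
    """Rebuild the state by applying deltas, oldest first, onto the base image."""
    buf = bytearray(base)
    for delta in deltas:
        for idx, page in delta.items():
            start = idx * page_size
            buf[start:start + len(page)] = page
    return bytes(buf)


# Usage: a 16-page state in which a single byte changes between checkpoints.
state = bytearray(16 * PAGE_SIZE)
base_image = bytes(state)
full_delta, digests = incremental_checkpoint(base_image, [])
state[5000] = 1  # byte 5000 lies in page 1
small_delta, digests = incremental_checkpoint(bytes(state), digests)
recovered = restore(base_image, [small_delta])
```

In this sketch the second checkpoint stores one page instead of sixteen, which is the source of the space and time savings the abstract refers to. A real system would more likely track modified pages through memory protection or dirty bits rather than rehashing the whole state.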
Source
Journal of Henan Agricultural University (《河南农业大学学报》)
CAS
CSCD
Peking University Core Journals (北大核心)
2010, No. 6, pp. 718-721 (4 pages)
Funding
Henan Province Science and Technology Research Project (2008A520011)
Keywords
computer cluster
Condor
high-throughput computing
incremental checkpoint technique