Abstract
To address the limited fault-tolerance support in application development frameworks for Computational Fluid Dynamics (CFD), a new method for optimizing the fault-tolerance period is proposed. The method computes an ideal optimal fault-tolerance period from a probabilistic model of system failures, then determines the actual checkpoint times online by taking into account the characteristics of field-data output in CFD applications. Experimental results on three typical applications show that, on systems with different mean times between failures, the proposed method always achieves the lowest fault-tolerance overhead compared with checkpointing at a fixed number of time steps. With this method, users can conveniently set the fault-tolerance period through the framework's interfaces and effectively reduce the overhead that fault tolerance incurs.
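The two-stage idea in the abstract (an ideal period from a failure model, then an actual checkpoint time chosen online around field output) can be illustrated with a minimal sketch. The abstract does not give the paper's probability model, so the sketch substitutes Young's well-known first-order approximation, T = sqrt(2 · C · MTBF), where C is the cost of writing one checkpoint. The alignment rule (checkpoint only on steps that already write field data, once the ideal period has elapsed) is likewise an assumed reading of "online determination", and all function and parameter names are hypothetical rather than taken from the framework.

```python
import math


def ideal_period(checkpoint_cost_s: float, mtbf_s: float) -> float:
    """Ideal checkpoint period in seconds.

    Assumption: Young's first-order approximation, not the paper's own model.
    """
    if checkpoint_cost_s <= 0 or mtbf_s <= 0:
        raise ValueError("checkpoint cost and MTBF must be positive")
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)


def should_checkpoint(step: int, elapsed_s: float, period_s: float,
                      output_interval_steps: int) -> bool:
    """Assumed online rule: checkpoint on a field-output step once the
    time since the last checkpoint has reached the ideal period."""
    on_output_step = step % output_interval_steps == 0
    return on_output_step and elapsed_s >= period_s


def checkpoint_steps(n_steps: int, step_time_s: float, period_s: float,
                     output_interval_steps: int) -> list[int]:
    """Simulate a failure-free run and list the steps that checkpoint."""
    chosen = []
    elapsed_s = 0.0  # wall time since the last checkpoint
    for step in range(1, n_steps + 1):
        elapsed_s += step_time_s
        if should_checkpoint(step, elapsed_s, period_s, output_interval_steps):
            chosen.append(step)
            elapsed_s = 0.0
    return chosen


if __name__ == "__main__":
    period = ideal_period(checkpoint_cost_s=50.0, mtbf_s=10_000.0)
    print(period)
    print(checkpoint_steps(n_steps=100, step_time_s=30.0,
                           period_s=period, output_interval_steps=10))
```

With a 50 s checkpoint cost and a 10 000 s mean time between failures, the ideal period under this approximation is 1000 s. Because field data is written only every 10 steps of 30 s each, the actual checkpoints land on the first output step after each 1000 s has elapsed, not exactly at the ideal instant.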
Source
《计算机应用》
CSCD
Peking University Core Journal
2014, Issue 2, pp. 382-386 (5 pages)
Journal of Computer Applications
Funding
National Natural Science Foundation of China (61120106005, 61303071)
Guangzhou Science and Information Technology Bureau Fund (134200026)
Keywords
fault tolerance
period optimization
checkpoint
Computational Fluid Dynamics (CFD)
development framework