Abstract
Detecting faults in a grid system promptly is key to improving its operational reliability. To this end, building on existing grid resource monitoring and grid task monitoring, we design and implement a grid fault detection and notification system. This paper describes the overall framework of the system and its implementation mechanism. The system uses active fault detection: it polls, in turn, the agents deployed on the grid nodes and the task monitoring agents deployed on the servers, and applies self-adaptive conditions to judge whether grid resources and grid tasks are running normally. When a fault or error occurs, it promptly sends a notification to the administrator and displays the notification message.
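The active polling described in the abstract can be illustrated with a minimal sketch. The abstract does not define the self-adaptive condition, so the version below is an assumption: each agent's reply time is compared against a limit derived from its own recent reply times (mean plus `k` standard deviations, with a fixed floor). The names `AdaptiveDetector`, `probe`, and `poll_round`, and the parameters `window`, `k`, and `floor`, are hypothetical and do not come from the paper.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Callable, Deque, List, Optional

# Hypothetical probe type: returns the agent's reply time in seconds,
# or None when the agent does not answer at all.
Probe = Callable[[], Optional[float]]


class AdaptiveDetector:
    """Polls one agent; flags a fault when the reply is missing or too slow."""

    def __init__(self, name: str, probe: Probe,
                 window: int = 10, k: float = 3.0, floor: float = 0.5) -> None:
        self.name = name
        self.probe = probe
        self.history: Deque[float] = deque(maxlen=window)  # recent healthy reply times
        self.k = k          # assumed tolerance, in standard deviations
        self.floor = floor  # assumed minimum limit, in seconds

    def threshold(self) -> float:
        """Assumed adaptive condition: mean + k * stddev of recent replies."""
        if len(self.history) < 2:
            return self.floor
        return max(self.floor, mean(self.history) + self.k * pstdev(self.history))

    def poll(self) -> Optional[str]:
        """Query the agent once; return an alarm message, or None if healthy."""
        limit = self.threshold()
        reply_time = self.probe()
        if reply_time is None:
            return f"ALARM {self.name}: no reply"
        if reply_time > limit:
            return f"ALARM {self.name}: reply took {reply_time:.2f}s (limit {limit:.2f}s)"
        self.history.append(reply_time)  # only healthy replies update the baseline
        return None


def poll_round(detectors: List[AdaptiveDetector]) -> List[str]:
    """Poll every agent in turn and collect the alarms raised in this round."""
    alarms = []
    for detector in detectors:
        alarm = detector.poll()
        if alarm is not None:
            alarms.append(alarm)
    return alarms


if __name__ == "__main__":
    # Scripted probes stand in for real node agents and task monitoring agents.
    node_replies = iter([0.1, 0.1, 0.1, 2.0])
    task_replies = iter([0.2, 0.2, None, 0.2])
    detectors = [
        AdaptiveDetector("node-1", lambda: next(node_replies)),
        AdaptiveDetector("task-monitor", lambda: next(task_replies)),
    ]
    for _ in range(4):
        for message in poll_round(detectors):
            print(message)
```

In this sketch only healthy replies are added to the history, so a slow or failed agent does not raise its own limit. Whether the paper's system behaves this way is not stated in the abstract.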
Source
Journal of Huazhong University of Science and Technology (Natural Science Edition)
EI
CAS
CSCD
PKU Core Journal
2006, Issue z1, pp. 123-125 (3 pages)
Funding
National Key Basic Research Program of China (2003CB314805)
National High Technology Research and Development Program of China (2005AA121560)
Keywords
grid resource fault detection
grid task fault detection
notification