Abstract
Meteorological clouds have become an important runtime environment for provincial meteorological operations. Traditional cluster monitoring technology can neither monitor virtual machines and cloud applications nor raise alerts and automatically handle the failures that occur, so monitoring and maintaining the Jiangxi provincial meteorological cloud currently faces a major challenge. This paper designs a monitoring and maintenance system for the meteorological cloud based on the open-source Zabbix system. The system monitors the physical infrastructure layer, the virtualization layer, and the application layer. It pushes alerts about meteorological cloud failures to the staff on duty and automatically performs emergency recovery in common failure scenarios. In deployment tests, the system ran stably and markedly improved the operation and maintenance efficiency of the staff on duty.
Authors
杨立苑
胡佳军
邓卫华
刘喆玥
YANG Li-Yuan; HU Jia-Jun; DENG Wei-Hua; LIU Zhe-Yue (Meteorological Information Center of Jiangxi Province, Nanchang 330096, China)
Source
《计算机系统应用》
2021, No. 8, pp. 73-80 (8 pages)
Computer Systems & Applications
Funding
Key Scientific Research Projects of the Jiangxi Provincial Meteorological Bureau (QX2019Z03, QX2017Z01).
Keywords
meteorological cloud
monitoring and maintenance
failure handling