A Node Fault Detection Algorithm in Distributed Parallel Server

Times cited: 3

Abstract: Fault detection is the foundation of fault tolerance in distributed parallel servers. To minimize the extra communication overhead that a fault detection algorithm adds to the system, the Autoecious Adaptive Fault Detection (A2FD) algorithm is proposed. Instead of sending dedicated probe messages, the algorithm performs fault detection by riding on the information exchange that already takes place inside the system. It uses an autoregressive (AR) model to predict the transmission time and processing time of messages, and automatically adjusts the fault detection threshold from these predictions so that detection adapts to the system's current operating conditions. The implementation of the algorithm is described in pseudocode. The algorithm has been applied in the distributed parallel database system DPSQL, where it achieves effective node fault detection.
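The abstract names two mechanisms without giving their formulas: detection that piggybacks on messages the system already exchanges, and a timeout derived from an AR prediction of message transmission plus processing time. The Python sketch below illustrates one way these could fit together. Everything concrete in it is an illustrative assumption, not a detail from the paper: the class names `ARDelayPredictor` and `ParasiticFailureDetector`, the AR order of 2, the 64-sample window, Yule-Walker fitting via the Levinson-Durbin recursion, and the threshold rule "forecast + 4 standard deviations + a fixed slack".

```python
import math
from collections import deque


class ARDelayPredictor:
    """Predicts the next message delay with an AR(p) model (illustrative).

    Coefficients are re-estimated on every call from a sliding window of
    observed delays, using the Yule-Walker equations solved by the
    Levinson-Durbin recursion.
    """

    def __init__(self, order=2, window=64):
        self.order = order
        self.samples = deque(maxlen=window)

    def observe(self, delay):
        self.samples.append(float(delay))

    def predict(self):
        """Return (predicted next delay, innovation standard deviation)."""
        x = list(self.samples)
        n = len(x)
        if n == 0:
            raise ValueError("no delay samples observed yet")
        mu = sum(x) / n
        c = [v - mu for v in x]                 # mean-centred series
        p = min(self.order, n - 1)
        # Biased autocovariances r[0..p] keep |kappa| <= 1, so err stays >= 0.
        r = [sum(c[t] * c[t - k] for t in range(k, n)) / n for k in range(p + 1)]
        phi = [0.0] * (p + 1)                   # phi[1..p] are the AR coefficients
        err = r[0]                              # prediction-error variance
        for k in range(1, p + 1):
            if err <= 1e-12:                    # series is (almost) perfectly predictable
                break
            kappa = (r[k] - sum(phi[j] * r[k - j] for j in range(1, k))) / err
            new_phi = phi[:]
            new_phi[k] = kappa
            for j in range(1, k):
                new_phi[j] = phi[j] - kappa * phi[k - j]
            phi = new_phi
            err *= 1.0 - kappa * kappa
        forecast = mu + sum(phi[j] * c[n - j] for j in range(1, p + 1))
        return forecast, math.sqrt(max(err, 0.0))


class ParasiticFailureDetector:
    """Suspects a node when the reply to an ordinary request is overdue.

    No probe messages are sent: timing samples come only from the
    request/reply traffic the system already generates.
    """

    def __init__(self, margin=4.0, min_slack=0.05, default_timeout=1.0, order=2):
        self.margin = margin                    # multiples of innovation std dev
        self.min_slack = min_slack              # fixed extra allowance (seconds)
        self.default_timeout = default_timeout  # used before any reply is seen
        self.order = order
        self.predictors = {}                    # node -> ARDelayPredictor
        self.pending = {}                       # (node, msg_id) -> send time

    def on_send(self, node, msg_id, now):
        self.pending[(node, msg_id)] = now

    def on_reply(self, node, msg_id, now):
        sent = self.pending.pop((node, msg_id), None)
        if sent is None:
            return
        pred = self.predictors.get(node)
        if pred is None:
            pred = self.predictors[node] = ARDelayPredictor(self.order)
        pred.observe(now - sent)                # transmission + processing time

    def timeout(self, node):
        """Adaptive threshold: AR forecast plus a safety margin."""
        pred = self.predictors.get(node)
        if pred is None:
            return self.default_timeout
        forecast, sigma = pred.predict()
        return forecast + self.margin * sigma + self.min_slack

    def suspects(self, now):
        """Nodes with at least one outstanding request older than their timeout."""
        return sorted({node for (node, _), sent in self.pending.items()
                       if now - sent > self.timeout(node)})
```

Because samples come only from replies to requests the system sends anyway, the detector adds no probe traffic; the trade-off of this design is that a node with no outstanding request is never examined.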

Authors: Zuo Chaoshu, Liu Xinsong, Qiu Yuanjie, Chen Xiaohui, Li Ke

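Affiliations: China Electronics Technology Group Corporation, No. …; School of Computer Science and Engineering, University of Electronic Science and Technology of China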

Source: Journal of University of Electronic Science and Technology of China, 2007, No. 1, pp. 119-121, 125 (4 pages in total). Indexed in: EI, CAS, CSCD, Peking University Core Journals.

Keywords: distributed parallel server; fault detection; adaptive; autoeciousness

CLC number: TP302.8 [Automation and Computer Technology: Computer System Architecture]
