Abstract
Data loss and corruption caused by hard disk failures inflict significant losses on enterprises and users, so hard disk failure prediction has attracted considerable attention from both academia and industry, and many machine-learning-based prediction methods have emerged. However, because these methods are evaluated on different sample data and with inconsistent performance metrics, their relative merits cannot be assessed fairly. To address this, we establish a machine-learning-based hard disk failure detection and evaluation platform. Within this unified experimental platform, the failure prediction performance of eight algorithm models is compared: random forest, logistic regression, multilayer perceptron neural network, decision tree, naive Bayes, extreme gradient boosting, gradient boosting decision tree, and AdaBoost. The models are compared on the same sample set using the same performance metric, and the prediction performance of each model is additionally compared across sample sets of different sizes. The experimental results show that the random forest and gradient boosting decision tree models not only achieve high prediction accuracy but also generalize well across sample sets of different sizes.
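The evaluation idea in the abstract (every model trained and scored on the same sample set with the same metric, then repeated for sample sets of different sizes) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the synthetic single-feature data, the two stand-in models `MajorityBaseline` and `MeanThreshold`, the fixed 2:1 train/test split, and the choice of failure detection rate and false alarm rate as the shared metric. Any object exposing `fit(X, y)` and `predict(X)` could be placed in the model table in the same way.

```python
import random

def make_synthetic_drives(n, seed=0):
    """Synthetic (features, label) pairs; label 1 = failed. Not real SMART data."""
    rng = random.Random(seed)
    drives = []
    for i in range(n):
        failed = 1 if i % 10 == 0 else 0  # every 10th drive fails (assumption)
        # Hypothetical feature: failed drives tend to show a higher value.
        value = rng.gauss(40.0 if failed else 5.0, 10.0)
        drives.append(([value], failed))
    return drives

class MajorityBaseline:
    """Stand-in model: always predicts the most common training label."""
    def fit(self, X, y):
        self.label_ = 1 if 2 * sum(y) > len(y) else 0
        return self
    def predict(self, X):
        return [self.label_ for _ in X]

class MeanThreshold:
    """Stand-in model: threshold halfway between the two class means."""
    def fit(self, X, y):
        failed = [x[0] for x, t in zip(X, y) if t == 1]
        healthy = [x[0] for x, t in zip(X, y) if t == 0]
        self.threshold_ = (sum(failed) / len(failed)
                           + sum(healthy) / len(healthy)) / 2
        return self
    def predict(self, X):
        return [1 if x[0] > self.threshold_ else 0 for x in X]

def detection_and_false_alarm_rate(y_true, y_pred):
    """Shared metric: (fraction of failures caught, fraction of healthy flagged)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fdr = tp / (tp + fn) if tp + fn else 0.0
    far = fp / (fp + tn) if fp + tn else 0.0
    return fdr, far

def compare(models, drives, sizes):
    """Score every model on identical splits, once per sample-set size."""
    results = {}
    for size in sizes:
        subset = drives[:size]
        test = subset[::3]  # every third sample is held out
        train = [d for i, d in enumerate(subset) if i % 3 != 0]
        X_train, y_train = [x for x, _ in train], [t for _, t in train]
        X_test, y_test = [x for x, _ in test], [t for _, t in test]
        for name, factory in models.items():
            model = factory().fit(X_train, y_train)
            results[(name, size)] = detection_and_false_alarm_rate(
                y_test, model.predict(X_test))
    return results

MODELS = {"majority": MajorityBaseline, "threshold": MeanThreshold}
results = compare(MODELS, make_synthetic_drives(3000), sizes=[300, 3000])
for (name, size), (fdr, far) in sorted(results.items()):
    print(f"{name:10s} n={size:5d} detection={fdr:.2f} false_alarm={far:.2f}")
```

Because the split and the metric are fixed outside the models, differences in the printed rows reflect only the model and the sample-set size, which is the property a unified comparison relies on.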
Authors
QIAO Xu-kun (乔旭坤)
LI Shun (李顺)
LI Jun (李君)
WU Xin (吴鑫)
MAO Zhi-hui (茅智慧)
Zhejiang Wanli University, Ningbo 315100, China
Source
Computer Technology and Development (《计算机技术与发展》)
2022, No. 6, pp. 215-220 (6 pages)
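Funding
National Undergraduate Innovation and Entrepreneurship Training Program (S202010876094, 202010876035)
Ningbo Science and Technology Department Public Welfare Project (2017C50028)
Zhejiang Provincial Undergraduate Science and Technology Innovation Program (2020R419023)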