
A Classifier Using Online Bagging Ensemble Method for Big Data Stream Learning (cited by: 6)

Abstract: In the classification of big data streams with concept drift, ensemble learning combines multiple weak learners and can achieve better generalization performance than a single learner. In this paper, we present an efficient classifier using the online bagging ensemble method for big data stream learning. In this classifier, we introduce an efficient online resampling mechanism on the training instances and use a robust coding method based on error-correcting output codes, in order to reduce the correlation between the classifiers and increase the diversity of the ensemble. A dynamic updating model based on classification performance is adopted to reduce unnecessary updating operations and improve learning efficiency. We implement a parallel version of EoBag, which runs faster than the serial version, and results indicate that its classification performance is almost the same as that of the serial one. Finally, we compare classification performance and resource usage with other state-of-the-art algorithms on both artificial and real data sets. Results show that the proposed algorithm obtains better accuracy and more feasible resource usage for the classification of big data streams.
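The abstract does not spell out the resampling mechanism, so the following is only a minimal sketch of classic online bagging, in which each base learner sees every incoming instance k times with k drawn from Poisson(1). It is not the paper's EoBag: the error-correcting output codes and the performance-based dynamic updating are omitted, and the names `OnlineCentroid`, `OnlineBagging`, and `poisson` are hypothetical, chosen for illustration.

```python
import math
import random
from collections import Counter


def poisson(lam, rng):
    """Draw from Poisson(lam) using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


class OnlineCentroid:
    """Toy incremental base learner (an assumption, not from the paper):
    keeps a running per-class feature sum and predicts the nearest class mean."""

    def __init__(self):
        self.sums = {}
        self.counts = Counter()

    def learn_one(self, x, y):
        if y not in self.sums:
            self.sums[y] = [0.0] * len(x)
        self.sums[y] = [s + v for s, v in zip(self.sums[y], x)]
        self.counts[y] += 1

    def predict_one(self, x):
        if not self.counts:
            return None  # nothing learned yet

        def dist(label):
            n = self.counts[label]
            return sum((v - s / n) ** 2 for v, s in zip(x, self.sums[label]))

        return min(self.counts, key=dist)


class OnlineBagging:
    """Classic online bagging: Poisson(1) resampling plus majority vote."""

    def __init__(self, n_models=10, seed=0):
        self.rng = random.Random(seed)
        self.models = [OnlineCentroid() for _ in range(n_models)]

    def learn_one(self, x, y):
        # Each model is updated k ~ Poisson(1) times, which approximates
        # bootstrap sampling without storing the stream.
        for model in self.models:
            for _ in range(poisson(1.0, self.rng)):
                model.learn_one(x, y)

    def predict_one(self, x):
        votes = Counter(m.predict_one(x) for m in self.models)
        votes.pop(None, None)  # ignore models that have seen no data
        return votes.most_common(1)[0][0] if votes else None
```

A stream would be fed one instance at a time through `learn_one`, with `predict_one` available at any point. Because the base learners are updated independently of one another, the inner loop over `self.models` is the natural place to parallelize.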
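Source: Tsinghua Science and Technology (indexed in SCIE, EI, CAS, CSCD), 2019, Issue 4, pp. 379-388 (10 pages). Chinese title: Journal of Tsinghua University (Natural Science Edition, English Edition).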
Funding: Supported in part by the National Natural Science Foundation of China (Nos. 61702089, 61876205, and 61501102), the Science and Technology Plan Project of Guangzhou (No. 201804010433), and the Bidding Project of Laboratory of Language Engineering and Computing (No. LEC2017ZBKT001).
Keywords: big data stream; classification; online bagging; ensemble learning; concept drift