摘要
Every application in a smart city environment like the smart grid,health monitoring, security, and surveillance generates non-stationary datastreams. Due to such nature, the statistical properties of data changes overtime, leading to class imbalance and concept drift issues. Both these issuescause model performance degradation. Most of the current work has beenfocused on developing an ensemble strategy by training a new classifier on thelatest data to resolve the issue. These techniques suffer while training the newclassifier if the data is imbalanced. Also, the class imbalance ratio may changegreatly from one input stream to another, making the problem more complex.The existing solutions proposed for addressing the combined issue of classimbalance and concept drift are lacking in understating of correlation of oneproblem with the other. This work studies the association between conceptdrift and class imbalance ratio and then demonstrates how changes in classimbalance ratio along with concept drift affect the classifier’s performance.We analyzed the effect of both the issues on minority and majority classesindividually. To do this, we conducted experiments on benchmark datasetsusing state-of-the-art classifiers especially designed for data stream classification.Precision, recall, F1 score, and geometric mean were used to measure theperformance. Our findings show that when both class imbalance and conceptdrift problems occur together the performance can decrease up to 15%. Ourresults also show that the increase in the imbalance ratio can cause a 10% to15% decrease in the precision scores of both minority and majority classes.The study findings may help in designing intelligent and adaptive solutionsthat can cope with the challenges of non-stationary data streams like conceptdrift and class imbalance.
基金
The authors would like to extend their gratitude to Universiti Teknologi PETRONAS (Malaysia)for funding this research through grant number (015LA0-037).