Abstract
Driven by the needs of a wide range of machine learning applications, several attempts have been made to improve the performance of classifiers applied to imbalanced datasets. In this paper, we present a fast maximum entropy machine (MEM) combined with a synthetic minority over-sampling technique for handling binary classification problems with high imbalance ratios, large numbers of data samples, and medium to large numbers of features. A random Fourier feature representation of kernel functions and the primal estimated sub-gradient solver for support vector machines (PEGASOS) are applied to speed up the classic MEM. Experiments have been conducted on various real datasets (including two China Mobile datasets and several other standard test datasets) under various configurations. The results demonstrate that the proposed algorithm has extremely low complexity yet achieves excellent overall classification performance (in terms of several widely used evaluation metrics) compared with the classic MEM and some other state-of-the-art methods. The proposed algorithm is particularly valuable in big data applications owing to its low computational complexity.
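To make the random Fourier feature idea mentioned above concrete, the following is a minimal sketch, not the authors' implementation. It assumes a Gaussian (RBF) kernel k(x, y) = exp(-gamma * ||x - y||^2); the names `make_rff_map`, `rbf_kernel`, `dot`, `gamma`, and `n_features` are hypothetical and chosen only for illustration. The point is that an explicit finite-dimensional map z(x) lets a linear solver (such as a PEGASOS-style sub-gradient method) stand in for a kernel method.

```python
import math
import random


def make_rff_map(dim, n_features, gamma, seed=0):
    """Return a map z so that dot(z(x), z(y)) approximates the RBF kernel.

    Assumption for this sketch: k(x, y) = exp(-gamma * ||x - y||^2).
    """
    rng = random.Random(seed)
    # Frequencies are sampled from N(0, 2*gamma*I), the spectral density
    # of the RBF kernel above; offsets are uniform on [0, 2*pi).
    sigma = math.sqrt(2.0 * gamma)
    weights = [[rng.gauss(0.0, sigma) for _ in range(dim)]
               for _ in range(n_features)]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def z(x):
        return [scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
                for w, b in zip(weights, offsets)]

    return z


def rbf_kernel(x, y, gamma):
    """Exact RBF kernel value, used here only for comparison."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


if __name__ == "__main__":
    x, y = [0.1, 0.2, 0.3], [0.4, -0.1, 0.2]
    z = make_rff_map(dim=3, n_features=5000, gamma=0.5)
    print("exact kernel :", rbf_kernel(x, y, 0.5))
    print("RFF estimate :", dot(z(x), z(y)))
```

The approximation error shrinks roughly as one over the square root of `n_features`, so the feature count trades accuracy against the cost of the downstream linear solver.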
Funding
The author Feng Yin was funded by the Shenzhen Science and Technology Innovation Council (No. JCYJ20170307155957688) and by the National Natural Science Foundation of China Key Project (No. 61731018).
The authors Feng Yin and Shuguang (Robert) Cui were funded by Shenzhen Fundamental Research Funds under Grant (Key Lab) No. ZDSYS201707251409055, Grant (Peacock) No. KQTD2015033114415450, and Guangdong province “The Pearl River Talent Recruitment Program Innovative and Entrepreneurial Teams in 2017” - Data Driven Evolution of Future Intelligent Network Team. The associate editor coordinating the review of this paper and approving it for publication was X. Cheng.