Abstract: Tree ensemble models have gained significant importance in addressing classification and prediction challenges. Boosting ensemble techniques are commonly employed for forecasting Type-II diabetes mellitus. Light Gradient Boosting Machine (LightGBM) is a widely used algorithm known for its leaf-wise growth strategy, loss reduction, and enhanced training precision. However, LightGBM is prone to overfitting. In contrast, CatBoost uses balanced base predictors known as decision tables, which mitigate overfitting risks and significantly improve test-time efficiency. CatBoost's algorithm structure counteracts gradient boosting biases and incorporates an overfitting detector to stop training early. This study develops a hybrid model that combines LightGBM and CatBoost to minimize overfitting and improve accuracy by reducing variance. Bayesian hyperparameter optimization is used to find the best hyperparameters for the underlying learners. By fine-tuning the regularization parameter values, the hybrid model effectively reduces variance (overfitting). Comparative evaluation against LightGBM, CatBoost, XGBoost, Decision Tree, Random Forest, AdaBoost, and GBM shows that the hybrid model achieves the best F1-score (99.37%), recall (99.25%), and accuracy (99.37%). Consequently, the proposed framework holds promise for early diabetes prediction in the healthcare industry and may be applicable to other datasets that share similarities with diabetes data.
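Abstract: Collaborative filtering (CF) algorithms provide personalized recommendations based on the similarity between items or between users; however, CF algorithms commonly suffer from data sparsity. To address the sparsity of user-item ratings and make predictions more accurate, a collaborative filtering algorithm based on co-training and Boosting (CFCTB) is proposed. First, co-training is used to integrate two CF methods into one framework: each CF method adds high-confidence pseudo-labeled samples to the other's training set, and Boosting-weighted training data are used to assist the co-training. Second, a weighted ensemble is used to predict the final user ratings, which effectively avoids the accumulation of noise produced by pseudo-labeled samples and further improves recommendation performance. Experimental results show that, on four public datasets, the proposed algorithm is more accurate than the single models. On the CiaoDVD dataset, which has the highest sparsity, the proposed algorithm reduces the mean absolute error (MAE) by 4.737% compared with Global and Local Kernels for recommender systems (GLocal-K), and reduces the root mean square error (RMSE) by 7.421% compared with the ECoRec (Ensemble of Co-trained Recommenders) algorithm. These results verify the effectiveness of the proposed algorithm.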
Abstract: In this paper, an advanced and optimized Light Gradient Boosting Machine (LGBM) technique is proposed to identify intrusive activities in Internet of Things (IoT) networks. The major contributions are as follows: i) an optimized LGBM model is developed to identify malicious activities in the IoT network; ii) an efficient evolutionary optimization approach is adopted to find the optimal set of LGBM hyper-parameters for this problem, using a Genetic Algorithm (GA) with k-way tournament selection and uniform crossover to explore the hyper-parameter search space efficiently; iii) the performance of the proposed model is evaluated against state-of-the-art ensemble learning and machine learning-based models to assess its overall generalization performance and efficiency. Simulation outcomes reveal that the proposed approach is superior to the other methods considered and is a robust approach to intrusion detection in an IoT environment.
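The two GA operators named in the abstract above can be sketched as follows. This is only a minimal illustration, not the authors' implementation: the candidate hyper-parameter sets, the fitness values, and the function names `tournament_select` and `uniform_crossover` are all assumptions made for the example, and the fitness scores are made up rather than measured.

```python
import random


def tournament_select(population, fitness, k, rng):
    """k-way tournament: draw k distinct candidates at random, keep the fittest."""
    contenders = rng.sample(range(len(population)), k)
    best = max(contenders, key=lambda i: fitness[i])
    return population[best]


def uniform_crossover(parent_a, parent_b, rng, swap_prob=0.5):
    """Uniform crossover: each gene is copied from parent_b with
    probability swap_prob, otherwise from parent_a."""
    child = {}
    for gene in parent_a:
        take_b = rng.random() < swap_prob
        child[gene] = parent_b[gene] if take_b else parent_a[gene]
    return child


rng = random.Random(0)

# Hypothetical candidate hyper-parameter sets (illustrative values only).
population = [
    {"num_leaves": 15, "learning_rate": 0.10, "max_depth": 4},
    {"num_leaves": 31, "learning_rate": 0.05, "max_depth": 6},
    {"num_leaves": 63, "learning_rate": 0.20, "max_depth": 8},
    {"num_leaves": 127, "learning_rate": 0.01, "max_depth": 10},
]
# Made-up validation scores standing in for a real model evaluation.
fitness = [0.81, 0.90, 0.86, 0.78]

parent_a = tournament_select(population, fitness, k=3, rng=rng)
parent_b = tournament_select(population, fitness, k=3, rng=rng)
child = uniform_crossover(parent_a, parent_b, rng)
print(child)
```

A larger tournament size k raises selection pressure, since the best individual is more likely to appear in each tournament; in a full GA loop the child would typically also be mutated and re-evaluated before joining the next generation.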
Abstract: Accurate prediction of monthly oil and gas production is essential for oil enterprises to make reasonable production plans, avoid blind investment, and achieve sustainable development. Traditional methods for predicting oil well production trends rely on years of oil field production experience and expertise, and their application conditions are very demanding. With the rapid development of artificial intelligence technology, big data analysis methods are gradually being applied in various sub-fields of oil and gas reservoir development. Based on the data-driven artificial intelligence algorithm Gradient Boosting Decision Tree (GBDT), this paper predicts initial single-layer production by considering geological data, fluid PVT data, and well data. The results show that the GBDT prediction model has high accuracy, significantly improved efficiency, and strong general applicability. The GBDT model trained in this paper can predict production, which is helpful for well site optimization, perforation layer optimization, and engineering parameter optimization, and has guiding significance for oilfield development.
Abstract: Detecting credit/debit card fraud, loan defaults, and similar cases is achievable with Machine Learning (ML) algorithms, which can learn from previous fraud trends or historical data and spot the same patterns in current or future transactions. In almost all datasets, fraudulent cases are scarce compared with non-fraudulent observations, which makes fraudulent transactions quite difficult to detect. The most effective way to prevent loan default is to identify non-performing loans as early as possible. Given enough computing power, machine learning algorithms are emerging as adept at handling such data. In this paper, the performance of different machine learning algorithms, namely Decision Tree, Random Forest, linear regression, and Gradient Boosting, is compared for the detection and prediction of fraud using loan fraud cases. Model performance is further assessed with a confusion matrix and by calculating accuracy, precision, recall, and F1-score, along with Receiver Operating Characteristic (ROC) curves.
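The confusion-matrix metrics listed in the abstract above can be computed as in the sketch below. The labels and predictions are a made-up toy sample (2 fraud cases out of 10), not data from the paper, and the helper names are assumptions for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1; undefined ratios fall back to 0.0."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy, imbalanced sample: 1 = fraud, 0 = legitimate (illustrative only).
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]

counts = confusion_counts(y_true, y_pred)
metrics = classification_metrics(*counts)
print(counts, metrics)
```

On this toy sample the accuracy is 0.8 while precision, recall, and F1 are all 0.5, which illustrates why accuracy alone is a weak indicator when fraudulent cases are rare.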
Abstract: The Sentinel-2 satellites provide an unparalleled wealth of high-resolution remotely sensed information with a short revisit cycle, which is ideal for mapping burned areas both accurately and in a timely manner. This paper proposes an automated methodology for mapping burn scars using pairs of Sentinel-2 images, exploiting the state-of-the-art eXtreme Gradient Boosting (XGB) machine learning framework. A large database of 64 reference wildfire perimeters in Greece from 2016 to 2019 is used to train the classifier. An empirical methodology for appropriately sampling the training patterns from this database is formulated, which guarantees the effectiveness of the approach and its computational efficiency. A difference (pre-fire minus post-fire) spectral index is used for this purpose, upon which the clear and fuzzy value ranges are identified. To reduce the data volume, a super-pixel segmentation of the images is also employed, implemented via the QuickShift algorithm. The cross-validation results showcase the effectiveness of the proposed algorithm, with average commission and omission errors of 9% and 2%, respectively, and an average Matthews correlation coefficient (MCC) of 0.93.
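For readers unfamiliar with the accuracy measures reported in the abstract above, the sketch below computes them from pixel counts. It assumes the usual remote-sensing definitions for the burned class (commission error = FP / (TP + FP), omission error = FN / (TP + FN)); the abstract does not spell out its definitions, and the counts used here are invented for illustration, not taken from the paper.

```python
import math


def commission_error(tp, fp):
    """Share of pixels mapped as burned that are not burned in the reference."""
    return fp / (tp + fp) if (tp + fp) else 0.0


def omission_error(tp, fn):
    """Share of reference burned pixels that the map missed."""
    return fn / (tp + fn) if (tp + fn) else 0.0


def matthews_corrcoef(tp, fp, fn, tn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Invented pixel counts for one hypothetical fire event.
tp, fp, fn, tn = 90, 10, 5, 895

ce = commission_error(tp, fp)
oe = omission_error(tp, fn)
mcc = matthews_corrcoef(tp, fp, fn, tn)
print(f"commission={ce:.3f} omission={oe:.3f} MCC={mcc:.3f}")
```

Unlike overall accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when unburned pixels vastly outnumber burned ones.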
Abstract: While setting the direction for China's economic development in 2023, the Central Economic Work Conference will generate a far-reaching impact on the world economy as a whole. At the Central Economic Work Conference held on December 15-16, 2022, President Xi Jinping delivered a speech that recapped the Chinese economy in 2022 and charted its course for 2023.
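Abstract: This work studies cost-sensitive extensions of boosting algorithms for cost-sensitive learning problems. A cost-sensitive boosting method based on cost-sensitive sampling is proposed, which minimizes the expected cost-sensitive loss by introducing cost-sensitive sampling into each iteration of the original boosting procedure. Based on this learning framework, two cost-sensitive boosting algorithms are derived, and the unstable nature of existing algorithms is revealed and explained. Experimental results on University of California, Irvine (UCI) datasets and on the face dataset of the Center for Biological & Computational Learning (CBCL) at the Massachusetts Institute of Technology show that, for cost-sensitive classification problems, the cost-sensitive sampling boosting algorithms outperform the original boosting and existing cost-sensitive boosting algorithms.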
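The abstract above does not state the exact sampling rule, so the following is only a rough sketch of one common way to realise cost-sensitive sampling inside a boosting round: resample the training set with probability proportional to each example's current boosting weight times its misclassification cost. The function name, the cost values, and the uniform starting weights are assumptions for illustration.

```python
import random


def cost_sensitive_sample(weights, costs, n_samples, rng):
    """Draw example indices with probability proportional to weight * cost."""
    scores = [w * c for w, c in zip(weights, costs)]
    if sum(scores) <= 0:
        raise ValueError("need at least one example with positive weight and cost")
    return rng.choices(range(len(scores)), weights=scores, k=n_samples)


rng = random.Random(0)

labels = [1, -1, -1, -1, -1, -1]          # one costly positive, five negatives
weights = [1 / len(labels)] * len(labels)  # uniform boosting weights (round 1)
# Hypothetical costs: misclassifying a positive is 5x as costly as a negative.
costs = [5.0 if y == 1 else 1.0 for y in labels]

sample = cost_sensitive_sample(weights, costs, n_samples=1000, rng=rng)
positive_share = sum(labels[i] == 1 for i in sample) / len(sample)
print(f"share of positives in the resampled set: {positive_share:.2f}")
```

With these assumed costs the single positive example carries half of the total sampling mass, so the weak learner trained on the resampled set sees it far more often than its 1-in-6 frequency in the original data.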
Abstract: Boosting is one of the most representative ensemble prediction methods. It can be divided into two series: Boost-by-majority and AdaBoost. This paper briefly introduces the research status of Boosting and of one of its series, AdaBoost, and analyzes the typical AdaBoost algorithms.
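As a concrete reference point for the AdaBoost family surveyed above, here is a minimal sketch of the classic discrete AdaBoost procedure with threshold stumps as weak learners. It is a textbook-style illustration rather than any specific variant analysed in the paper; the toy one-dimensional dataset and all function names are assumptions.

```python
import math


def train_stump(X, y, w):
    """Exhaustively search for the threshold stump with the lowest weighted error.
    Returns (error, feature_index, threshold, sign)."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                preds = [sign if row[j] <= thr else -sign for row in X]
                err = sum(wi for wi, p, t in zip(w, preds, y) if p != t)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best


def stump_predict(stump, row):
    _, j, thr, sign = stump
    return sign if row[j] <= thr else -sign


def adaboost_fit(X, y, rounds):
    """Labels must be +1 / -1. Returns a list of (alpha, stump) pairs."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = max(stump[0], 1e-12)  # guard against log(1/0) for a perfect stump
        if err >= 0.5:              # weak learner no better than chance: stop
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified points, decrease the rest.
        w = [wi * math.exp(-alpha * t * stump_predict(stump, row))
             for wi, row, t in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_predict(ensemble, row):
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Toy 1-D data that no single threshold separates.
X = [[0], [1], [2], [3], [4], [5], [6], [7], [8], [9]]
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

model = adaboost_fit(X, y, rounds=3)
train_errors = sum(adaboost_predict(model, row) != t for row, t in zip(X, y))
print(f"{len(model)} stumps, {train_errors} training errors")
```

Each round reweights the data so that the next stump concentrates on the points the current ensemble gets wrong, and the final prediction is the sign of the alpha-weighted vote.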
Funding: This work was supported by a National Research Foundation of Korea grant funded by the Korean Government (MSIT) (NRF-2020R1A2B5B02002478).
Abstract: Cardiovascular disease is among the top five fatal diseases affecting lives worldwide. Its early prediction and detection are therefore crucial, allowing proper and necessary measures to be taken at earlier stages. Machine learning (ML) techniques are used to assist healthcare providers in better diagnosing heart disease. This study employed three boosting algorithms, namely gradient boosting, XGBoost, and AdaBoost, to predict heart disease. The dataset contained heart disease-related clinical features and was sourced from the publicly available UCI ML repository. Exploratory data analysis was performed to characterize the data samples using descriptive and inferential statistics. Specifically, it was carried out to identify and replace outliers using the interquartile range, and to detect and replace missing values using imputation. Results were recorded before and after the data preprocessing techniques were applied. Of all the algorithms, gradient boosting achieved the highest accuracy, 92.20%, for the proposed model. The proposed model also yielded better results with gradient boosting in terms of precision, recall, and F1-score. It attained better prediction performance than existing works and can be used, via transfer learning, for other diseases that share common features.
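The preprocessing step described in the abstract above (imputing missing values and replacing interquartile-range outliers) might look like the sketch below. The abstract does not give the details, so the 1.5 × IQR fences, the use of the median as the replacement value, and the sample readings are all assumptions made for illustration.

```python
import statistics


def clean_column(values, k=1.5):
    """Impute missing entries (None) and replace IQR outliers with the median.

    Outliers are values outside [Q1 - k*IQR, Q3 + k*IQR], with quartiles
    computed on the observed (non-missing) values only.
    """
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    q1, _, q3 = statistics.quantiles(observed, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr

    cleaned = []
    for v in values:
        if v is None or v < low or v > high:
            cleaned.append(median)  # missing value or outlier: use the median
        else:
            cleaned.append(v)
    return cleaned


# Hypothetical readings for one clinical feature; None marks a missing value.
readings = [120, 130, None, 125, 128, 400, 122, 127]
cleaned = clean_column(readings)
print(cleaned)
```

Here the missing entry and the extreme reading of 400 are both replaced by the median of the observed values, while the remaining readings are left untouched; clipping outliers to the fence values would be an equally plausible alternative.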