126 articles found
Classification of aviation incident causes using LGBM with improved cross-validation
1
Authors: NI Xiaomei, WANG Huawei, CHEN Lingzi, LIN Ruiguan 《Journal of Systems Engineering and Electronics》 SCIE CSCD 2024, Issue 2, pp. 396-405 (10 pages)
Aviation accidents are currently one of the leading causes of significant injuries and deaths worldwide. This motivates researchers to investigate aircraft safety using data analysis approaches based on advanced machine learning algorithms. To assess aviation safety and identify the causes of incidents, a classification model with light gradient boosting machine (LGBM) based on the aviation safety reporting system (ASRS) has been developed. It is improved by k-fold cross-validation with a hybrid sampling model (HSCV), which may boost classification performance and maintain data balance. The results show that employing the LGBM-HSCV model can significantly improve accuracy while alleviating data imbalance. The comparative approach comprises a vertical comparison with other cross-validation (CV) methods and a lateral comparison with different numbers of folds. Aside from the comparison, two further CV approaches based on the improved method in this study are discussed: one with a different sampling and folding order, and the other with more CV. According to the assessment indices with different methods, the LGBM-HSCV model proposed here is effective at detecting incident causes. The proposed improved model for imbalanced data classification may serve as a point of reference for similar data processing, and the model’s accurate identification of civil aviation incident causes can help improve civil aviation safety.
Keywords: aviation safety; imbalance data; light gradient boosting machine (LGBM); cross-validation (CV)
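The abstract above combines k-fold cross-validation with resampling and notes that the order of sampling and folding matters. As a rough illustration only, the sketch below folds first and then balances each training split, so held-out samples are never duplicated into training data. It assumes plain random oversampling of minority classes; the paper's actual hybrid sampling scheme is not described in the abstract, and all function names here are hypothetical.

```python
import random


def stratified_folds(labels, k, seed=0):
    """Assign sample indices to k folds, spreading each class evenly."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for j, i in enumerate(idxs):
            folds[j % k].append(i)
    return folds


def oversample(indices, labels, rng):
    """Randomly duplicate minority-class indices until all classes match
    the largest class (an assumed stand-in for hybrid sampling)."""
    by_class = {}
    for i in indices:
        by_class.setdefault(labels[i], []).append(i)
    target = max(len(v) for v in by_class.values())
    balanced = []
    for idxs in by_class.values():
        balanced.extend(idxs)
        balanced.extend(rng.choice(idxs) for _ in range(target - len(idxs)))
    return balanced


def cv_splits_with_resampling(labels, k=5, seed=0):
    """Yield (train, test) index lists; only the training part is resampled."""
    rng = random.Random(seed)
    folds = stratified_folds(labels, k, seed)
    for f in range(k):
        test = folds[f]
        train = [i for g in range(k) if g != f for i in folds[g]]
        yield oversample(train, labels, rng), test


if __name__ == "__main__":
    labels = [0] * 20 + [1] * 5  # imbalanced toy labels
    for train, test in cv_splits_with_resampling(labels, k=5):
        print(len(train), len(test))
```

Resampling before folding would let copies of the same minority sample land in both training and test folds, which tends to inflate the measured accuracy.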
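Adaptive Wavelet Denoising Method Based on Cross-Validation Cited by: 4
2
Authors: 黄文清, 戴瑜兴, 李加升 《湖南大学学报(自然科学版)》 EI CAS CSCD PKU Core 2008, Issue 11, pp. 40-43 (4 pages)
In wavelet denoising algorithms, the choice of threshold is critical. An adaptive threshold selection algorithm is proposed. The algorithm first uses the cross-validation method to split the noise-corrupted signal into two sub-signals, one used for thresholding and one used as a reference signal; it then uses a steepest-gradient method to seek an optimal denoising threshold. Simulation and experimental results show that, in the mean-squared-error sense, the proposed algorithm denoises better than the VisuShrink and SureShrink algorithms proposed by Donoho et al., requires no 'prior information' about the noisy signal, and is suitable for denoising real signals.
Keywords: wavelet transform; cross-validation; adaptive filtering; threshold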
Cross-Validation, Shrinkage and Variable Selection in Linear Regression Revisited Cited by: 3
3
Authors: Hans C. van Houwelingen, Willi Sauerbrei 《Open Journal of Statistics》 2013, Issue 2, pp. 79-102 (24 pages)
In deriving a regression model, analysts often have to use variable selection, despite the problems introduced by data-dependent model building. Resampling approaches are proposed to handle some of the critical issues. In order to assess and compare several strategies, we will conduct a simulation study with 15 predictors and a complex correlation structure in the linear regression model. Using sample sizes of 100 and 400 and estimates of the residual variance corresponding to R^2 of 0.50 and 0.71, we consider 4 scenarios with varying amounts of information. We also consider two examples with 24 and 13 predictors, respectively. We will discuss the value of cross-validation, shrinkage and backward elimination (BE) with varying significance level. We will assess whether 2-step approaches using global or parameterwise shrinkage (PWSF) can improve selected models and will compare results to models derived with the LASSO procedure. Besides MSE, we will use model sparsity and further criteria for model assessment. The amount of information in the data has an influence on the selected models and the comparison of the procedures. None of the approaches was best in all scenarios. The performance of backward elimination with a suitably chosen significance level was not worse compared to the LASSO, and the models selected by BE were much sparser, an important advantage for interpretation and transportability. Compared to global shrinkage, PWSF had better performance. Provided that the amount of information is not too small, we conclude that BE followed by PWSF is a suitable approach when variable selection is a key part of data analysis.
Keywords: cross-validation; LASSO; shrinkage; simulation study; variable selection
ON THE CONSISTENCY OF CROSS-VALIDATION IN NONLINEAR WAVELET REGRESSION ESTIMATION
4
Authors: 张双林, 郑忠国 《Acta Mathematica Scientia》 SCIE CSCD 2000, Issue 1, pp. 1-11 (11 pages)
For the nonparametric regression model $Y_{ni} = g(x_{ni}) + \varepsilon_{ni}$, $i = 1, \ldots, n$, with regularly spaced nonrandom design, the authors study the behavior of the nonlinear wavelet estimator of $g(x)$. When the threshold and truncation parameters are chosen by cross-validation on the average squared error, strong consistency for the case of dyadic sample size and moment consistency for arbitrary sample size are established under some regularity conditions.
Keywords: consistency; cross-validation; nonparametric regression; threshold; truncation; wavelet estimator
Using Multiple Risk Factors and Generalized Linear Mixed Models with 5-Fold Cross-Validation Strategy for Optimal Carotid Plaque Progression Prediction
5
Authors: Qingyu Wang, Dalin Tang, Liang Wang, Gador Canton, Zheyang Wu, Thomas S Hatsukami, Kristen L Billiar, Chun Yuan 《医用生物力学》 EI CAS CSCD PKU Core 2019, Issue A01, pp. 74-75 (2 pages)
Background: Cardiovascular diseases are closely linked to atherosclerotic plaque development and rupture. Plaque progression prediction is of fundamental significance to cardiovascular research and disease diagnosis, prevention, and treatment. Generalized linear mixed models (GLMM) are an extension of linear models for categorical responses that accounts for the correlation among observations. Methods: Magnetic resonance image (MRI) data of carotid atherosclerotic plaques were acquired from 20 patients with consent obtained, and 3D thin-layer models were constructed to calculate plaque stress and strain for plaque progression prediction. Data for ten morphological and biomechanical risk factors, namely wall thickness (WT), lipid percent (LP), minimum cap thickness (MinCT), plaque area (PA), plaque burden (PB), lumen area (LA), maximum plaque wall stress (MPWS), maximum plaque wall strain (MPWSn), average plaque wall stress (APWS), and average plaque wall strain (APWSn), were extracted from all slices for analysis. Wall thickness increase (WTI), plaque burden increase (PBI) and plaque area increase (PAI) were chosen as three measures of plaque progression. GLMM with a 5-fold cross-validation strategy were used to calculate the prediction accuracy of each predictor and to identify the optimal predictor, with prediction accuracy defined as the sum of sensitivity and specificity. All 201 MRI slices were randomly divided into 4 training subgroups and 1 verification subgroup. The training subgroups were used for model fitting, and the verification subgroup was used to evaluate the model. All combinations (1023 in total) of the 10 risk factors were fed to GLMM, and the prediction accuracy of each predictor was taken from the point on the receiver operating characteristic (ROC) curve with the highest sum of specificity and sensitivity. Results: LA was the best single predictor for PBI with the highest prediction accuracy (1.3601) and an area under the ROC curve (AUC) of 0.6540, followed by APWSn (1.3363) with AUC = 0.6342. The optimal predictor among all possible combinations for PBI was the combination of LA, PA, LP, WT, MPWS and MPWSn, with prediction accuracy = 1.4146 (AUC = 0.7158). LA was once again the best single predictor for PAI with the highest prediction accuracy (1.1846) and AUC = 0.6064, followed by MPWSn (1.1832) with AUC = 0.6084. The combination of PA, PB, WT, MPWS, MPWSn and APWSn gave the best prediction accuracy (1.3025) for PAI, with AUC = 0.6657. PA was the best single predictor for WTI with the highest prediction accuracy (1.2887) and AUC = 0.6415, followed by WT (1.2540) with AUC = 0.6097. The combination of PA, PB, WT, LP, MinCT, MPWS and MPWS was the best predictor for WTI, with prediction accuracy = 1.3140 and AUC = 0.6552. This indicated that PBI was a more predictable measure than WTI and PAI. The combinational predictors improved prediction accuracy by 9.95%, 4.01% and 1.96% over the best single predictors for PAI, PBI and WTI (AUC values improved by 9.78%, 9.45%, and 2.14%), respectively. Conclusions: The use of GLMM with a 5-fold cross-validation strategy combining both morphological and biomechanical risk factors could potentially improve the accuracy of carotid plaque progression prediction. This study suggests that a linear combination of multiple predictors can provide potential improvement to existing plaque assessment schemes.
Keywords: multiple risk factors; generalized linear; 5-fold cross-validation strategy; AUC
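Because "prediction accuracy" in the abstract above is defined as the sum of sensitivity and specificity at the best point on the ROC curve, its values range from 0 to 2, which is why figures such as 1.3601 exceed 1. A minimal sketch of that selection rule follows; the function name, the toy scores, and the "score >= threshold means positive" convention are assumptions for illustration, not details taken from the study.

```python
def best_threshold(scores, labels):
    """Return (threshold, sensitivity + specificity) for the cut-off that
    maximises the sum, scanning every observed score as a candidate."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    best = (None, -1.0)
    for t in sorted(set(scores)):
        # A sample is predicted positive when its score is >= t.
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        total = tp / pos + tn / neg
        if total > best[1]:
            best = (t, total)
    return best


if __name__ == "__main__":
    scores = [0.1, 0.4, 0.35, 0.8]   # hypothetical model outputs
    labels = [0, 0, 1, 1]            # 1 = progression observed
    print(best_threshold(scores, labels))  # -> (0.35, 1.5)
```

In a 5-fold setting the scores would come from models fitted on the four training subgroups and applied to the held-out subgroup.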
Risk assessment of rockburst using SMOTE oversampling and integration algorithms under GBDT framework
6
Authors: WANG Jia-chuang, DONG Long-jun 《Journal of Central South University》 SCIE EI CAS CSCD 2024, Issue 8, pp. 2891-2915 (25 pages)
Rockburst is a common geological disaster in underground engineering, which seriously threatens the safety of personnel, equipment and property. Utilizing machine learning models to evaluate the risk of rockburst is gradually becoming a trend. In this study, the integrated algorithms under the Gradient Boosting Decision Tree (GBDT) framework were used to evaluate and classify rockburst intensity. First, a total of 301 rockburst data samples were obtained from a case database, and the data were preprocessed using the synthetic minority over-sampling technique (SMOTE). Then, rockburst evaluation models including GBDT, eXtreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), and Categorical Features Gradient Boosting (CatBoost) were established, and the optimal hyperparameters of the models were obtained through random search grid and five-fold cross-validation. Afterwards, the optimal hyperparameter configuration was used to fit the evaluation models, which were then analyzed using the test set. To evaluate performance, metrics including accuracy, precision, recall, and F1-score were selected for analysis and comparison with other machine learning models. Finally, the trained models were used to conduct rockburst risk assessment on rock samples from a mine in Shanxi Province, China, providing theoretical guidance for the mine's safe production work. The models under the GBDT framework perform well in the evaluation of rockburst levels, and the proposed methods can provide a reliable reference for rockburst risk level analysis and safety management.
Keywords: rockburst evaluation; SMOTE oversampling; random search grid; k-fold cross-validation; confusion matrix
Prediction of geological characteristics from shield operational parameters by integrating grid search and K-fold cross validation into stacking classification algorithm Cited by: 6
7
Authors: Tao Yan, Shui-Long Shen, Annan Zhou, Xiangsheng Chen 《Journal of Rock Mechanics and Geotechnical Engineering》 SCIE CSCD 2022, Issue 4, pp. 1292-1303 (12 pages)
This study presents a framework for predicting geological characteristics based on integrating a stacking classification algorithm (SCA) with a grid search (GS) and K-fold cross validation (K-CV). The SCA includes two learner layers: a primary learner layer and a meta-classifier layer. The accuracy of the SCA can be improved by using the GS and K-CV. The GS was developed to match the hyper-parameters and optimise complicated problems. The K-CV is commonly applied to changing the validation set in a training set. In general, a GS is usually combined with K-CV to produce a corresponding evaluation index and select the best hyper-parameters. The torque penetration index (TPI) and field penetration index (FPI) are proposed based on shield parameters to express the geological characteristics. The elbow method (EM) and silhouette coefficient (Si) are employed to determine the types of geological characteristics (K) in a K-means++ algorithm. A case study on mixed ground in Guangzhou is adopted to validate the applicability of the developed model. The results show that with the developed framework, the four selected parameters, i.e. thrust, advance rate, cutterhead rotation speed and cutterhead torque, can be used to effectively predict the corresponding geological characteristics.
Keywords: geological characteristics; stacking classification algorithm (SCA); k-fold cross-validation (K-CV); K-means++
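The pairing of grid search with K-fold cross-validation described above can be sketched in a few lines: each candidate hyper-parameter is scored by its average held-out error across the K folds, and the candidate with the lowest score is kept. The example below is a toy under stated assumptions: it tunes the penalty of a one-parameter ridge regression rather than the paper's stacking classifier, uses contiguous unshuffled folds, and every name in it is hypothetical.

```python
def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_ridge_slope(xs, ys, lam):
    """Ridge regression through the origin: w = sum(xy) / (sum(x^2) + lam)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def cv_error(xs, ys, lam, k):
    """Mean squared held-out error of the ridge fit over k folds."""
    total = 0.0
    for held_out in k_fold_indices(len(xs), k):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        w = fit_ridge_slope([xs[i] for i in train], [ys[i] for i in train], lam)
        total += sum((ys[i] - w * xs[i]) ** 2 for i in held_out)
    return total / len(xs)


def grid_search(xs, ys, grid, k=5):
    """Return the grid value with the lowest cross-validated error."""
    return min(grid, key=lambda lam: cv_error(xs, ys, lam, k))


if __name__ == "__main__":
    xs = [float(i) for i in range(1, 11)]
    ys = [2.0 * x for x in xs]          # noise-free toy data
    print(grid_search(xs, ys, [0.0, 1.0, 10.0]))
```

With real, ordered data the indices would normally be shuffled (or stratified by class) before folding, so that each fold resembles the whole sample.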
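Sequential Sampling Method for Surrogate Models Based on k-fold Cross-Validation Cited by: 7
8
Authors: 李正良, 彭思思, 王涛 《计算力学学报》 CAS CSCD PKU Core 2022, Issue 2, pp. 244-249 (6 pages)
Within the sequential sampling framework for surrogate models, and to address shortcomings of existing studies, a k-fold CV-Voronoi adaptive sequential sampling method applicable to any surrogate model is developed by introducing k-fold cross-validation to compute the prediction error of samples and combining it with the Voronoi diagram method and the criterion of minimizing the maximum distance. Compared with traditional sequential sampling methods, the proposed method offers notable advantages such as simple computation and strong adaptivity. Comparative analysis of numerical and engineering examples shows that the proposed sequential sampling method achieves high approximation accuracy and computational efficiency. In addition, the influence of different values of k in k-fold cross-validation on surrogate model accuracy is discussed, and an optimal range of k is summarized for reference.
Keywords: k-fold cross-validation; sequential sampling; surrogate model; Voronoi diagram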
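Time-Frequency Denoising Method for Penetration Overload Based on DnCNN
9
Authors: 郑宏亮, 贾森清, 郭宇朋, 薛颖杰, 韩晶, 赵河明, 石志刚 《装备环境工程》 CAS 2024, Issue 8, pp. 17-24 (8 pages)
Objective: To improve the ability to accurately estimate the rigid-body overload signal from penetration overload. Methods: A time-frequency denoising method for penetration overload based on a feed-forward denoising convolutional neural network (DnCNN) is proposed. The method first applies the short-time Fourier transform (STFT) to extract the time-frequency image of the penetration overload signal, so that the DnCNN can make full use of the time-frequency image information to estimate the rigid-body overload time-frequency image. Finally, the time-frequency image is converted back to the time domain by the inverse STFT to obtain the estimated rigid-body overload signal. Results: In 5-fold cross-validation, the proposed method achieved a mean absolute error (MAE) of 0.968% and a Pearson correlation coefficient (r) of 90.35% on the test set. Compared with low-pass filtering, ensemble empirical mode decomposition (EEMD), and wavelet transform methods, the average MAE of the proposed method was lower by 1.82%, 1.00%, and 0.75%, respectively, and the average r was higher by 47.81%, 17.48%, and 22.93%, respectively. Conclusion: The proposed method can accurately estimate the rigid-body overload signal from penetration overload, outperforms low-pass filtering, EEMD, and wavelet transform methods in denoising ability, and completes the denoising task automatically without parameter tuning.
Keywords: hard-target penetration; penetration overload; feed-forward denoising convolutional neural network; signal denoising; time-frequency analysis; k-fold cross-validation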
Multi-environment BSA-seq using large F3 populations is able to achieve reliable QTL mapping with high power and resolution: An experimental demonstration in rice
10
Authors: Yan Zheng, Ei Ei Khine, Khin Mar Thi, Ei Ei Nyein, Likun Huang, Lihui Lin, Xiaofang Xie, Min Htay Wai Lin, Khin Than Oo, Myat Myat Moe, San San Aye, Weiren Wu 《The Crop Journal》 SCIE CSCD 2024, Issue 2, pp. 549-557 (9 pages)
Bulked-segregant analysis by deep sequencing (BSA-seq) is a widely used method for mapping QTL (quantitative trait loci) due to its simplicity, speed, cost-effectiveness, and efficiency. However, the ability of BSA-seq to detect QTL is often limited by inappropriate experimental designs, as evidenced by numerous practical studies. Most BSA-seq studies have utilized small to medium-sized populations, with F2 populations being the most common choice. Nevertheless, theoretical studies have shown that using a large population with an appropriate pool size can significantly enhance the power and resolution of QTL detection in BSA-seq, with F3 populations offering notable advantages over F2 populations. To provide an experimental demonstration, we tested the power of BSA-seq to identify QTL controlling days from sowing to heading (DTH) in a 7200-plant rice F3 population in two environments, with a pool size of approximately 500. Each experiment identified 34 QTL, an order of magnitude greater than reported in most BSA-seq experiments, of which 23 were detected in both experiments, with 17 of these located near 41 previously reported QTL and eight cloned genes known to control DTH in rice. These results indicate that QTL mapping by BSA-seq in large F3 populations and multi-environment experiments can achieve high power, resolution, and reliability.
Keywords: BSA-seq; QTL mapping; large F3 population; multi-environment experiment; cross-validation
Kriging Model Averaging Based on Leave-One-Out Cross-Validation Method
11
Authors: FENG Ziheng, ZONG Xianpeng, XIE Tianfa, ZHANG Xinyu 《Journal of Systems Science & Complexity》 SCIE EI CSCD 2024, Issue 5, pp. 2132-2156 (25 pages)
In recent years, the Kriging model has gained wide popularity in various fields such as space geology, econometrics, and computer experiments. As a result, research on this model has proliferated. In this paper, the authors propose a model averaging estimation based on the best linear unbiased prediction of the Kriging model and the leave-one-out cross-validation method, with consideration for the model uncertainty. The authors present a weight selection criterion for the model averaging estimation and provide two theoretical justifications for the proposed method. First, the estimated weight based on the proposed criterion is asymptotically optimal in achieving the lowest possible prediction risk. Second, the proposed method asymptotically assigns all weights to the correctly specified models when the candidate model set includes these models. The effectiveness of the proposed method is verified through numerical analyses.
Keywords: asymptotic optimality; best linear unbiased prediction; cross-validation; Kriging model; model averaging
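The idea of choosing model-averaging weights by leave-one-out cross-validation, as described above, can be illustrated with a deliberately simple toy. This is not the paper's Kriging estimator: the two candidate models (a constant mean and a straight line), the squared-error criterion, and all function names are assumptions made for the sketch. For two candidates with leave-one-out predictions $a_i$ and $b_i$, the weight minimising $\sum_i (y_i - w a_i - (1 - w) b_i)^2$ is $w = \sum_i (y_i - b_i)(a_i - b_i) / \sum_i (a_i - b_i)^2$, clipped to $[0, 1]$.

```python
def loo_predictions(xs, ys, fit):
    """Predict each point from a model fitted on all the other points."""
    preds = []
    for i in range(len(xs)):
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(model(xs[i]))
    return preds


def fit_mean(xs, ys):
    """Candidate 1: predict the training mean everywhere."""
    m = sum(ys) / len(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Candidate 2: ordinary least-squares straight line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return lambda x: my + slope * (x - mx)


def loo_weight(xs, ys):
    """Weight on the mean model that minimises leave-one-out squared error
    of the averaged prediction w * mean + (1 - w) * line, with w in [0, 1]."""
    a = loo_predictions(xs, ys, fit_mean)
    b = loo_predictions(xs, ys, fit_line)
    num = sum((y - bi) * (ai - bi) for y, ai, bi in zip(ys, a, b))
    den = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    if den == 0:
        return 0.5  # candidates agree everywhere; any weight is equivalent
    return min(1.0, max(0.0, num / den))


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [1.0, 3.0, 5.0, 7.0, 9.0]   # exactly linear toy data
    print(loo_weight(xs, ys))        # weight on the mean model
```

On exactly linear data the line model predicts every held-out point, so essentially all weight moves away from the mean model, which mirrors the abstract's claim that weight concentrates on correctly specified candidates.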
Height-diameter models for King Boris fir (Abies borisii regis Mattf.) and Scots pine (Pinus sylvestris L.) in Olympus and Pieria Mountains, Greece
12
Authors: Dimitrios I. RAPTIS, Dimitra PAPADOPOULOU, Angeliki PSARRA, Athanasios A. FALLIAS, Aristides G. TSITSANIS, Vassiliki KAZANA 《Journal of Mountain Science》 SCIE CSCD 2024, Issue 5, pp. 1475-1490 (16 pages)
In forest science and practice, the total tree height is one of the basic morphometric attributes at the tree level and it has been closely linked with important stand attributes. In the current research, sixteen nonlinear functions for height prediction were tested in terms of their fitting ability against samples of Abies borisii regis and Pinus sylvestris trees from mountainous forests in central Greece. The fitting procedure was based on generalized nonlinear weighted regression. At the final stage, a five-quantile nonlinear height-diameter model was developed for both species through a quantile regression approach, to estimate the entire conditional distribution of tree height, enabling the evaluation of the diameter impact at various quantiles and providing a comprehensive understanding of the proposed relationship across the distribution. The results clearly showed that, with diameter as the sole independent variable, the 3-parameter Hossfeld function and the 2-parameter Näslund function explained approximately 84.0% and 81.7% of the total height variance for King Boris fir and Scots pine, respectively. Furthermore, the models exhibited low levels of error in both cases (2.310 m for the fir and 3.004 m for the pine), yielding unbiased predictions for both fir (-0.002 m) and pine (-0.004 m). Notably, all the required assumptions for homogeneity and normality of the associated residuals were met through the weighting procedure, while the quantile regression approach provided additional insights into the height-diameter allometry of the specific species. The proposed models can become valuable tools for operational forest management planning, particularly for wood production and conservation of mountainous forest ecosystems.
Keywords: generalized nonlinear weighted regression; Monte Carlo cross-validation; mountainous ecosystems; quantile regression; Central Greece
Developing a Model for Parkinson’s Disease Detection Using Machine Learning Algorithms
13
Authors: Naif Al Mudawi 《Computers, Materials & Continua》 SCIE EI 2024, Issue 6, pp. 4945-4962 (18 pages)
Parkinson’s disease (PD) is a chronic neurological condition that progresses over time. People start to have trouble speaking, writing, walking, or performing other basic skills as dopamine-generating neurons in some brain regions are injured or die. The patient’s symptoms become more severe as their signs worsen over time. In this study, we applied state-of-the-art machine learning algorithms to diagnose Parkinson’s disease and identify related risk factors. The research used a publicly available PD dataset consisting of a set of significant characteristics of PD. We aim to apply soft computing techniques and provide an effective solution for medical professionals to diagnose PD accurately. The research methodology involves developing a model using a machine learning algorithm. For model selection, eight different machine learning techniques were adopted, namely Random Forest (RF), Decision Tree (DT), Support Vector Machine (SVM), Naïve Bayes (NB), Light Gradient Boosting Machine (LightGBM), K-Nearest Neighbours (KNN), Extreme Gradient Boosting (XGBoost), and Logistic Regression (LR). Subsequently, the models were validated through 10-fold cross-validation and the Receiver Operating Characteristic (ROC) Area Under the Curve (AUC). In addition, GridSearchCV was utilised to find each algorithm’s best parameters; the models were then trained through the hyperparameter tuning approach. With 98% accuracy, LightGBM had the highest accuracy in this study. RF, KNN, and SVM came in second with 96% accuracy. Furthermore, the performance scores of NB and LR were recorded to be 76% and 83%, respectively. After applying 10-fold cross-validation, the average performance score of LightGBM was 93%. At the same time, the ROC-AUC was 0.92, which indicates that this LightGBM model reached a satisfactory level. Finally, we extracted meaningful insights and identified potential gaps in PD research, which contributes to the significance and impact of that research. The application of advanced machine learning algorithms holds promise in accurately diagnosing PD and shedding light on crucial aspects of the disease. This research has the potential to enhance the understanding and management of PD, ultimately improving the lives of individuals affected by this condition.
Keywords: LightGBM; cross-validation; ROC-AUC; Parkinson’s disease (PD); SVM and XGBoost
Adaptive Random Effects/Coefficients Modeling
14
Authors: George J. Knafl 《Open Journal of Statistics》 2024, Issue 2, pp. 179-206 (28 pages)
Adaptive fractional polynomial modeling of general correlated outcomes is formulated to address nonlinearity in means, variances/dispersions, and correlations. Means and variances/dispersions are modeled using generalized linear models in fixed effects/coefficients. Correlations are modeled using random effects/coefficients. Nonlinearity is addressed using power transforms of primary (untransformed) predictors. Parameter estimation is based on extended linear mixed modeling generalizing both generalized estimating equations and linear mixed modeling. Models are evaluated using likelihood cross-validation (LCV) scores and are generated adaptively using a heuristic search controlled by LCV scores. Cases covered include linear, Poisson, logistic, exponential, and discrete regression of correlated continuous, count/rate, dichotomous, positive continuous, and discrete numeric outcomes treated as normally, Poisson, Bernoulli, exponentially, and discrete numerically distributed, respectively. Example analyses are also generated for these five cases to compare adaptive random effects/coefficients modeling of correlated outcomes to previously developed adaptive modeling based on directly specified covariance structures. Adaptive random effects/coefficients modeling substantially outperforms direct covariance modeling in the linear, exponential, and discrete regression example analyses. It generates equivalent results in the logistic regression example analyses and it is substantially outperformed in the Poisson regression case. Random effects/coefficients modeling of correlated outcomes can provide substantial improvements in model selection compared to directly specified covariance modeling. However, directly specified covariance modeling can generate competitive or substantially better results in some cases while usually requiring less computation time.
Keywords: adaptive regression; correlated outcomes; extended linear mixed modeling; fractional polynomials; likelihood cross-validation; random effects/coefficients
Bayesian Classifier Based on Robust Kernel Density Estimation and Harris Hawks Optimisation
15
Authors: Bi Iritie A-D Boli, Chenghao Wei 《International Journal of Internet and Distributed Systems》 2024, Issue 1, pp. 1-23 (23 pages)
In real-world applications, datasets frequently contain outliers, which can hinder the generalization ability of machine learning models. Bayesian classifiers, a popular supervised learning method, rely on accurate probability density estimation for classifying continuous datasets. However, achieving precise density estimation with datasets containing outliers poses a significant challenge. This paper introduces a Bayesian classifier that utilizes optimized robust kernel density estimation to address this issue. Our proposed method enhances the accuracy of probability density distribution estimation by mitigating the impact of outliers on the training sample’s estimated distribution. Unlike the conventional kernel density estimator, our robust estimator can be seen as a weighted kernel mapping summary for each sample. This kernel mapping performs the inner product in the Hilbert space, allowing the kernel density estimation to be considered the average of the samples’ mapping in the Hilbert space using a reproducing kernel. M-estimation techniques are used to obtain accurate mean values and solve the weights. Meanwhile, complete cross-validation is used as the objective function to search for the optimal bandwidth, which impacts the estimator. The Harris Hawks Optimisation optimizes the objective function to improve the estimation accuracy. The experimental results show that it outperforms other optimization algorithms regarding convergence speed and objective function value during the bandwidth search. The optimal robust kernel density estimator achieves better fitness performance than the traditional kernel density estimator when the training data contains outliers. The Naïve Bayesian classifier with optimal robust kernel density estimation improves generalization in classification with outliers.
Keywords: classification; robust kernel density estimation; M-estimation; Harris Hawks Optimisation algorithm; complete cross-validation
The Distortion Theorems for k-fold Symmetric Quasi-convex Mappings along a Unit Direction in C^n Cited by: 1
16
Authors: 卢金, 刘太顺, 王建飞 《Chinese Quarterly Journal of Mathematics》 CSCD 2012, Issue 4, pp. 475-479 (5 pages)
We obtain a distortion theorem for the Jacobian matrix $J_f(z)$ of k-fold symmetric quasi-convex mappings $f$ along a unit direction in C^n on the unit polydisc.
Keywords: quasi-convex mappings; k-fold symmetric; distortion theorem
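Drought Prediction Model for the SPI Index in Northeast Chongqing with K-fold Input Cited by: 1
17
Authors: 牛文娟 《水利技术监督》 2021, Issue 12, pp. 155-160 (6 pages)
Based on two stations, Fengdu and Wanzhou, and taking the support vector machine (SVM) model as the basis, this paper optimizes the SVM model with particle swarm optimization (PSO) and a genetic algorithm (GA), adopts a K-fold parameter input scheme, predicts the SPI drought index in northeast Chongqing, and derives a recommended model for regional drought prediction. The results show that the models differ in their prediction accuracy for the SPI index: the PSO-SVM model is generally more accurate than the other models, and the model that considers temperature and sunshine hours is the most accurate, ranking first in GPI at both stations and lying closest to the reference value in the Taylor diagram. The PSO-SVM model can be used as the standard model for drought prediction in northeast Chongqing.
Keywords: northeast Chongqing; drought prediction; SPI index; k-fold; particle swarm optimization; support vector machine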
OPT-BAG Model for Predicting Student Employability
18
Authors: Minh-Thanh Vo, Trang Nguyen, Tuong Le 《Computers, Materials & Continua》 SCIE EI 2023, Issue 8, pp. 1555-1568 (14 pages)
The use of machine learning to predict student employability is important in order to analyse a student’s capability to get a job. Based on the results of this type of analysis, university managers can improve the employability of their students, which can help in attracting students in the future. In addition, learners can focus on the essential skills identified through this analysis during their studies, to increase their employability. An effective method called OPT-BAG (OPTimisation of BAGging classifiers) was therefore developed to model the problem of predicting the employability of students. This model can help predict the employability of students based on their competencies and can reveal weaknesses that need to be improved. First, we analyse the relationships between several variables and the outcome variable using a correlation heatmap for a student employability dataset. Next, a standard scaler function is applied in the preprocessing module to normalise the variables in the student employability dataset. The training set is then input to our model to identify the optimal parameters for the bagging classifier using a grid search cross-validation technique. Finally, the OPT-BAG model, based on a bagging classifier with optimal parameters found in the previous step, is trained on the training dataset to predict student employability. The empirical outcomes in terms of accuracy, precision, recall, and F1 indicate that the OPT-BAG approach outperforms other cutting-edge machine learning models in terms of predicting student employability. In this study, we also analyse the factors affecting the recruitment process of employers, and find that general appearance, mental alertness, and communication skills are the most important. This indicates that educational institutions should focus on these factors during the learning process to improve student employability.
Keywords: ensemble classifier; grid search; cross-validation; OPT-BAG; student employability
Performance Evaluation of Deep Dense Layer Neural Network for Diabetes Prediction
19
Authors: Niharika Gupta, Baijnath Kaushik, Mohammad Khalid Imam Rahmani, Saima Anwar Lashari 《Computers, Materials & Continua》 SCIE EI 2023, Issue 7, pp. 347-366 (20 pages)
Diabetes is one of the fastest-growing human diseases worldwide and poses a significant threat to life expectancy. Early prediction of diabetes is crucial to taking precautionary steps to avoid or delay its onset. In this study, we proposed a Deep Dense Layer Neural Network (DDLNN) for diabetes prediction using a dataset with 768 instances and nine variables. We also applied a combination of classical machine learning (ML) algorithms and ensemble learning algorithms for the effective prediction of the disease. The classical ML algorithms used were Support Vector Machine (SVM), Logistic Regression (LR), Decision Tree (DT), K-Nearest Neighbor (KNN), and Naïve Bayes (NB). We also constructed ensemble models such as bagging (Random Forest) and boosting (AdaBoost and Extreme Gradient Boosting (XGBoost)) to evaluate the performance of prediction models. The proposed DDLNN model and ensemble learning models were trained and tested using hyperparameter tuning and K-Fold cross-validation to determine the best parameters for predicting the disease. The combined ML models used majority voting to select the best outcomes among the models. The efficacy of the proposed and other models was evaluated for effective diabetes prediction. The investigation concluded that the proposed model, after hyperparameter tuning, outperformed other learning models with an accuracy of 84.42%, a precision of 85.12%, a recall rate of 65.40%, and a specificity of 94.11%.
Keywords: diabetes prediction; hyperparameter tuning; k-fold validation; machine learning; neural network
SCADA Data-Based Support Vector Machine for False Alarm Identification for Wind Turbine Management
20
Authors: Ana María Peco Chacón, Isaac Segovia Ramírez, Fausto Pedro García Márquez 《Intelligent Automation & Soft Computing》 SCIE 2023, Issue 9, pp. 2595-2608 (14 pages)
Maintenance operations have a critical influence on power generation by wind turbines (WT). Advanced algorithms must analyze large volumes of data from condition monitoring systems (CMS) to determine the actual working conditions and avoid false alarms. This paper proposes different support vector machine (SVM) algorithms for the prediction and detection of false alarms. K-Fold cross-validation (CV) is applied to evaluate the classification reliability of these algorithms. Supervisory Control and Data Acquisition (SCADA) data from an operating WT are applied to test the proposed approach. The results from the quadratic SVM showed an accuracy rate of 98.6%. Misclassifications from the confusion matrix, alarm log and maintenance records are analyzed to obtain quantitative information and determine whether each one is a false alarm. The classifier reduces the number of false alarms, referred to as misclassifications, by 25%. These results demonstrate that the proposed approach presents high reliability and accuracy in false alarm identification.
Keywords: machine learning classification; support vector machine; false alarm; wind turbine; cross-validation