Abstract: In regression, although Akaike's Final Prediction Error (FPE) and the Generalized Cross-Validation (GCV) selection criteria both aim to estimate the Mean Squared Prediction Error (MSPE), they are usually derived from two quite different perspectives. Here, adopting the most commonly accepted definition of the MSPE as the expected squared prediction error loss, we provide theoretical expressions for it that are valid for any linear model (LM) fitter, under either random or non-random designs. By specializing these MSPE expressions to each fitter, we derive closed-form MSPE formulas for some of the most popular LM fitters: Ordinary Least Squares (OLS), with or without a full-column-rank design matrix; and Ordinary and Generalized Ridge regression, the latter encompassing smoothing-spline fitting. For each of these LM fitters, we then deduce a computable MSPE estimate that turns out to coincide with Akaike's FPE. With a slight variation, we similarly obtain a class of MSPE estimates that coincide with the classical GCV formula for the same LM fitters.
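To make the FPE/GCV connection concrete, here is a minimal Python sketch for the simplest ridge case: ridge regression through the origin with a single predictor, whose hat matrix is H = x xᵀ / (xᵀx + λ) with trace tr(H) = xᵀx / (xᵀx + λ). The FPE line uses the classical OLS form (RSS/n)(n + p)/(n - p) with the parameter count p replaced by tr(H); that substitution is an assumption for illustration and is not necessarily the exact generalized expression derived in the paper. The GCV line is the standard formula (RSS/n) / (1 - tr(H)/n)². Function names are hypothetical.

```python
def ridge_fit_1d(x, y, lam):
    """Ridge regression through the origin with one predictor.

    Returns the fitted values and the effective degrees of freedom tr(H),
    where H = x x^T / (x^T x + lam) is the rank-one hat matrix.
    """
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    beta = sxy / (sxx + lam)
    fitted = [beta * xi for xi in x]
    df = sxx / (sxx + lam)  # trace of H
    return fitted, df


def fpe_and_gcv(x, y, lam):
    """Return (FPE, GCV) estimates of the MSPE for the ridge fit above."""
    n = len(y)
    fitted, df = ridge_fit_1d(x, y, lam)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    # Akaike-style FPE with the parameter count replaced by tr(H) (assumption).
    fpe = (rss / n) * (n + df) / (n - df)
    # Classical GCV: mean RSS inflated by (1 - tr(H)/n)^-2.
    gcv = (rss / n) / (1.0 - df / n) ** 2
    return fpe, gcv


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    y = [1.1, 1.9, 3.2, 3.9, 5.1]
    for lam in (0.0, 1.0, 10.0):
        fpe, gcv = fpe_and_gcv(x, y, lam)
        print(f"lambda={lam:5.1f}  FPE={fpe:.5f}  GCV={gcv:.5f}")
```

With a = tr(H)/n, the ratio of the GCV inflation factor to the FPE factor is 1/(1 - a²), so the two criteria agree to first order in a and GCV is never smaller than FPE when 0 ≤ a < 1.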
Funding: Supported by the National Key R&D Program of China (2020AAA0105200), the Ministry of Science and Technology of China (Grant No. 2016YFB0502301), the National Natural Science Foundation of China (Grant Nos. 11871294, 12031016, 11971323, 71925007, 72042019, 72091212 and 12001559), and a joint grant from the Academy for Multidisciplinary Studies, Capital Normal University.
Abstract: Frequentist model averaging has received much attention from econometricians and statisticians in recent years. A key problem with frequentist model average estimators is the choice of weights. This paper develops a new approach to choosing weights based on an approximation of generalized cross-validation. The resulting least squares model average estimators are proved to be asymptotically optimal in the sense of achieving the lowest possible squared error. In particular, optimality is established under both discrete and continuous weight sets. Compared with the existing approach based on the Mallows criterion, the conditions required for the asymptotic optimality of the proposed method are more reasonable. Simulation studies and a real-data application show that the proposed estimators perform well.
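A minimal sketch of GCV-based weight choice, assuming two candidate least squares fits and a continuous weight set approximated by a grid. The criterion treats the averaged fit as a linear smoother whose effective dimension is the weighted sum of the candidates' parameter counts, Σ w_m k_m; that form is an illustrative assumption, not the specific GCV approximation developed in the paper. All names are hypothetical.

```python
def gcv_weight_criterion(y, fits, ks, weights):
    """GCV-type criterion for the averaged fit sum_m weights[m] * fits[m].

    ks[m] is the number of parameters of candidate model m; the effective
    dimension of the average is taken as sum_m weights[m] * ks[m] (assumption).
    """
    n = len(y)
    avg = [sum(w * f[i] for w, f in zip(weights, fits)) for i in range(n)]
    rss = sum((yi - ai) ** 2 for yi, ai in zip(y, avg))
    k = sum(w * km for w, km in zip(weights, ks))
    return (rss / n) / (1.0 - k / n) ** 2


def choose_two_model_weights(y, fit_a, fit_b, k_a, k_b, grid_size=101):
    """Grid search over w in [0, 1] for the weight pair (w, 1 - w).

    Returns (w, criterion value) at the best grid point; the first
    minimizer wins ties.
    """
    best = None
    for j in range(grid_size):
        w = j / (grid_size - 1)
        crit = gcv_weight_criterion(y, [fit_a, fit_b], [k_a, k_b], [w, 1.0 - w])
        if best is None or crit < best[1]:
            best = (w, crit)
    return best


if __name__ == "__main__":
    y = [1.2, 1.9, 3.1, 4.2, 4.8, 6.1, 7.0, 7.9]
    small = [4.5] * 8                           # hypothetical intercept-only fit
    large = [1.0 + 0.98 * i for i in range(8)]  # hypothetical straight-line fit
    print(choose_two_model_weights(y, small, large, 1, 2))
```

A discrete weight set corresponds to a coarse grid; with more than two candidates the grid search would be replaced by a constrained optimizer over the weight simplex.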
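Abstract: To improve traits such as yield and disease resistance, breeding efficiency must be raised. We designed cross-validation experiments that partition soybean genotype and phenotype data into groups, allocating them according to the numbers of individuals and markers, and used gBLUP (genomic Best Linear Unbiased Prediction) to predict phenotypes. By comparing different partitions across multiple soybean traits, we obtained the range of prediction accuracy, providing a basis for subsequent breeding analyses. When only genotype data are available without phenotypes, phenotypes must be simulated from a preset heritability and number of simulated loci (NQTN); the simulated data are then partitioned in different ways to obtain accuracy values, which makes prediction with soybean data more flexible.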
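The data-handling steps around gBLUP can be sketched as follows, assuming genotypes coded 0/1/2 and QTN effects drawn from a standard normal. The phenotype simulation sets the residual variance so that var(g) / (var(g) + var(e)) equals the target heritability h2. Folds are formed by shuffling individuals, and accuracy is the Pearson correlation between observed and predicted values. The gBLUP solver itself is omitted; the functions and parameters below are hypothetical and not taken from the paper.

```python
import random
import statistics


def simulate_phenotypes(genotypes, h2, n_qtn, seed=0):
    """Simulate y = g + e from genotypes (rows = individuals, markers coded 0/1/2).

    n_qtn markers are drawn at random as QTNs with standard-normal effects, and
    the residual variance is set so that var(g) / (var(g) + var(e)) = h2.
    """
    rng = random.Random(seed)
    qtns = rng.sample(range(len(genotypes[0])), n_qtn)
    effects = {m: rng.gauss(0.0, 1.0) for m in qtns}
    g = [sum(row[m] * effects[m] for m in qtns) for row in genotypes]
    var_e = statistics.pvariance(g) * (1.0 - h2) / h2
    y = [gi + rng.gauss(0.0, var_e ** 0.5) for gi in g]
    return y, g, var_e


def kfold_assign(n_individuals, k, seed=0):
    """Shuffle individuals and deal them into k folds of near-equal size."""
    rng = random.Random(seed)
    order = list(range(n_individuals))
    rng.shuffle(order)
    folds = [[] for _ in range(k)]
    for pos, idx in enumerate(order):
        folds[pos % k].append(idx)
    return folds


def prediction_accuracy(observed, predicted):
    """Accuracy as the Pearson correlation of observed and predicted values."""
    mo = sum(observed) / len(observed)
    mp = sum(predicted) / len(predicted)
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    so = sum((o - mo) ** 2 for o in observed) ** 0.5
    sp = sum((p - mp) ** 2 for p in predicted) ** 0.5
    return cov / (so * sp)
```

In a full experiment, each fold in turn would be masked, gBLUP fitted on the remaining individuals, and prediction_accuracy computed on the masked fold; repeating this over different fold counts gives the accuracy range the abstract refers to.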
Funding: Supported by the National Natural Science Foundation of China (51006052) and the NUST Outstanding Scholar Supporting Program.
Abstract: A method for fast l-fold cross-validation is proposed for the regularized extreme learning machine (RELM). The computational time of fast l-fold cross-validation increases as the number of folds decreases, which is the opposite of naive l-fold cross-validation. Compared with naive l-fold cross-validation, fast l-fold cross-validation has an advantage in computational time, especially for large fold numbers such as l > 20. To corroborate the efficacy and feasibility of fast l-fold cross-validation, experiments are conducted on five benchmark regression data sets.
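One standard way to obtain this cost profile, sketched in pure Python under stated assumptions, uses the fact that RELM's output layer is ridge regression on the hidden-layer matrix H: β = (HᵀH + I/C)⁻¹ Hᵀ t. After a single full fit with smoother matrix S = H (HᵀH + I/C)⁻¹ Hᵀ, the held-out residuals of fold k are (I - S_kk)⁻¹ e_k, where e is the full-fit residual. Each fold then costs only an (n/l) × (n/l) inverse, so total work shrinks as l grows. This identity is consistent with the behavior described above, but it is not necessarily the paper's exact algorithm. The random sigmoid hidden layer and all names are illustrative assumptions.

```python
import math
import random


def transpose(a):
    return [list(col) for col in zip(*a)]


def mat_mul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def inverse(a):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(a)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        p = aug[c][c]
        aug[c] = [v / p for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def hidden_layer(x_rows, n_hidden, seed=0):
    """Random sigmoid hidden layer of an ELM: h_j(x) = sigmoid(w_j . x + b_j)."""
    rng = random.Random(seed)
    d = len(x_rows[0])
    w = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n_hidden)]
    b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    return [[1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(wj, x)) + bj)))
             for wj, bj in zip(w, b)] for x in x_rows]


def regularized_system(h, c):
    """H^T H + I / C, the matrix inverted by RELM."""
    a = mat_mul(transpose(h), h)
    for i in range(len(a)):
        a[i][i] += 1.0 / c
    return a


def naive_lfold_residuals(h, t, c, folds):
    """Refit the RELM output weights once per fold and predict the held-out rows."""
    out = [0.0] * len(t)
    for idx in folds:
        held = set(idx)
        train = [i for i in range(len(t)) if i not in held]
        h_tr = [h[i] for i in train]
        rhs = mat_mul(transpose(h_tr), [[t[i]] for i in train])
        beta = [row[0] for row in mat_mul(inverse(regularized_system(h_tr, c)), rhs)]
        for i in idx:
            out[i] = t[i] - sum(hj * bj for hj, bj in zip(h[i], beta))
    return out


def fast_lfold_residuals(h, t, c, folds):
    """Held-out residuals from one full fit: e_cv[k] = (I - S_kk)^{-1} e[k]."""
    s = mat_mul(mat_mul(h, inverse(regularized_system(h, c))), transpose(h))
    resid = [ti - sum(sij * tj for sij, tj in zip(row, t)) for row, ti in zip(s, t)]
    out = [0.0] * len(t)
    for idx in folds:
        # Only an (n/l) x (n/l) block is inverted per fold.
        block = [[float(i == j) - s[i][j] for j in idx] for i in idx]
        inv_block = inverse(block)
        for r, i in enumerate(idx):
            out[i] = sum(inv_block[r][q] * resid[j] for q, j in enumerate(idx))
    return out
```

Both functions return the same held-out residuals up to rounding; their mean square is the l-fold cross-validation error used to tune C.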
Funding: Supported by the US Department of Agriculture, Agriculture and Food Research Initiative, National Institute of Food and Agriculture Competitive Grant No. 2015-67015-22947.
Abstract: Background: A random multiple-regression model that simultaneously fits all allele substitution effects for additive markers or haplotypes as uncorrelated random effects was proposed for Best Linear Unbiased Prediction using whole-genome data. Leave-one-out cross-validation can be used to quantify the predictive ability of a statistical model. Methods: Naive application of leave-one-out cross-validation is computationally intensive because the training and validation analyses must be repeated n times, once for each observation. Efficient leave-one-out cross-validation strategies are presented here that require little more effort than a single analysis. Results: The efficient leave-one-out cross-validation strategies were 786 times faster than the naive application for a simulated dataset with 1,000 observations and 10,000 markers, and 99 times faster with 1,000 observations and 100 markers. These efficiencies relative to the naive approach using the same model will increase as the number of observations increases. Conclusions: Efficient leave-one-out cross-validation strategies are presented here that require little more effort than a single analysis.
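A minimal sketch of one well-known efficient leave-one-out shortcut for this kind of model, assuming no fixed effects and a known variance ratio λ = σ²_e / σ²_α. Marker effects α ~ N(0, Iσ²_α) give the ridge-type smoother H = K (K + λI)⁻¹ with K = ZZᵀ, and since H = I - λ(K + λI)⁻¹, the LOO residual eᵢ / (1 - hᵢᵢ) reduces to [(K + λI)⁻¹ y]ᵢ / [(K + λI)⁻¹]ᵢᵢ. A single n × n inverse therefore serves all n validations. The paper's strategies may differ in detail (for example, in how fixed effects are absorbed); the names here are hypothetical.

```python
import random


def inverse(a):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(a)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        p = aug[c][c]
        aug[c] = [v / p for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def linear_kernel(rows_a, rows_b):
    """K[i][j] = z_i . z_j for marker rows z (a genomic relationship up to scaling)."""
    return [[sum(u * v for u, v in zip(zi, zj)) for zj in rows_b] for zi in rows_a]


def efficient_loo_residuals(z, y, lam):
    """All n LOO residuals from one inverse of K + lam I."""
    n = len(y)
    k = linear_kernel(z, z)
    c_inv = inverse([[k[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)])
    dual = [sum(c_inv[i][j] * y[j] for j in range(n)) for i in range(n)]
    return [dual[i] / c_inv[i][i] for i in range(n)]


def naive_loo_residuals(z, y, lam):
    """Refit the BLUP without observation i, once per observation."""
    n = len(y)
    out = []
    for i in range(n):
        keep = [j for j in range(n) if j != i]
        z_tr = [z[j] for j in keep]
        k_tr = linear_kernel(z_tr, z_tr)
        c_inv = inverse([[k_tr[a][b] + (lam if a == b else 0.0) for b in range(n - 1)]
                         for a in range(n - 1)])
        dual = [sum(c_inv[a][b] * y[keep[b]] for b in range(n - 1)) for a in range(n - 1)]
        k_i = linear_kernel([z[i]], z_tr)[0]
        out.append(y[i] - sum(kv * dv for kv, dv in zip(k_i, dual)))
    return out
```

The naive version performs n inversions of size n - 1, whereas the efficient version performs one inversion of size n, which is why the speed-up grows with the number of observations.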
Abstract: In deriving a regression model, analysts often have to use variable selection, despite the problems introduced by data-dependent model building. Resampling approaches have been proposed to handle some of the critical issues. To assess and compare several strategies, we conduct a simulation study with 15 predictors and a complex correlation structure in the linear regression model. Using sample sizes of 100 and 400 and residual variances corresponding to R² of 0.50 and 0.71, we consider 4 scenarios with varying amounts of information. We also consider two examples with 24 and 13 predictors, respectively. We discuss the value of cross-validation, shrinkage and backward elimination (BE) with varying significance levels. We assess whether 2-step approaches using global or parameterwise shrinkage (PWSF) can improve the selected models, and compare the results with models derived with the LASSO procedure. Besides the MSE, we use model sparsity and further criteria for model assessment. The amount of information in the data influences the selected models and the comparison of the procedures. None of the approaches was best in all scenarios. Backward elimination with a suitably chosen significance level performed no worse than the LASSO, and the selected BE models were much sparser, an important advantage for interpretation and transportability. PWSF performed better than global shrinkage. Provided that the amount of information is not too small, we conclude that BE followed by PWSF is a suitable approach when variable selection is a key part of data analysis.
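The backward-elimination step can be sketched as follows, assuming an OLS model with an intercept and a Wald test per coefficient. For simplicity, the critical value uses the normal approximation to the t distribution (an assumption that is reasonable for large n). The default α = 0.157 is the level that mimics AIC for a single degree of freedom and is included only as an example. Shrinkage (global or PWSF) is not shown, and all names are hypothetical.

```python
import random
from statistics import NormalDist


def inverse(a):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(a)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        p = aug[c][c]
        aug[c] = [v / p for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def ols_with_se(x_cols, y):
    """OLS with an intercept; returns coefficients and their standard errors."""
    n = len(y)
    design = [[1.0] + [col[i] for col in x_cols] for i in range(n)]
    p = len(design[0])
    xtx_inv = inverse([[sum(r[a] * r[b] for r in design) for b in range(p)] for a in range(p)])
    xty = [sum(r[a] * yi for r, yi in zip(design, y)) for a in range(p)]
    beta = [sum(xtx_inv[a][b] * xty[b] for b in range(p)) for a in range(p)]
    resid = [yi - sum(bv * rv for bv, rv in zip(beta, r)) for r, yi in zip(design, y)]
    sigma2 = sum(e * e for e in resid) / (n - p)
    return beta, [(sigma2 * xtx_inv[a][a]) ** 0.5 for a in range(p)]


def backward_elimination(columns, y, alpha=0.157):
    """Drop the weakest predictor while its two-sided Wald p-value exceeds alpha.

    columns maps predictor name -> list of values.
    """
    crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    selected = list(columns)
    while selected:
        beta, se = ols_with_se([columns[name] for name in selected], y)
        # Index 0 is the intercept, which is never a candidate for removal.
        t_abs = [abs(beta[j + 1] / se[j + 1]) for j in range(len(selected))]
        weakest = min(range(len(selected)), key=t_abs.__getitem__)
        if t_abs[weakest] >= crit:
            break
        selected.pop(weakest)
    return selected


if __name__ == "__main__":
    rng = random.Random(42)
    n = 100
    cols = {f"x{j}": [rng.gauss(0.0, 1.0) for _ in range(n)] for j in range(1, 6)}
    y = [1.5 * cols["x1"][i] + 0.4 * cols["x2"][i] + rng.gauss(0.0, 1.0) for i in range(n)]
    print(backward_elimination(cols, y, alpha=0.05))
```

Varying alpha trades sparsity against fit: a smaller alpha removes more predictors, and a 2-step approach would then shrink the surviving coefficients.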
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 50975007, 51105014), the PhD Programs Foundation of the Ministry of Education of China (Grant No. 20091102110023), and the China Postdoctoral Science Foundation (Grant No. 20100480179).
Abstract: Cross-spring pivots, formed by crossing two identical flexural beams at their midpoints, have been widely used in precision engineering and aerospace. Much research has been conducted on the modeling and analysis of cross-spring pivots. However, the influence of the application position and magnitude of external loads on the load-rotation and parasitic-motion characteristics has not yet been discussed. To reveal the effect of external loads, this paper develops accurate load-rotation and center-shift models of cross-spring pivots under generalized planar loads, including a bending moment and horizontal and vertical forces. First, the load-displacement models of the pivot are derived with the energy method under the assumption of small rotational angles. Based on these models, the influence of generalized planar loads on the load-rotation relationship is discussed, showing that both the application position and the magnitude of the vertical and horizontal forces affect the load-rotation behavior. Then, accurate center-shift expressions of the pivot under generalized planar loads are developed, showing that the rotational angle is the dominant term in both components of the center shift when the vertical and horizontal forces are small. Finally, the accuracy of the proposed model is validated by finite element analysis (FEA). Compared with the FEA results, the relative error of the load-rotation model is less than 6% even when the rotational angle reaches 20°, and the relative errors of the two components of the center shift are less than 5% and 10%, respectively, when the rotational angle reaches 10°. The proposed model and analytical conclusions can be used to analyze and preliminarily design compliant mechanisms containing cross-spring pivots.
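As a reference point for the load-rotation models, here is a minimal sketch of the classical zero-force, small-angle limit. Under a pure bending moment, each beam of a midpoint-crossed pivot bends into a constant-curvature arc, so each beam contributes EI/L and the pivot's rotational stiffness is K = 2EI/L. This assumes linear Euler-Bernoulli theory and a rectangular cross-section with I = wt³/12. The paper's models add the horizontal and vertical forces and their application positions, which this baseline deliberately leaves out. The relative_error helper mirrors the FEA comparison metric. Function names and input values are hypothetical.

```python
def pivot_stiffness_midpoint(e_modulus, width, thickness, length):
    """Zero-force, small-angle rotational stiffness K = 2EI/L of a cross-spring
    pivot whose two identical rectangular beams cross at their midpoints."""
    inertia = width * thickness ** 3 / 12.0
    return 2.0 * e_modulus * inertia / length


def rotation_under_moment(moment, stiffness):
    """Rotation angle (rad) under a pure moment in the linear regime."""
    return moment / stiffness


def relative_error(model_value, reference_value):
    """|model - reference| / |reference|, e.g. model vs. FEA."""
    return abs(model_value - reference_value) / abs(reference_value)


if __name__ == "__main__":
    # Hypothetical steel beams: E = 200 GPa, 10 mm wide, 0.5 mm thick, 40 mm long.
    k = pivot_stiffness_midpoint(200e9, 10e-3, 0.5e-3, 40e-3)
    print(f"K = {k:.4f} N*m/rad, rotation under 0.01 N*m = {rotation_under_moment(0.01, k):.5f} rad")
```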
Abstract: A support vector machine (SVM) is successfully applied to classification in this paper. The paper first discusses the basic principle of the SVM, and then SVM classifiers with a polynomial kernel and with a Gaussian radial basis function (RBF) kernel are chosen to identify pupils who have difficulties in writing. The 10-fold cross-validation method is used for training and validation. The aim of this paper is to compare the performance of SVMs with RBF and polynomial kernels in classifying pupils with or without handwriting difficulties. Experimental results show that the SVM with the RBF kernel performs better than the one with the polynomial kernel.
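The two kernels being compared and the 10-fold split can be sketched in a few lines. The kernel formulas follow the common parameterization (γ⟨u, v⟩ + c₀)^d and exp(-γ‖u - v‖²); the default values are illustrative assumptions, not the paper's settings. The folds are stratified (each class is dealt round-robin), which is also an assumption, since the abstract only says 10-fold. In practice the classifier itself would come from an SVM library; for instance, scikit-learn's `SVC` accepts `kernel="rbf"` or `kernel="poly"` with `gamma`, `degree` and `coef0` parameters that match these formulas.

```python
import math


def polynomial_kernel(u, v, degree=3, coef0=1.0, gamma=1.0):
    """k(u, v) = (gamma * <u, v> + coef0) ** degree."""
    return (gamma * sum(a * b for a, b in zip(u, v)) + coef0) ** degree


def rbf_kernel(u, v, gamma=0.5):
    """k(u, v) = exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def stratified_kfold(labels, k=10):
    """Assign sample indices to k folds, dealing each class out round-robin
    so every fold keeps roughly the overall class proportions."""
    folds = [[] for _ in range(k)]
    pos = 0
    for cls in sorted(set(labels)):
        for idx, lab in enumerate(labels):
            if lab == cls:
                folds[pos % k].append(idx)
                pos += 1
    return folds
```

Each fold serves once as the validation set while the SVM is trained on the other nine; averaging the nine-versus-one accuracies for each kernel gives the comparison the abstract reports.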
Funding: Supported by the National Natural Science Foundation of China (61374164).
Abstract: Cross iteration often occurs in the computational process of simulation models, especially control models. Validating models with cross iteration raises a credibility defect-tracing problem. To resolve this problem, the problem is first formulated, and then a validation theorem for cross iteration is proposed and proved for the cross-iteration setting. Applying the proposed theorem, a credibility calculation algorithm is provided, and the solution to defect tracing is explained. Further, based on the validation theorem for cross iteration, a validation method for simulation models with cross iteration is proposed and illustrated step by step with a flowchart. Finally, a validation example of a six-degree-of-freedom (DOF) flight vehicle model is provided, and the validation process is carried out using the proposed method. The analysis of the results shows that the method is effective in obtaining the credibility of the model and in tracing defects during validation.
Abstract: For the nonparametric regression model $Y_{ni} = g(X_{ni}) + \varepsilon_{ni}$, $i = 1, \ldots, n$, with a regularly spaced nonrandom design, the authors study the behavior of the nonlinear wavelet estimator of $g(x)$. When the threshold and truncation parameters are chosen by cross-validation on the average squared error, strong consistency for dyadic sample sizes and moment consistency for arbitrary sample sizes are established under some regularity conditions.
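A minimal sketch of a nonlinear (hard-thresholded) Haar wavelet estimator whose threshold is chosen by cross-validation. The CV scheme is an odd/even two-fold split in the style of Nason: each half is denoised, and the other half's points are predicted by averaging the two neighboring estimates. Several simplifying assumptions apply. The sample size must be 2^(J+1) so that each half has dyadic length. Boundaries are treated as periodic. Only the threshold is tuned (no truncation level). The chosen threshold would then be applied to the full sample as is. This is not the paper's exact procedure, and all names are hypothetical.

```python
import math

SQRT_HALF = 1.0 / math.sqrt(2.0)


def haar_forward(signal):
    """Orthonormal Haar transform of a length-2^J signal.
    Returns (coarsest approximation, detail levels ordered fine -> coarse)."""
    approx, details = list(signal), []
    while len(approx) > 1:
        half = len(approx) // 2
        details.append([(approx[2 * i] - approx[2 * i + 1]) * SQRT_HALF for i in range(half)])
        approx = [(approx[2 * i] + approx[2 * i + 1]) * SQRT_HALF for i in range(half)]
    return approx, details


def haar_inverse(approx, details):
    """Invert haar_forward, rebuilding from the coarsest level upwards."""
    a = list(approx)
    for d in reversed(details):
        nxt = []
        for ai, di in zip(a, d):
            nxt.extend([(ai + di) * SQRT_HALF, (ai - di) * SQRT_HALF])
        a = nxt
    return a


def hard_threshold_estimate(signal, thr):
    """Keep detail coefficients with |c| > thr, zero the rest, and invert."""
    approx, details = haar_forward(signal)
    kept = [[c if abs(c) > thr else 0.0 for c in d] for d in details]
    return haar_inverse(approx, kept)


def cv_threshold(y, candidates):
    """Pick a hard threshold by odd/even two-fold cross-validation."""
    even, odd = y[0::2], y[1::2]
    m = len(even)
    best_thr, best_err = None, None
    for thr in candidates:
        est_e = hard_threshold_estimate(even, thr)
        est_o = hard_threshold_estimate(odd, thr)
        # Odd sample 2i+1 lies between even samples 2i and 2i+2 (periodic wrap).
        pred_odd = [(est_e[i] + est_e[(i + 1) % m]) / 2.0 for i in range(m)]
        # Even sample 2i lies between odd samples 2i-1 and 2i+1 (index -1 wraps).
        pred_even = [(est_o[i - 1] + est_o[i]) / 2.0 for i in range(m)]
        err = (sum((p - v) ** 2 for p, v in zip(pred_odd, odd))
               + sum((p - v) ** 2 for p, v in zip(pred_even, even)))
        if best_err is None or err < best_err:
            best_thr, best_err = thr, err
    return best_thr
```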
Abstract: Tropical cyclones (TCs) and tropical storms (TSs) are among the most devastating events in the world, and in the southwestern Indian Ocean (SWIO) in particular. Seasonal forecasts of TCs and TSs over the SWIO were produced for December to March (DJFM) and November to May (NM). Dynamic parameters, including vertical wind shear, mean zonal steering wind and vorticity at 850 mb, were derived from NOAA NCEP-NCAR Reanalysis 1 wind fields. Thermodynamic parameters, including monthly and daily mean sea surface temperature (SST), outgoing longwave radiation (OLR) and the equatorial Southern Oscillation Index (SOI), were also used. Three types of Poisson regression models (dynamic, thermodynamic and combined) were developed and validated using leave-one-out cross-validation (LOOCV), and 2 × 2 contingency tables were used for model verification. The observed and cross-validated DJFM and NM TCs and TSs were strongly correlated (p ≤ 0.02) for all model types, with correlations (r) ranging from 0.62 to 0.86 for TCs and from 0.52 to 0.87 for TSs, indicating a strong association between them. Skill scores for DJFM and NM TC and TS frequency were high for all model types, ranging from 38% to 70% for TCs and from 26% to 72% for TSs, and the dynamic and combined models had higher skill scores than the thermodynamic models. The selected DJFM and NM predictors explained TC and TS variability in the ranges 0.45 to 0.65 and 0.37 to 0.66, respectively. Verification analysis showed that all models were adequate for predicting seasonal TCs and TSs, with bias values ranging from 0.85 to 0.94. In conclusion, the study calls for further research on TC and TS frequency and strength to improve forecasts of the March to May (MAM) and October to December (OND) seasonal rainfall in East Africa (EA), and in Tanzania in particular.
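The modeling and verification steps can be sketched for a single predictor. The sketch covers a Poisson regression log μ = a + b·x fitted by Newton-Raphson, LOOCV predictions obtained by refitting without each season, and the frequency bias from a 2 × 2 contingency table, (hits + false alarms) / (hits + misses), where values near 1 indicate little over- or under-forecasting. The one-predictor restriction and all names are assumptions for illustration; the study's models combine several dynamic and thermodynamic predictors.

```python
import math


def fit_poisson(x, y, iters=50):
    """Poisson regression log(mu) = a + b * x fitted by Newton-Raphson."""
    a, b = math.log(max(sum(y) / len(y), 1e-8)), 0.0
    for _ in range(iters):
        mu = [math.exp(a + b * xi) for xi in x]
        # Score vector (g0, g1) and information matrix [[h00, h01], [h01, h11]].
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))
        h00 = sum(mu)
        h01 = sum(mi * xi for mi, xi in zip(mu, x))
        h11 = sum(mi * xi * xi for mi, xi in zip(mu, x))
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < 1e-12:
            break
    return a, b


def loocv_predictions(x, y):
    """Predict each season's count from a model fitted to all other seasons."""
    preds = []
    for i in range(len(y)):
        a, b = fit_poisson(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(math.exp(a + b * x[i]))
    return preds


def frequency_bias(hits, false_alarms, misses):
    """Frequency bias of a 2x2 contingency table: forecast 'yes' / observed 'yes'."""
    return (hits + false_alarms) / (hits + misses)


if __name__ == "__main__":
    shear = [0.2, -0.5, 1.1, 0.3, -1.2, 0.8, 0.0, -0.3, 1.5, -0.9]  # hypothetical predictor
    counts = [5, 3, 8, 6, 2, 7, 4, 4, 9, 3]                         # hypothetical seasonal counts
    print(fit_poisson(shear, counts))
    print([round(p, 2) for p in loocv_predictions(shear, counts)])
```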