Partial least squares (PLS) regression is an important linear regression method that efficiently addresses the multiple correlation problem by combining principal component analysis and multiple regression. In this paper, we present a quantum partial least squares (QPLS) regression algorithm. To reduce the high time complexity of PLS regression, we design a quantum eigenvector search method that speeds up the construction of principal components and regression parameters. We also give a density matrix product method that avoids repeated access to quantum random access memory (QRAM) while building residual matrices. The time and space complexities of QPLS regression are logarithmic in the independent variable dimension n, the dependent variable dimension w, and the number of variables m. The algorithm achieves exponential speed-ups over PLS regression in n, m, and w. In addition, QPLS regression inspires us to explore more potential quantum machine learning applications in future work.
Estimating wheat grain protein content by remote sensing is important for assessing wheat quality at maturity and for setting grain harvest and purchase policies. However, spatial variability in soil condition, temperature, and precipitation affects grain protein content, and these factors usually cannot be monitored accurately with remote sensing data from a single image. In this research, the relationships between wheat protein content at maturity and wheat agronomic parameters at different growth stages were analyzed, and multi-temporal Landsat TM images were used to estimate grain protein content by partial least squares regression. Experimental data were acquired in the suburbs of Beijing during a 2-yr experiment from 2003 to 2004. The determination coefficient, the average deviation of self-modeling, and the deviation of cross-validation were used to assess the estimation accuracy of wheat grain protein content. Their values were 0.88, 1.30%, and 3.81% for 2003 and 0.72, 5.22%, and 12.36% for 2004. The research laid an agronomic foundation for grain protein content (GPC) estimation by multi-temporal remote sensing. The results showed that it is feasible to estimate the GPC of wheat over large areas from multi-temporal remote sensing data.
Several indices, simple empirical models, and single-band ratios from pre- and post-fire Landsat images have been developed to estimate and/or map burn severity. However, these models and indices are usually site-, time-, and vegetation-dependent, and their applications are limited. The Daxing'an Mountains range has the largest forested area in China and is prone to wildfires. Whether the existing models can effectively characterize burn severity over a large region is unclear. In this study, we used the orthogonal signal correction method based on partial least squares regression (PLSR) to select the variables that better explain the variance of burn severity. A new index and other commonly used indices were used to construct a new multivariate PLSR model, which was compared with the popular single-variable models according to three assessment indices: relative root mean square error (RMSE%), relative bias (RE%), and Nash–Sutcliffe efficiency (NSE%). The results indicate that the multivariate PLSR model performed better than the single-variable models, with higher NSE% (68.2% vs. 67.8%) and lower RE% (3.7% vs. -8.7%), while achieving almost the same RMSE%. We also discuss the spectral characteristics of the four variables selected for the multivariate PLSR model and their correlation with the field burn severity data. The new model developed in this study should help to better understand the patterns of forest burn severity and assist vegetation restoration efforts in the region.
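The three assessment indices named in this abstract can be sketched in a few lines of Python. The abstract does not give the formulas, so the definitions below are the common textbook ones (errors normalized by the observed mean, all expressed in percent) and should be read as an assumption rather than the authors' exact definitions. The function names are illustrative.

```python
from math import sqrt


def relative_rmse(obs, pred):
    """Root mean square error as a percentage of the observed mean (assumed definition)."""
    n = len(obs)
    rmse = sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / n)
    return 100.0 * rmse / (sum(obs) / n)


def relative_bias(obs, pred):
    """Mean signed error (pred - obs) as a percentage of the observed mean (assumed definition)."""
    n = len(obs)
    bias = sum(p - o for o, p in zip(obs, pred)) / n
    return 100.0 * bias / (sum(obs) / n)


def nash_sutcliffe(obs, pred):
    """Nash-Sutcliffe efficiency in percent: 100 * (1 - SS_res / SS_tot)."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 100.0 * (1.0 - ss_res / ss_tot)
```

Under these definitions a perfect model scores NSE% = 100, while a model that always predicts the observed mean scores NSE% = 0, which is why a higher NSE% together with a smaller absolute RE% indicates the better model.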
Near infrared reflectance spectroscopy (NIRS), a non-destructive measurement technique, was combined with partial least squares regression discriminant analysis (PLS-DA) to discriminate transgenic (TCTP and mi166) and wild-type (Zhonghua 11) rice. Furthermore, rice lines transformed with a protein gene (OsTCTP) and a regulation gene (Osmi166) were also discriminated by the NIRS method. The performance of PLS-DA in the spectral ranges of 4 000–8 000 cm⁻¹ and 4 000–10 000 cm⁻¹ was compared to obtain the optimal spectral range. As a result, the transgenic and wild-type rice were distinguished from each other in the range of 4 000–10 000 cm⁻¹, and the correct classification rate was 100.0% in the validation test. The transgenic rice lines TCTP and mi166 were also distinguished from each other in the range of 4 000–10 000 cm⁻¹, again with a correct classification rate of 100.0%. In conclusion, NIRS combined with PLS-DA can be used for the discrimination of transgenic rice.
To predict the economic loss of crops caused by acid rain, we used partial least squares (PLS) regression to build a model with a single dependent variable: the economic loss, calculated from the decrease in yield, as related to the pH value and the levels of Ca²⁺, NH₄⁺, Na⁺, K⁺, Mg²⁺, SO₄²⁻, NO₃⁻, and Cl⁻ in acid rain. We selected vegetables that are sensitive to acid rain as the sample crops and collected 12 groups of data, of which 8 groups were used for modeling and 4 groups for testing. Evaluating the performance of the prediction model by cross-validation indicates that the optimum number of principal components is 3, determined by the minimum of the prediction residual error sum of squares, and that the prediction error of the regression equation ranges from -2.25% to 4.32%. The model predicted that the economic loss of vegetables from acid rain is negatively correlated with pH and with the concentrations of NH₄⁺, SO₄²⁻, NO₃⁻, and Cl⁻ in the rain, and positively correlated with the concentrations of Ca²⁺, Na⁺, K⁺, and Mg²⁺. The precision of the model may be improved if the non-linearity of the original data is addressed.
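The component-selection step described in this abstract (choosing the number of PLS components that minimizes the prediction residual error sum of squares under cross-validation) can be illustrated with a minimal sketch. It assumes a single dependent variable, the NIPALS form of PLS1, and leave-one-out cross-validation; the abstract does not say which cross-validation scheme or software was used, and all function names here are hypothetical.

```python
from math import sqrt


def pls1_fit(rows, y, n_components):
    """Fit single-response PLS (NIPALS PLS1) on mean-centered data."""
    n, p = len(rows), len(rows[0])
    x_mean = [sum(r[j] for r in rows) / n for j in range(p)]
    y_mean = sum(y) / n
    X = [[r[j] - x_mean[j] for j in range(p)] for r in rows]
    res = [v - y_mean for v in y]
    comps = []
    for _ in range(n_components):
        # Weight vector: direction of maximal covariance between X and the residual.
        w = [sum(X[i][j] * res[i] for i in range(n)) for j in range(p)]
        norm = sqrt(sum(v * v for v in w))
        if norm < 1e-12:  # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [sum(X[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        load = [sum(X[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(res[i] * t[i] for i in range(n)) / tt
        # Deflate X and the response residual before extracting the next component.
        X = [[X[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        res = [res[i] - q * t[i] for i in range(n)]
        comps.append((w, load, q))
    return x_mean, y_mean, comps


def pls1_predict(model, x):
    """Predict one sample by repeating the score/deflation steps used in fitting."""
    x_mean, y_mean, comps = model
    x = [a - m for a, m in zip(x, x_mean)]
    y_hat = y_mean
    for w, load, q in comps:
        t = sum(a * b for a, b in zip(x, w))
        y_hat += q * t
        x = [a - t * b for a, b in zip(x, load)]
    return y_hat


def loo_press(rows, y, n_components):
    """Leave-one-out prediction residual error sum of squares (PRESS)."""
    press = 0.0
    for i in range(len(rows)):
        model = pls1_fit(rows[:i] + rows[i + 1:], y[:i] + y[i + 1:], n_components)
        press += (y[i] - pls1_predict(model, rows[i])) ** 2
    return press


def best_n_components(rows, y, max_components):
    """Return the component count with minimal PRESS, plus all PRESS values."""
    scores = {a: loo_press(rows, y, a) for a in range(1, max_components + 1)}
    return min(scores, key=scores.get), scores
```

With only 8 modeling samples, as in the study, leave-one-out is a natural choice because every sample is used for validation exactly once; with larger data sets a k-fold split would serve the same purpose at lower cost.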
Based on survey data of the strata-moving angle, an ordinary least squares regression model is first constructed for the strata-moving parameter β as a function of coal bed obliquity, coal thickness, mining depth, etc. This regression is unsuccessful: none of the parameters is suitable, which does not agree with objective reality. This paper therefore presents a novel method, partial least squares regression (PLS regression), to construct the statistical model of the strata-moving parameter β. The experiment shows that the forecasting model is reasonable.
The water distribution system of one residential district in Tianjin is taken as an example to analyze changes in water quality. A partial least squares (PLS) regression model, in which turbidity and Fe are regarded as the control objectives, is used to establish the statistical model. The experimental results indicate that the PLS regression model predicts water quality well compared with the monitored data. The percentages of absolute relative error (below 15%, 20%, 30%) are 44.4%, 66.7%, 100% (turbidity) and 33.3%, 44.4%, 77.8% (Fe) at the 4th sampling point, and 77.8%, 88.9%, 88.9% (turbidity) and 44.4%, 55.6%, 66.7% (Fe) at the 5th sampling point.
The box office during the later Spring Festival shows an attractive prospect. This paper studies the factors affecting the total box office during the broad Spring Festival, defined as the period from the Spring Festival to the Lantern Festival. Data on films released in China during the broad Spring Festival from 2016 to 2019 were gathered, and the impact of eight explanatory variables on the box office in this period was analyzed empirically by partial least squares (PLS) regression with the software SIMCA. The results suggest that word-of-mouth has the strongest positive effect on the box office during the broad Spring Festival. Late-stage publicity has a positive effect, while early promotion has a negative effect. The director's influence has a positive effect, while the actors' influence does not contribute much. The length of the trailer has a negative effect. The film format (2D or 3D) does not contribute much to the box office.
Based on the continuum power regression (CPR) method, a novel derivation of kernel partial least squares regression (named CPR-KPLS) is proposed for approximating arbitrary nonlinear functions. A kernel function is used to map the input variables (input space) into a reproducing kernel Hilbert space (the so-called feature space), where a linear CPR-PLS is constructed based on the projection of explanatory variables onto latent variables (components). The linear CPR-PLS in the high-dimensional feature space corresponds to a nonlinear CPR-KPLS in the original input space. This method offers a novel extension of kernel partial least squares regression (KPLS), and some numerical simulation results are presented to illustrate the feasibility of the proposed method.
For multi-step prediction of chaotic time series, a multi-step direct prediction model based on partial least squares (PLS) is proposed in this article. PLS, a method for predicting a set of dependent variables from a large set of predictors, is used to model the dynamic evolution between the state space points and the corresponding future points. The model eliminates the error accumulation of the common single-step local model algorithm and avoids the high multicollinearity that arises in the reconstructed state space as the embedding dimension increases. Simulation predictions are carried out on the Mackey-Glass chaotic time series with the model. Satisfactory prediction accuracy is obtained and the efficiency of the model is verified. In the experiments, the number of extracted PLS components is set by a cross-validation procedure.
Weighted total least squares (WTLS) has been regarded as the standard tool for the errors-in-variables (EIV) model, in which all the elements of the observation vector and the coefficient matrix are contaminated with random errors. However, in many geodetic applications, some elements are error-free and some random observations appear repeatedly in different positions of the augmented coefficient matrix. This is called the linear structured EIV (LSEIV) model. Two kinds of methods are proposed for the LSEIV model, based on functional and stochastic modifications. On the one hand, the functional part of the LSEIV model is modified into the errors-in-observations (EIO) model. On the other hand, the stochastic model is modified by applying the Moore-Penrose inverse of the cofactor matrix. The algorithms are derived through the Lagrange multipliers method and linear approximation. The estimation principles and iterative formulas of the parameters are proven to be consistent. The first-order approximate variance-covariance matrix (VCM) of the parameters is also derived. A numerical example is given to compare the performance of our proposed three algorithms with the STLS approach. Afterwards, the least squares (LS), total least squares (TLS), and linear structured weighted total least squares (LSWTLS) solutions are compared, and the accuracy evaluation formula is proven to be feasible and effective. Finally, the LSWTLS is applied to deformation analysis, where it yields a better result than the traditional LS and TLS estimations.
One-class classification has become a popular problem in many fields, with a wide range of applications in anomaly detection, fault diagnosis, and face recognition. We investigate the one-class classification problem for second-order tensor data. Traditional vector-based one-class classification methods such as the one-class support vector machine (OCSVM) and the least squares one-class support vector machine (LSOCSVM) have limitations when tensors are used as input data, so we propose a new tensor one-class classification method, LSOCSTM, which directly uses tensors as input data. On the one hand, using tensors as input not only makes it possible to classify tensor data; for vector data as well, classifying it after lifting it into a higher-dimensional tensor still improves the classification accuracy and overcomes the over-fitting problem. On the other hand, unlike the one-class support tensor machine (OCSTM), we use the squared loss instead of the original loss function, so that we solve a series of linear equations instead of quadratic programming problems. We use the distance to the hyperplane as the metric for classification, and the proposed method is more accurate and faster than existing methods. The experimental results show the high efficiency of the proposed method compared with several state-of-the-art methods.
In regression, although both aim at estimating the Mean Squared Prediction Error (MSPE), Akaike's Final Prediction Error (FPE) and the Generalized Cross Validation (GCV) selection criteria are usually derived from two quite different perspectives. Here, settling on the most commonly accepted definition of the MSPE as the expectation of the squared prediction error loss, we provide theoretical expressions for it, valid for any linear model (LM) fitter, under either random or non-random designs. Specializing these MSPE expressions to each case, we derive closed formulas of the MSPE for some of the most popular LM fitters: Ordinary Least Squares (OLS), with or without a full column rank design matrix; and Ordinary and Generalized Ridge regression, the latter embedding smoothing splines fitting. For each of these LM fitters, we then deduce a computable estimate of the MSPE which turns out to coincide with Akaike's FPE. Using a slight variation, we similarly obtain a class of MSPE estimates coinciding with the classical GCV formula for those same LM fitters.
Pseudomonas spp. and Enterobacteriaceae are the dominant spoilage bacteria in chicken during cold storage (0 °C-4 °C). In this study, high-resolution spectra in the range of 900-1700 nm were acquired and preprocessed using Savitzky-Golay convolution smoothing (SGCS), standard normal variate (SNV), and multiplicative scatter correction (MSC), respectively, and then mined using the partial least squares (PLS) algorithm to relate them to the total counts of Pseudomonas spp. and Enterobacteriaceae (PEC) of fresh chicken breasts, so as to predict PEC rapidly. The results showed that over the full 900-1700 nm wavelength range, the MSC-PLS model built with MSC spectra performed better than the PLS models built with the other spectra (RAW-PLS, SGCS-PLS, SNV-PLS), with a correlation coefficient (RP) of 0.954, a root mean square error of prediction (RMSEP) of 0.396 log10 CFU/g, and a residual predictive deviation (RPD) of 3.33 in the prediction set. Based on the 12 optimal wavelengths (902.2 nm, 905.5 nm, 923.6 nm, 938.4 nm, 946.7 nm, 1025.7 nm, 1124.4 nm, 1211.6 nm, 1269.2 nm, 1653.7 nm, 1691.8 nm, and 1693.4 nm) selected from the MSC spectra by the successive projections algorithm (SPA), the SPA-MSC-PLS model had an RP of 0.954, an RMSEP of 0.397 log10 CFU/g, and an RPD of 3.32, similar to the MSC-PLS model. Overall, the study indicated that NIR spectra combined with the PLS algorithm could be used to detect the PEC of chicken flesh in a rapid and non-destructive way.
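Two of the preprocessing steps and two of the figures of merit mentioned in this abstract are simple enough to sketch directly. The code below is an illustration under stated assumptions, not the authors' implementation: SNV is taken as per-spectrum centering and scaling by the sample standard deviation, MSC as a linear regression of each spectrum on the mean spectrum, and RPD as the sample standard deviation of the reference values divided by RMSEP. The function names are hypothetical.

```python
from math import sqrt


def snv(spectrum):
    """Standard normal variate: center one spectrum and scale it to unit sample SD."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    sd = sqrt(sum((v - mean) ** 2 for v in spectrum) / (n - 1))
    return [(v - mean) / sd for v in spectrum]


def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum.

    Each spectrum s is fitted as s = intercept + slope * ref and then
    corrected to (s - intercept) / slope.
    """
    n_wl = len(spectra[0])
    ref = [sum(s[j] for s in spectra) / len(spectra) for j in range(n_wl)]
    ref_mean = sum(ref) / n_wl
    sxx = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mean = sum(s) / n_wl
        slope = sum((r - ref_mean) * (v - s_mean) for r, v in zip(ref, s)) / sxx
        intercept = s_mean - slope * ref_mean
        corrected.append([(v - intercept) / slope for v in s])
    return corrected


def rmsep(reference, predicted):
    """Root mean square error of prediction."""
    n = len(reference)
    return sqrt(sum((p - r) ** 2 for r, p in zip(reference, predicted)) / n)


def rpd(reference, predicted):
    """Residual predictive deviation: SD of the reference values over RMSEP."""
    n = len(reference)
    mean = sum(reference) / n
    sd = sqrt(sum((r - mean) ** 2 for r in reference) / (n - 1))
    return sd / rmsep(reference, predicted)
```

Read this way, the reported RPD of 3.33 with an RMSEP of 0.396 log10 CFU/g would correspond to a reference standard deviation of roughly 1.3 log10 CFU/g in the prediction set, assuming the abstract uses the same RPD definition.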
A multiple-output least squares support vector regression (LS-SVR) method was developed and described in detail, with the radial basis function (RBF) as the kernel function. The method was applied to predict the future state of the power-shift steering transmission (PSST). A prediction model of the PSST was obtained with multiple-output LS-SVR. The model performance was strongly influenced by the penalty parameter γ and the kernel parameter σ², which were optimized using cross-validation. The model was trained and used for prediction with spectrometric oil analysis data. The predicted and actual values were compared, and a fault in the second PSST was found. The research showed that this method has good accuracy in PSST fault prediction and that any possible problem in the PSST could be found through a comparative analysis.
In factor analysis, a factor loading matrix is often rotated toward a simple target matrix to make it easier to interpret. For this purpose, Procrustes rotation minimizes the discrepancy between the target and the rotated loadings using two types of approximation: 1) approximating the zeros in the target by the non-zeros in the loadings, and 2) approximating the non-zeros in the target by the non-zeros in the loadings. The central issue of Procrustes rotation considered in this article is that it treats the two types of approximation equally, although the former is more important for simplifying the loading matrix. Furthermore, a well-known issue of Simplimax is its computational inefficiency in estimating the sparse target matrix, which yields a considerable number of local minima. This research proposes a new rotation procedure that consists of two stages. The first stage estimates a sparse target matrix at lower computational cost by a regularization technique. In the second stage, the loading matrix is rotated to the target, emphasizing the approximation of non-zeros to the zeros in the target through a least squares criterion with a generalized weighting newly proposed in the study. The simulation study and real data examples showed that the proposed method does simplify loading matrices.
To overcome the disadvantage that the standard least squares support vector regression (LS-SVR) algorithm is not directly suitable for multiple-input multiple-output (MIMO) system modelling, an improved LS-SVR algorithm, defined as multi-output least squares support vector regression (MLSSVR), was put forward by adding the samples' absolute errors to the objective function, and was applied to intelligent flatness control. To solve the poor precision of the control scheme based on the effective matrix in flatness control, predictive control was introduced into the control system, and the effective matrix-predictive flatness control method was proposed by combining the merits of the two methods. A simulation experiment was conducted on a 900HC reversible cold rolling mill. The performance of the effective matrix method and the effective matrix-predictive control method was compared, and the results demonstrate the validity of the effective matrix-predictive control method.
The least squares projection twin support vector machine (LSPTSVM) computes faster than the classical least squares support vector machine (LSSVM). However, LSPTSVM is sensitive to outliers and its solution lacks sparsity, so it is difficult for LSPTSVM to process large-scale datasets with outliers. In this paper, we propose a robust LSPTSVM model (called R-LSPTSVM) by applying a truncated least squares loss function. The robustness of R-LSPTSVM is proved from a weighted perspective. Furthermore, we obtain a sparse solution of R-LSPTSVM by using the pivoting Cholesky factorization method in the primal space. Finally, the sparse R-LSPTSVM algorithm (SR-LSPTSVM) is proposed. Experimental results show that SR-LSPTSVM is insensitive to outliers and can process large-scale datasets quickly.
In classical regression analysis, the error of the independent variable is usually not taken into account. This paper presents two solution methods for the case in which both the independent and the dependent variables have errors. These methods are derived from the condition-adjustment and indirect-adjustment models based on the Total-Least-Squares principle. The equivalence of the two methods is also proven in theory.
Funding (quantum PLS regression study): Project supported by the Fundamental Research Funds for the Central Universities, China (Grant No. 2019XD-A02); the National Natural Science Foundation of China (Grant Nos. U1636106, 61671087, 61170272, and 92046001); the Natural Science Foundation of Beijing Municipality, China (Grant No. 4182006); the Technological Special Project of Guizhou Province, China (Grant No. 20183001); and the Foundation of Guizhou Provincial Key Laboratory of Public Big Data (Grant Nos. 2018BDKFJJ016 and 2018BDKFJJ018).
Funding (wheat grain protein content study): Supported by the National Natural Science Foundation of China (41171281, 40701120) and the Beijing Nova Program, China (2008B33).
Funding (burn severity study): Partially supported by the Fundamental Research Funds for the Central Universities (DL12CA12, 2572017PZ05) and in part by the Research Foundation for Junior Teachers from the Ministry of Education of China (20110062120010).
Funding (transgenic rice NIRS study): Supported by the projects under the Innovation Team of the Safety Standards and Testing Technology for Agricultural Products of Zhejiang Province, China (Grant No. 2010R50028) and the National Key Technologies R&D Program of China during the 11th Five-Year Plan Period (Grant No. 2006BAK02A18).
Funding (acid rain crop loss study): Funded by the Natural Basic Research Program of China under Grant No. 2005CB422207.
Funding (strata-moving parameter study): Project (030501801) supported by the Key Laboratory of the State Bureau of Surveying and Mapping in Geographical Space Information Engineering.
Funding (Tianjin water quality study): Supported by the National Natural Science Foundation of China (No. 50478086) and the Tianjin Special Scientific Innovation Foundation (No. 06FZZDSH00900).
Funding: Communication University of China Foundation, China (No. CUC18A015-2); Fundamental Research Funds for the Central Universities, China (No. CUC200D036)
Abstract: The box office during the later Spring Festival shows an attractive prospect. This paper studies the factors affecting total box office during the broad Spring Festival, which runs from the Spring Festival to the Lantern Festival. Data on films released during the broad Spring Festival from 2016 to 2019 in China were gathered, and the impact of eight explanatory variables on the box office during the broad Spring Festival was empirically analyzed by partial least squares (PLS) regression with the software SIMCA. The results suggest that word-of-mouth has the most positive effect on the box office during the broad Spring Festival. Later publicity has a positive effect, while early promotion has a negative effect on the box office. The director’s influence has a positive effect, while the actor’s influence does not contribute much to the box office. The length of the trailer has a negative effect. The film format, 2D or 3D, does not contribute much to the box office.
Abstract: Based on the continuum power regression (CPR) method, a novel derivation of kernel partial least squares regression (named CPR-KPLS) is proposed for approximating arbitrary nonlinear functions. A kernel function is used to map the input variables (input space) into a reproducing kernel Hilbert space (the so-called feature space), where a linear CPR-PLS is constructed based on the projection of explanatory variables onto latent variables (components). The linear CPR-PLS in the high-dimensional feature space corresponds to a nonlinear CPR-KPLS in the original input space. This method offers a novel extension of kernel partial least squares regression (KPLS), and numerical simulation results are presented to illustrate its feasibility.
Abstract: For multi-step prediction of chaotic time series, a multi-step direct prediction model based on partial least squares (PLS) is proposed in this article, where PLS, a method for predicting a set of dependent variables from a large set of predictors, is used to model the dynamic evolution between the space points and the corresponding future points. The model eliminates the error accumulation of the common single-step local model algorithm and avoids the high multi-collinearity problem that arises in the reconstructed state space as the embedding dimension increases. Simulation predictions are carried out on the Mackey-Glass chaotic time series with the model. Satisfactory prediction accuracy is obtained and the model's efficiency is verified. In the experiments, the number of extracted components in PLS is set by a cross-validation procedure.
Funding: Financial support from the National Natural Science Foundation of China (Grant Nos. 42074016, 42104025, 42274057, and 41704007), the Hunan Provincial Natural Science Foundation of China (Grant No. 2021JJ30244), and the Scientific Research Fund of Hunan Provincial Education Department (Grant No. 22B0496).
Abstract: Weighted total least squares (WTLS) has been regarded as the standard tool for the errors-in-variables (EIV) model, in which all the elements in the observation vector and the coefficient matrix are contaminated with random errors. However, in many geodetic applications, some elements are error-free and some random observations appear repeatedly in different positions in the augmented coefficient matrix. This is called the linear structured EIV (LSEIV) model. Two kinds of methods are proposed for the LSEIV model, based on functional and stochastic modifications. On the one hand, the functional part of the LSEIV model is modified into the errors-in-observations (EIO) model. On the other hand, the stochastic model is modified by applying the Moore-Penrose inverse of the cofactor matrix. The algorithms are derived through the Lagrange multipliers method and linear approximation. The estimation principles and iterative formulas of the parameters are proven to be consistent. The first-order approximate variance-covariance matrix (VCM) of the parameters is also derived. A numerical example is given to compare the performance of our three proposed algorithms with the STLS approach. Afterwards, the least squares (LS), total least squares (TLS), and linear structured weighted total least squares (LSWTLS) solutions are compared, and the accuracy evaluation formula is proven to be feasible and effective. Finally, the LSWTLS is applied to the field of deformation analysis, where it yields a better result than the traditional LS and TLS estimations.
Abstract: One-class classification has become a popular problem in many fields, with a wide range of applications in anomaly detection, fault diagnosis, and face recognition. We investigate the one-class classification problem for second-order tensor data. Traditional vector-based one-class classification methods such as the one-class support vector machine (OCSVM) and the least squares one-class support vector machine (LSOCSVM) have limitations when tensors are used as input data, so we propose a new tensor one-class classification method, LSOCSTM, which directly uses tensors as input data. On the one hand, using tensors as input data not only makes it possible to classify tensor data; for vector data, classifying it after lifting it into a higher-dimensional tensor also improves the classification accuracy and overcomes the over-fitting problem. On the other hand, unlike the one-class support tensor machine (OCSTM), we use the squared loss instead of the original loss function, so that we solve a series of linear equations instead of quadratic programming problems. We use the distance to the hyperplane as the metric for classification, and the proposed method is more accurate and faster than existing methods. The experimental results show the high efficiency of the proposed method compared with several state-of-the-art methods.
Abstract: In regression, although both aim at estimating the Mean Squared Prediction Error (MSPE), Akaike’s Final Prediction Error (FPE) and the Generalized Cross Validation (GCV) selection criteria are usually derived from two quite different perspectives. Here, settling on the most commonly accepted definition of the MSPE as the expectation of the squared prediction error loss, we provide theoretical expressions for it, valid for any linear model (LM) fitter, under random or non-random designs. Specializing these MSPE expressions for each of them, we derive closed formulas of the MSPE for some of the most popular LM fitters: Ordinary Least Squares (OLS), with or without a full column rank design matrix; Ordinary and Generalized Ridge regression, the latter embedding smoothing splines fitting. For each of these LM fitters, we then deduce a computable estimate of the MSPE which turns out to coincide with Akaike’s FPE. Using a slight variation, we similarly get a class of MSPE estimates coinciding with the classical GCV formula for those same LM fitters.
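For orientation, the two classical criteria named in this abstract are usually written as follows. The notation is an assumption for illustration and is not taken from the paper: $n$ is the sample size, $p$ the number of fitted parameters, $\mathrm{RSS}$ the residual sum of squares, and $H$ the hat matrix of a linear fitter with $\hat{y} = Hy$.

```latex
\mathrm{FPE} = \frac{\mathrm{RSS}}{n}\cdot\frac{n+p}{n-p},
\qquad
\mathrm{GCV} = \frac{\mathrm{RSS}/n}{\bigl(1 - \operatorname{tr}(H)/n\bigr)^{2}} .
```

For OLS with a full column rank design, $\operatorname{tr}(H) = p$, so the two expressions agree to first order in $p/n$.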
Funding: This work was financially supported by the Major Scientific and Technological Project of Henan Province (Grant No. 161100110600), the Key Scientific and Technological Project of Henan Province (No. 212102310491, No. 182102310060), the China Postdoctoral Science Foundation (No. 2018M632767), the Henan Postdoctoral Science Foundation (No. 001801021), the Youth Talents Support Project of Henan Province (No. 2018HYTP008), and the Bainong Outstanding Talents Project of Henan Institute of Science and Technology (No. BNYC2018-2-27).
Abstract: Pseudomonas spp. and Enterobacteriaceae are dominant spoilage bacteria in chicken during cold storage (0°C-4°C). In this study, high-resolution spectra in the range of 900-1700 nm were acquired and preprocessed using Savitzky-Golay convolution smoothing (SGCS), standard normal variate (SNV), and multiplicative scatter correction (MSC), respectively, and then mined using the partial least squares (PLS) algorithm to relate them to the total counts of Pseudomonas spp. and Enterobacteriaceae (PEC) of fresh chicken breasts, in order to predict PEC rapidly. The results showed that over the full 900-1700 nm wavelength range, the MSC-PLS model built with MSC spectra performed better than PLS models built with other spectra (RAW-PLS, SGCS-PLS, SNV-PLS), with a correlation coefficient (RP) of 0.954, a root mean square error of prediction (RMSEP) of 0.396 log10 CFU/g, and a residual predictive deviation (RPD) of 3.33 in the prediction set. Based on the 12 optimal wavelengths (902.2 nm, 905.5 nm, 923.6 nm, 938.4 nm, 946.7 nm, 1025.7 nm, 1124.4 nm, 1211.6 nm, 1269.2 nm, 1653.7 nm, 1691.8 nm, and 1693.4 nm) selected from the MSC spectra by the successive projections algorithm (SPA), the SPA-MSC-PLS model had an RP of 0.954, an RMSEP of 0.397 log10 CFU/g, and an RPD of 3.32, similar to the MSC-PLS model. Overall, the study indicated that NIR spectra combined with the PLS algorithm could be used to detect the PEC of chicken flesh in a rapid and non-destructive way.
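To make the preprocessing and evaluation terms concrete, the following is a small sketch of SNV scaling and of the RMSEP and RPD metrics as they are commonly defined. It assumes the sample standard deviation (n - 1 denominator) and uses hypothetical function names and made-up numbers; the study's own software and data are not reproduced here.

```python
import math
import statistics

def snv(spectrum):
    """Standard normal variate: center and scale one spectrum by its own mean and SD."""
    mu = statistics.mean(spectrum)
    sd = statistics.stdev(spectrum)  # sample SD (n - 1), an assumption of this sketch
    return [(v - mu) / sd for v in spectrum]

def rmsep(y_true, y_pred):
    """Root mean square error of prediction."""
    n = len(y_true)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / n)

def rpd(y_true, y_pred):
    """Residual predictive deviation: SD of the reference values divided by RMSEP."""
    return statistics.stdev(y_true) / rmsep(y_true, y_pred)

# Hypothetical absorbance values; the second spectrum is the first one with a
# multiplicative and an additive offset, which SNV is designed to remove.
raw = [0.52, 0.55, 0.61, 0.58, 0.50]
shifted = [2.0 * v + 0.1 for v in raw]
snv_raw = snv(raw)
snv_shifted = snv(shifted)

# Hypothetical reference and predicted values in log10 CFU/g.
y_ref = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.1, 1.9, 3.1, 3.9]
err = rmsep(y_ref, y_hat)
ratio = rpd(y_ref, y_hat)
```

Because SNV works on each spectrum separately, the two example spectra map to the same corrected values, which is why it is often used to reduce scatter effects before PLS modeling.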
Funding: Supported by the Ministerial Level Advanced Research Foundation (3031030) and the "111" Project (B08043)
Abstract: A method of multiple-output least squares support vector regression (LS-SVR) was developed and described in detail, with the radial basis function (RBF) as the kernel function. The method was applied to predict the future state of the power-shift steering transmission (PSST). A prediction model of the PSST was obtained with multiple-output LS-SVR. The model performance was greatly influenced by the penalty parameter γ and the kernel parameter σ², which were optimized using the cross-validation method. The training and prediction of the model were done with spectrometric oil analysis data. The predicted and actual values were compared, and a fault in the second PSST was found. The research showed that this method had good accuracy in PSST fault prediction, and that any possible problem in the PSST could be found through a comparative analysis.
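The role of γ and σ² is easiest to see in the standard single-output LS-SVR, where training reduces to one linear system. The sketch below shows that textbook formulation only, not the multiple-output variant developed in the paper. The function names, the toy data, and the kernel convention exp(-||x - z||² / (2σ²)) are assumptions made for illustration.

```python
import math

def rbf(a, b, sigma2):
    """RBF kernel; the 2*sigma2 scaling is one common convention (assumed here)."""
    d2 = sum((u - v) ** 2 for u, v in zip(a, b))
    return math.exp(-d2 / (2.0 * sigma2))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = s / M[r][r]
    return x

def lssvr_fit(X, y, gamma, sigma2):
    """Solve [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    n = len(X)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        A.append([1.0] + [rbf(X[i], X[j], sigma2) + (1.0 / gamma if i == j else 0.0)
                          for j in range(n)])
    sol = solve(A, [0.0] + list(y))
    return sol[0], sol[1:]  # bias b, dual weights alpha

def lssvr_predict(X_train, b, alpha, sigma2, x):
    """f(x) = sum_i alpha_i K(x, x_i) + b."""
    return b + sum(a * rbf(x, xi, sigma2) for a, xi in zip(alpha, X_train))

# Hypothetical one-dimensional toy data; a very large gamma makes the fit
# nearly interpolate the training points.
X_train = [[0.0], [1.0], [2.0], [3.0]]
y_train = [0.0, 1.0, 4.0, 9.0]
b, alpha = lssvr_fit(X_train, y_train, gamma=1e8, sigma2=1.0)
fitted = [lssvr_predict(X_train, b, alpha, 1.0, x) for x in X_train]
```

A smaller γ weights the squared errors less and gives a smoother fit, while σ² sets the width of the kernel. Cross-validation over a grid of both values is the usual way to tune them.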
Abstract: In factor analysis, a factor loading matrix is often rotated to a simple target matrix for its simplicity. For this purpose, Procrustes rotation minimizes the discrepancy between the target and rotated loadings using two types of approximation: 1) approximating the zeros in the target by the non-zeros in the loadings, and 2) approximating the non-zeros in the target by the non-zeros in the loadings. The central issue of Procrustes rotation considered in the article is that it treats the two types of approximation equally, while the former is more important for simplifying the loading matrix. Furthermore, a well-known issue of Simplimax is the computational inefficiency in estimating the sparse target matrix, which yields a considerable number of local minima. The research proposes a new rotation procedure that consists of two stages. The first stage estimates a sparse target matrix at lower computational cost using a regularization technique. In the second stage, a loading matrix is rotated to the target, emphasizing the approximation of non-zeros to zeros in the target by a least squares criterion with generalized weighting, which is newly proposed in the study. The simulation study and real data examples showed that the proposed method reliably simplifies loading matrices.
Funding: Project (50675186) supported by the National Natural Science Foundation of China
Abstract: To overcome the disadvantage that the standard least squares support vector regression (LS-SVR) algorithm is not directly suitable for modelling multiple-input multiple-output (MIMO) systems, an improved LS-SVR algorithm, defined as multi-output least squares support vector regression (MLSSVR), was put forward by adding the samples' absolute errors to the objective function, and was applied to intelligent flatness control. To solve the poor-precision problem of the control scheme based on the effective matrix in flatness control, predictive control was introduced into the control system, and the effective matrix-predictive flatness control method was proposed by combining the merits of the two methods. A simulation experiment was conducted on a 900HC reversible cold roll. The performance of the effective matrix method and that of the effective matrix-predictive control method were compared, and the results demonstrate the validity of the effective matrix-predictive control method.
Funding: Supported by the National Natural Science Foundation of China (61772020, 62202433, 62172371, 62272422, 62036010), the Natural Science Foundation of Henan Province (22100002), and the Postdoctoral Research Grant in Henan Province (202103111).
Abstract: The least squares projection twin support vector machine (LSPTSVM) has faster computing speed than the classical least squares support vector machine (LSSVM). However, LSPTSVM is sensitive to outliers and its solution lacks sparsity. Therefore, it is difficult for LSPTSVM to process large-scale datasets with outliers. In this paper, we propose a robust LSPTSVM model (called R-LSPTSVM) by applying a truncated least squares loss function. The robustness of R-LSPTSVM is proved from a weighted perspective. Furthermore, we obtain the sparse solution of R-LSPTSVM by using the pivoting Cholesky factorization method in the primal space. Finally, the sparse R-LSPTSVM algorithm (SR-LSPTSVM) is proposed. Experimental results show that SR-LSPTSVM is insensitive to outliers and can process large-scale datasets quickly.
Funding: Supported by the National Natural Science Foundation of China (41174009)
Abstract: In classical regression analysis, the error of the independent variable is usually not taken into account. This paper presents two solution methods for the case in which both the independent and the dependent variables have errors. These methods are derived from the condition-adjustment and indirect-adjustment models based on the total least squares principle. The equivalence of the two methods is also proven in theory.
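For the simplest case, a straight line with errors in both coordinates, the total least squares idea can be sketched with the well-known closed-form orthogonal regression slope. This is not the condition-adjustment or indirect-adjustment derivation of the paper. It assumes equal, uncorrelated error variances in x and y, and the function name and data are hypothetical.

```python
import math

def tls_line(x, y):
    """Fit y = a + b*x by minimizing perpendicular distances (orthogonal regression)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    if sxy == 0.0:
        raise ValueError("slope is undefined when the sample covariance is zero")
    # Closed-form TLS slope, valid under equal error variances in x and y.
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4.0 * sxy ** 2)) / (2.0 * sxy)
    return slope, my - slope * mx

# Hypothetical noisy observations of a line.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 0.9, 2.2, 2.8, 4.1]
slope_yx, intercept_yx = tls_line(xs, ys)
slope_xy, _ = tls_line(ys, xs)  # roles of the two variables swapped
```

Unlike ordinary least squares, this fit treats the two variables symmetrically: swapping x and y yields the reciprocal slope, so both fits describe the same line.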