An approach for batch processes monitoring and fault detection based on multiway kernel partial least squares(MKPLS) was presented.It is known that conventional batch process monitoring methods,such as multiway partia...An approach for batch processes monitoring and fault detection based on multiway kernel partial least squares(MKPLS) was presented.It is known that conventional batch process monitoring methods,such as multiway partial least squares(MPLS),are not suitable due to their intrinsic linearity when the variations are nonlinear.To address this issue,kernel partial least squares(KPLS) was used to capture the nonlinear relationship between the latent structures and predictive variables.In addition,KPLS requires only linear algebra and does not involve any nonlinear optimization.In this paper,the application of KPLS was extended to on-line monitoring of batch processes.The proposed batch monitoring method was applied to a simulation benchmark of fed-batch penicillin fermentation process.And the results demonstrate the superior monitoring performance of MKPLS in comparison to MPLS monitoring.展开更多
Near-infrared (NIR) spectroscopy was applied to reagent-free quantitative analysis of polysaccharide of a brand product of proprietary Chinese medicine (PCM) oral solution samples. A novel method, called absorbance up...Near-infrared (NIR) spectroscopy was applied to reagent-free quantitative analysis of polysaccharide of a brand product of proprietary Chinese medicine (PCM) oral solution samples. A novel method, called absorbance upper optimization partial least squares (AUO-PLS), was proposed and successfully applied to the wavelength selection. Based on varied partitioning of the calibration and prediction sample sets, the parameter optimization was performed to achieve stability. On the basis of the AUO-PLS method, the selected upper bound of appropriate absorbance was 1.53 and the corresponding wavebands combination was 400 - 1880 & 2088 - 2346 nm. With the use of random validation samples excluded from the modeling process, the root-mean-square error and correlation coefficient of prediction for polysaccharide were 27.09 mg·L<sup>-</sup><sup>1</sup> and 0.888, respectively. The results indicate that the NIR prediction values are close to those of the measured values. NIR spectroscopy combined with AUO-PLS method provided a promising tool for quantification of the polysaccharide for PCM oral solution and this technique is rapid and simple when compared with conventional methods.展开更多
Partial least squares(PLS) regression is an important linear regression method that efficiently addresses the multiple correlation problem by combining principal component analysis and multiple regression. In this pap...Partial least squares(PLS) regression is an important linear regression method that efficiently addresses the multiple correlation problem by combining principal component analysis and multiple regression. In this paper, we present a quantum partial least squares(QPLS) regression algorithm. To solve the high time complexity of the PLS regression, we design a quantum eigenvector search method to speed up principal components and regression parameters construction. Meanwhile, we give a density matrix product method to avoid multiple access to quantum random access memory(QRAM)during building residual matrices. The time and space complexities of the QPLS regression are logarithmic in the independent variable dimension n, the dependent variable dimension w, and the number of variables m. This algorithm achieves exponential speed-ups over the PLS regression on n, m, and w. In addition, the QPLS regression inspires us to explore more potential quantum machine learning applications in future works.展开更多
Many complex traits are highly correlated rather than independent. By taking the correlation structure of multiple traits into account, joint association analyses can achieve both higher statistical power and more acc...Many complex traits are highly correlated rather than independent. By taking the correlation structure of multiple traits into account, joint association analyses can achieve both higher statistical power and more accurate estimation. To develop a statistical approach to joint association analysis that includes allele detection and genetic effect estimation, we combined multivariate partial least squares regression with variable selection strategies and selected the optimal model using the Bayesian Information Criterion(BIC). We then performed extensive simulations under varying heritabilities and sample sizes to compare the performance achieved using our method with those obtained by single-trait multilocus methods. Joint association analysis has measurable advantages over single-trait methods, as it exhibits superior gene detection power, especially for pleiotropic genes. Sample size, heritability,polymorphic information content(PIC), and magnitude of gene effects influence the statistical power, accuracy and precision of effect estimation by the joint association analysis.展开更多
The identification of liquor brands is very important for food safety. Most of the fake liquors are usually made into the products with the same flavor and alcohol content as regular brand, so the identification for t...The identification of liquor brands is very important for food safety. Most of the fake liquors are usually made into the products with the same flavor and alcohol content as regular brand, so the identification for the liquor brands with the same flavor and the same alcohol content is essential. However, it is also difficult because the components of such liquor samples are very similar. Near-infrared (NIR) spectroscopy combined with partial least squares discriminant analysis (PLS-DA) was applied to identification of liquor brands with the same flavor and alcohol content. A total of 160 samples of Luzhou Laojiao liquor and 200 samples of non-Luzhou Laojiao liquor with the same flavor and alcohol content were used for identification. Samples of each type were randomly divided into the modeling and validation sets. The modeling samples were further divided into calibration and prediction sets using the Kennard-Stone algorithm to achieve uniformity and representativeness. In the modeling and validation processes based on PLS-DA method, the recognition rates of samples achieved 99.1% and 98.7%, respectively. The results show high prediction performance for the identification of liquor brands, and were obviously better than those obtained from the principal component linear discriminant analysis method. NIR spectroscopy combined with the PLS-DA method provides a quick and effective means of the discriminant analysis of liquor brands, and is also a promising tool for large-scale inspection of liquor food safety.展开更多
Estimating wheat grain protein content by remote sensing is important for assessing wheat quality at maturity and making grains harvest and purchase policies. However, spatial variability of soil condition, temperatur...Estimating wheat grain protein content by remote sensing is important for assessing wheat quality at maturity and making grains harvest and purchase policies. However, spatial variability of soil condition, temperature, and precipitation will affect grain protein contents and these factors usually cannot be monitored accurately by remote sensing data from single image. In this research, the relationships between wheat protein content at maturity and wheat agronomic parameters at different growing stages were analyzed and multi-temporal images of Landsat TM were used to estimate grain protein content by partial least squares regression. Experiment data were acquired in the suburb of Beijing during a 2-yr experiment in the period from 2003 to 2004. Determination coefficient, average deviation of self-modeling, and deviation of cross- validation were employed to assess the estimation accuracy of wheat grain protein content. Their values were 0.88, 1.30%, 3.81% and 0.72, 5.22%, 12.36% for 2003 and 2004, respectively. The research laid an agronomic foundation for GPC (grain protein content) estimation by multi-temporal remote sensing. The results showed that it is feasible to estimate GPC of wheat from multi-temporal remote sensing data in large area.展开更多
Considering chaotic time series multi-step prediction,multi-step direct prediction model based on partial least squares (PLS) is proposed in this article,where PLS,the method for predicting a set of dependent variable...Considering chaotic time series multi-step prediction,multi-step direct prediction model based on partial least squares (PLS) is proposed in this article,where PLS,the method for predicting a set of dependent variables forming a large set of predictors,is used to model the dynamic evolution between the space points and the corresponding future points.The model can eliminate error accumulation with the common single-step local model algorithm,and refrain from the high multi-collinearity problem in the reconstructed state space with the increase of embedding dimension.Simulation predictions are done on the Mackey-Glass chaotic time series with the model. The satisfying prediction accuracy is obtained and the model efficiency verified.In the experiments,the number of extracted components in PLS is set with cross-validation procedure.展开更多
Near infrared reflectance spectroscopy(NIRS), a non-destructive measurement technique, was combined with partial least squares regression discrimiant analysis(PLS-DA) to discriminate the transgenic(TCTP and mi166) and...Near infrared reflectance spectroscopy(NIRS), a non-destructive measurement technique, was combined with partial least squares regression discrimiant analysis(PLS-DA) to discriminate the transgenic(TCTP and mi166) and wild type(Zhonghua 11) rice. Furthermore, rice lines transformed with protein gene(Os TCTP) and regulation gene(Osmi166) were also discriminated by the NIRS method. The performances of PLS-DA in spectral ranges of 4 000–8 000 cm-1 and 4 000–10 000 cm-1 were compared to obtain the optimal spectral range. As a result, the transgenic and wild type rice were distinguished from each other in the range of 4 000–10 000 cm-1, and the correct classification rate was 100.0% in the validation test. The transgenic rice TCTP and mi166 were also distinguished from each other in the range of 4 000–10 000 cm-1, and the correct classification rate was also 100.0%. In conclusion, NIRS combined with PLS-DA can be used for the discrimination of transgenic rice.展开更多
To predict the economic loss of crops caused by acid rain,we used partial least squares(PLS) regression to build a model of single dependent variable -the economic loss calculated with the decrease in yield related to...To predict the economic loss of crops caused by acid rain,we used partial least squares(PLS) regression to build a model of single dependent variable -the economic loss calculated with the decrease in yield related to the pH value and levels of Ca2+,NH4+,Na+,K+,Mg2+,SO42-,NO3-,and Cl-in acid rain. We selected vegetables which were sensitive to acid rain as the sample crops,and collected 12 groups of data,of which 8 groups were used for modeling and 4 groups for testing. Using the cross validation method to evaluate the performace of this prediction model indicates that the optimum number of principal components was 3,determined by the minimum of prediction residual error sum of squares,and the prediction error of the regression equation ranges from -2.25% to 4.32%. The model predicted that the economic loss of vegetables from acid rain is negatively corrrelated to pH and the concentrations of NH4+,SO42-,NO3-,and Cl-in the rain,and positively correlated to the concentrations of Ca2+,Na+,K+ and Mg2+. The precision of the model may be improved if the non-linearity of original data is addressed.展开更多
A quantitative structure-activity relationships (QSAR) study is suggested for the prediction of solubility of some thiazolidine-4- carboxylic acid derivatives in aqueous solution. Ab initio theory was used to calculat...A quantitative structure-activity relationships (QSAR) study is suggested for the prediction of solubility of some thiazolidine-4- carboxylic acid derivatives in aqueous solution. Ab initio theory was used to calculate some quantum chemical descriptors including electrostatic potentials and local charges at each atom, HOMO and LUMO energies, etc. Modeling of the solubility of thiazolidine- 4-carboxylic acid derivatives as a function of molecular structures was established by means of the partial least squares (PLS). The subset of descriptors, which resulted in the low prediction error, was selected by genetic algorithm. This model was applied for the prediction of the solubility of some thiazolidine-4-carboxylic acid derivatives, which were not in the modeling procedure. The relative errors of prediction lower that 4% was obtained by using GA-PLS method. The resulted model showed high prediction ability with RMSEP of 3.8836 and 2.9500 for PLS and GA-PLS models, respectively.展开更多
The water distribution system of one residential district in Tianjin is taken as an example to analyze the changes of water quality.Partial least squares(PLS) regression model,in which the turbidity and Fe are regarde...The water distribution system of one residential district in Tianjin is taken as an example to analyze the changes of water quality.Partial least squares(PLS) regression model,in which the turbidity and Fe are regarded as control objectives,is used to establish the statistical model.The experimental results indicate that the PLS regression model has good predicted results of water quality compared with the monitored data.The percentages of absolute relative error(below 15%,20%,30%) are 44.4%,66.7%,100%(turbidity) and 33.3%,44.4%,77.8%(Fe) on the 4th sampling point;77.8%,88.9%,88.9%(turbidity) and 44.4%,55.6%,66.7%(Fe) on the 5th sampling point.展开更多
With recent advances in biotechnology, genome-wide association study(GWAS) has been widely used to identify genetic variants that underlie human complex diseases and traits. In case-control GWAS, typical statistical s...With recent advances in biotechnology, genome-wide association study(GWAS) has been widely used to identify genetic variants that underlie human complex diseases and traits. In case-control GWAS, typical statistical strategy is traditional logistical regression(LR) based on single-locus analysis. However, such a single-locus analysis leads to the well-known multiplicity problem, with a risk of inflating type I error and reducing power. Dimension reduction-based techniques, such as principal component-based logistic regression(PC-LR), partial least squares-based logistic regression(PLS-LR), have recently gained much attention in the analysis of high dimensional genomic data. However, the perfor?mance of these methods is still not clear, especially in GWAS. We conducted simulations and real data application to compare the type I error and power of PC-LR, PLS-LR and LR applicable to GWAS within a defined single nucleotide polymorphism(SNP) set region. We found that PC-LR and PLS can reasonably control type I error under null hypothesis.On contrast, LR, which is corrected by Bonferroni method, was more conserved in all simulation settings. In particular, we found that PC-LR and PLS-LR had comparable power and they both outperformed LR, especially when the causal SNP was in high linkage disequilibrium with genotyped ones and with a small effective size in simulation. Based on SNP set analysis, we applied all three methods to analyze non-small cell lung cancer GWAS data.展开更多
The box office during the later Spring Festival shows an attractive prospect.This paper studied the factors affecting total box office during the broad Spring Festival which is from the Spring Festival to the Lantern ...The box office during the later Spring Festival shows an attractive prospect.This paper studied the factors affecting total box office during the broad Spring Festival which is from the Spring Festival to the Lantern Festival.Data of films released during the broad Spring Festival from the years 2016 to 2019 in China were gathered,and the impact of eight explanatory variables on the box office during the broad Spring Festival was empirically analyzed by partial least squares(PLS)regression with software SIMCA.The results suggest that word-of-mouth has the most positive effect on the box office during the broad Spring Festival.Later propaganda has a positive effect,while early promotion has a negative effect on the box office.Director’s influence has a positive effect,while actor’s influence does not contribute much to the box office.Length of the trailer has a negative effect.The film format of 2D or 3D doesn’t contribute much to the box office.展开更多
Human cadaver dissection remains a core and preferred method of anatomical instruction at most low- and middle-income health professional training institutions. Dissection, which is both traumatic and stressful, sets ...Human cadaver dissection remains a core and preferred method of anatomical instruction at most low- and middle-income health professional training institutions. Dissection, which is both traumatic and stressful, sets the tone of the students’ responses to later and or similar stressful learning opportunities like the post-mortems or care for terminally ill patients. Partial least squares structural equation modelling was used to determine the effect of the students’: personality, perception of the learning environment, learning approach, and effect of the environment on the student, on undergraduate health professional student’s activity in the human cadaver dissection room. This was a secondary analysis of previously collected data from a cross sectional survey of undergraduate health professional students. We found that personality type and perception of the environment had a positive effect on dissection room activity. Approach to learning and being affected by the dissection room experience (impact), had a negative effect on dissection room activity. All the above effects on dissection room activity were not significant. This study showed that personality, perception of the learning environment, learning approach and effect of the environment on the student, had effects on undergraduate health professional student’s activity in the human cadaver dissection room. The modelled effects are opportunities for educational interventions aimed at increasing student activity in the dissection room.展开更多
Based on the surveying data of strata-moving angle and the ordinary least squares regression, this paper is to construct, a regression model is constructed which is strata-moving parameter β concerning the coal bed o...Based on the surveying data of strata-moving angle and the ordinary least squares regression, this paper is to construct, a regression model is constructed which is strata-moving parameter β concerning the coal bed obliquity, coal thickness, mining depth, etc. But the regression is unsuccessful. The result is that none of the parameters is suited, this is not up to objective reality. This paper presents a novel method, partial least squares regression (PLS regression), to construct the statistic model of strata-moving parameter β. The experiment shows that the forecasting model is reasonable.展开更多
Near-infrared spectroscopy coupled with kernel partial least squares-discriminant analysis was used to rapidly screen water containing malathion. In the wavenumber of 4348 cm-1 to 9091 cm-1, the overall correct classi...Near-infrared spectroscopy coupled with kernel partial least squares-discriminant analysis was used to rapidly screen water containing malathion. In the wavenumber of 4348 cm-1 to 9091 cm-1, the overall correct classification rate of kernel partial least squares-discriminant analysis was 100% for training set, and 100% for test set, with the lowest concentration detected malathion residues in water being 1 μg·ml-1. Kernel partial least squares-discriminant analysis was able to have a good performance in classifying data in nonlinear systems. It was inferred that Near-infrared spectroscopy coupled with the kernel partial least squares-discriminant analysis had a potential in rapid screening other pesticide residues in water.展开更多
Based on continuum power regression(CPR) method, a novel derivation of kernel partial least squares(named CPR-KPLS) regression is proposed for approximating arbitrary nonlinear functions.Kernel function is used to map...Based on continuum power regression(CPR) method, a novel derivation of kernel partial least squares(named CPR-KPLS) regression is proposed for approximating arbitrary nonlinear functions.Kernel function is used to map the input variables(input space) into a Reproducing Kernel Hilbert Space(so called feature space),where a linear CPR-PLS is constructed based on the projection of explanatory variables to latent variables(components). The linear CPR-PLS in the high-dimensional feature space corresponds to a nonlinear CPR-KPLS in the original input space. This method offers a novel extension for kernel partial least squares regression(KPLS),and some numerical simulation results are presented to illustrate the feasibility of the proposed method.展开更多
基金National Natural Science Foundation of China (No. 61074079)Shanghai Leading Academic Discipline Project,China (No.B504)
文摘An approach for batch processes monitoring and fault detection based on multiway kernel partial least squares(MKPLS) was presented.It is known that conventional batch process monitoring methods,such as multiway partial least squares(MPLS),are not suitable due to their intrinsic linearity when the variations are nonlinear.To address this issue,kernel partial least squares(KPLS) was used to capture the nonlinear relationship between the latent structures and predictive variables.In addition,KPLS requires only linear algebra and does not involve any nonlinear optimization.In this paper,the application of KPLS was extended to on-line monitoring of batch processes.The proposed batch monitoring method was applied to a simulation benchmark of fed-batch penicillin fermentation process.And the results demonstrate the superior monitoring performance of MKPLS in comparison to MPLS monitoring.
文摘Near-infrared (NIR) spectroscopy was applied to reagent-free quantitative analysis of polysaccharide of a brand product of proprietary Chinese medicine (PCM) oral solution samples. A novel method, called absorbance upper optimization partial least squares (AUO-PLS), was proposed and successfully applied to the wavelength selection. Based on varied partitioning of the calibration and prediction sample sets, the parameter optimization was performed to achieve stability. On the basis of the AUO-PLS method, the selected upper bound of appropriate absorbance was 1.53 and the corresponding wavebands combination was 400 - 1880 & 2088 - 2346 nm. With the use of random validation samples excluded from the modeling process, the root-mean-square error and correlation coefficient of prediction for polysaccharide were 27.09 mg·L<sup>-</sup><sup>1</sup> and 0.888, respectively. The results indicate that the NIR prediction values are close to those of the measured values. NIR spectroscopy combined with AUO-PLS method provided a promising tool for quantification of the polysaccharide for PCM oral solution and this technique is rapid and simple when compared with conventional methods.
基金Project supported by the Fundamental Research Funds for the Central Universities, China (Grant No. 2019XD-A02)the National Natural Science Foundation of China (Grant Nos. U1636106, 61671087, 61170272, and 92046001)+2 种基金Natural Science Foundation of Beijing Municipality, China (Grant No. 4182006)Technological Special Project of Guizhou Province, China (Grant No. 20183001)the Foundation of Guizhou Provincial Key Laboratory of Public Big Data (Grant Nos. 2018BDKFJJ016 and 2018BDKFJJ018)。
文摘Partial least squares(PLS) regression is an important linear regression method that efficiently addresses the multiple correlation problem by combining principal component analysis and multiple regression. In this paper, we present a quantum partial least squares(QPLS) regression algorithm. To solve the high time complexity of the PLS regression, we design a quantum eigenvector search method to speed up principal components and regression parameters construction. Meanwhile, we give a density matrix product method to avoid multiple access to quantum random access memory(QRAM)during building residual matrices. The time and space complexities of the QPLS regression are logarithmic in the independent variable dimension n, the dependent variable dimension w, and the number of variables m. This algorithm achieves exponential speed-ups over the PLS regression on n, m, and w. In addition, the QPLS regression inspires us to explore more potential quantum machine learning applications in future works.
基金supported by grants from the National Program on the Development of Basic Research (2011CB100100)the Priority Academic Program Development of Jiangsu Higher Education Institutions, the National Natural Science Foundations (31391632, 31200943, 31171187, and 91535103)+3 种基金the National High-tech R&D Program (863 Program) (2014AA10A601-5)the Natural Science Foundations of Jiangsu Province (BK20150010)the Natural Science Foundation of the Jiangsu Higher Education Institutions (14KJA210005)the Innovative Research Team of Universities in Jiangsu Province (KYLX_1352)
文摘Many complex traits are highly correlated rather than independent. By taking the correlation structure of multiple traits into account, joint association analyses can achieve both higher statistical power and more accurate estimation. To develop a statistical approach to joint association analysis that includes allele detection and genetic effect estimation, we combined multivariate partial least squares regression with variable selection strategies and selected the optimal model using the Bayesian Information Criterion(BIC). We then performed extensive simulations under varying heritabilities and sample sizes to compare the performance achieved using our method with those obtained by single-trait multilocus methods. Joint association analysis has measurable advantages over single-trait methods, as it exhibits superior gene detection power, especially for pleiotropic genes. Sample size, heritability,polymorphic information content(PIC), and magnitude of gene effects influence the statistical power, accuracy and precision of effect estimation by the joint association analysis.
文摘The identification of liquor brands is very important for food safety. Most of the fake liquors are usually made into the products with the same flavor and alcohol content as regular brand, so the identification for the liquor brands with the same flavor and the same alcohol content is essential. However, it is also difficult because the components of such liquor samples are very similar. Near-infrared (NIR) spectroscopy combined with partial least squares discriminant analysis (PLS-DA) was applied to identification of liquor brands with the same flavor and alcohol content. A total of 160 samples of Luzhou Laojiao liquor and 200 samples of non-Luzhou Laojiao liquor with the same flavor and alcohol content were used for identification. Samples of each type were randomly divided into the modeling and validation sets. The modeling samples were further divided into calibration and prediction sets using the Kennard-Stone algorithm to achieve uniformity and representativeness. In the modeling and validation processes based on PLS-DA method, the recognition rates of samples achieved 99.1% and 98.7%, respectively. The results show high prediction performance for the identification of liquor brands, and were obviously better than those obtained from the principal component linear discriminant analysis method. NIR spectroscopy combined with the PLS-DA method provides a quick and effective means of the discriminant analysis of liquor brands, and is also a promising tool for large-scale inspection of liquor food safety.
基金the National Natural Science Foundation of China (41171281, 40701120)the Beijing Nova Program, China (2008B33)
文摘Estimating wheat grain protein content by remote sensing is important for assessing wheat quality at maturity and making grains harvest and purchase policies. However, spatial variability of soil condition, temperature, and precipitation will affect grain protein contents and these factors usually cannot be monitored accurately by remote sensing data from single image. In this research, the relationships between wheat protein content at maturity and wheat agronomic parameters at different growing stages were analyzed and multi-temporal images of Landsat TM were used to estimate grain protein content by partial least squares regression. Experiment data were acquired in the suburb of Beijing during a 2-yr experiment in the period from 2003 to 2004. Determination coefficient, average deviation of self-modeling, and deviation of cross- validation were employed to assess the estimation accuracy of wheat grain protein content. Their values were 0.88, 1.30%, 3.81% and 0.72, 5.22%, 12.36% for 2003 and 2004, respectively. The research laid an agronomic foundation for GPC (grain protein content) estimation by multi-temporal remote sensing. The results showed that it is feasible to estimate GPC of wheat from multi-temporal remote sensing data in large area.
文摘Considering chaotic time series multi-step prediction,multi-step direct prediction model based on partial least squares (PLS) is proposed in this article,where PLS,the method for predicting a set of dependent variables forming a large set of predictors,is used to model the dynamic evolution between the space points and the corresponding future points.The model can eliminate error accumulation with the common single-step local model algorithm,and refrain from the high multi-collinearity problem in the reconstructed state space with the increase of embedding dimension.Simulation predictions are done on the Mackey-Glass chaotic time series with the model. The satisfying prediction accuracy is obtained and the model efficiency verified.In the experiments,the number of extracted components in PLS is set with cross-validation procedure.
基金supported by the projects under the Innovation Team of the Safety Standards and Testing Technology for Agricultural Products of Zhejiang Province, China (Grant No.2010R50028)the National Key Technologies R&D Program of China during the 11th Five-Year Plan Period (Grant No.2006BAK02A18)
文摘Near infrared reflectance spectroscopy(NIRS), a non-destructive measurement technique, was combined with partial least squares regression discrimiant analysis(PLS-DA) to discriminate the transgenic(TCTP and mi166) and wild type(Zhonghua 11) rice. Furthermore, rice lines transformed with protein gene(Os TCTP) and regulation gene(Osmi166) were also discriminated by the NIRS method. The performances of PLS-DA in spectral ranges of 4 000–8 000 cm-1 and 4 000–10 000 cm-1 were compared to obtain the optimal spectral range. As a result, the transgenic and wild type rice were distinguished from each other in the range of 4 000–10 000 cm-1, and the correct classification rate was 100.0% in the validation test. The transgenic rice TCTP and mi166 were also distinguished from each other in the range of 4 000–10 000 cm-1, and the correct classification rate was also 100.0%. In conclusion, NIRS combined with PLS-DA can be used for the discrimination of transgenic rice.
基金Funded by the Natural Basic Research Program of China under the grant No. 2005CB422207.
文摘To predict the economic loss of crops caused by acid rain,we used partial least squares(PLS) regression to build a model of single dependent variable -the economic loss calculated with the decrease in yield related to the pH value and levels of Ca2+,NH4+,Na+,K+,Mg2+,SO42-,NO3-,and Cl-in acid rain. We selected vegetables which were sensitive to acid rain as the sample crops,and collected 12 groups of data,of which 8 groups were used for modeling and 4 groups for testing. Using the cross validation method to evaluate the performace of this prediction model indicates that the optimum number of principal components was 3,determined by the minimum of prediction residual error sum of squares,and the prediction error of the regression equation ranges from -2.25% to 4.32%. The model predicted that the economic loss of vegetables from acid rain is negatively corrrelated to pH and the concentrations of NH4+,SO42-,NO3-,and Cl-in the rain,and positively correlated to the concentrations of Ca2+,Na+,K+ and Mg2+. The precision of the model may be improved if the non-linearity of original data is addressed.
文摘A quantitative structure-activity relationships (QSAR) study is suggested for the prediction of solubility of some thiazolidine-4- carboxylic acid derivatives in aqueous solution. Ab initio theory was used to calculate some quantum chemical descriptors including electrostatic potentials and local charges at each atom, HOMO and LUMO energies, etc. Modeling of the solubility of thiazolidine- 4-carboxylic acid derivatives as a function of molecular structures was established by means of the partial least squares (PLS). The subset of descriptors, which resulted in the low prediction error, was selected by genetic algorithm. This model was applied for the prediction of the solubility of some thiazolidine-4-carboxylic acid derivatives, which were not in the modeling procedure. The relative errors of prediction lower that 4% was obtained by using GA-PLS method. The resulted model showed high prediction ability with RMSEP of 3.8836 and 2.9500 for PLS and GA-PLS models, respectively.
基金Supported by National Natural Science Foundation of China (No.50478086)Tianjin Special Scientific Innovation Foundation (No.06FZZDSH00900)
文摘The water distribution system of one residential district in Tianjin is taken as an example to analyze the changes of water quality.Partial least squares(PLS) regression model,in which the turbidity and Fe are regarded as control objectives,is used to establish the statistical model.The experimental results indicate that the PLS regression model has good predicted results of water quality compared with the monitored data.The percentages of absolute relative error(below 15%,20%,30%) are 44.4%,66.7%,100%(turbidity) and 33.3%,44.4%,77.8%(Fe) on the 4th sampling point;77.8%,88.9%,88.9%(turbidity) and 44.4%,55.6%,66.7%(Fe) on the 5th sampling point.
基金founded by the National Natural Science Foundation of China(81202283,81473070,81373102 and81202267)Key Grant of Natural Science Foundation of the Jiangsu Higher Education Institutions of China(10KJA330034 and11KJA330001)+1 种基金the Research Fund for the Doctoral Program of Higher Education of China(20113234110002)the Priority Academic Program for the Development of Jiangsu Higher Education Institutions(Public Health and Preventive Medicine)
文摘With recent advances in biotechnology, genome-wide association study(GWAS) has been widely used to identify genetic variants that underlie human complex diseases and traits. In case-control GWAS, typical statistical strategy is traditional logistical regression(LR) based on single-locus analysis. However, such a single-locus analysis leads to the well-known multiplicity problem, with a risk of inflating type I error and reducing power. Dimension reduction-based techniques, such as principal component-based logistic regression(PC-LR), partial least squares-based logistic regression(PLS-LR), have recently gained much attention in the analysis of high dimensional genomic data. However, the perfor?mance of these methods is still not clear, especially in GWAS. We conducted simulations and real data application to compare the type I error and power of PC-LR, PLS-LR and LR applicable to GWAS within a defined single nucleotide polymorphism(SNP) set region. We found that PC-LR and PLS can reasonably control type I error under null hypothesis.On contrast, LR, which is corrected by Bonferroni method, was more conserved in all simulation settings. In particular, we found that PC-LR and PLS-LR had comparable power and they both outperformed LR, especially when the causal SNP was in high linkage disequilibrium with genotyped ones and with a small effective size in simulation. Based on SNP set analysis, we applied all three methods to analyze non-small cell lung cancer GWAS data.
基金Communication University of China Foundation,China(No.CUC18A015-2)Fundamental Research Funds for the Central Universities,China(No.CUC200D036)
文摘The box office during the later Spring Festival shows an attractive prospect.This paper studied the factors affecting total box office during the broad Spring Festival which is from the Spring Festival to the Lantern Festival.Data of films released during the broad Spring Festival from the years 2016 to 2019 in China were gathered,and the impact of eight explanatory variables on the box office during the broad Spring Festival was empirically analyzed by partial least squares(PLS)regression with software SIMCA.The results suggest that word-of-mouth has the most positive effect on the box office during the broad Spring Festival.Later propaganda has a positive effect,while early promotion has a negative effect on the box office.Director’s influence has a positive effect,while actor’s influence does not contribute much to the box office.Length of the trailer has a negative effect.The film format of 2D or 3D doesn’t contribute much to the box office.
文摘Human cadaver dissection remains a core and preferred method of anatomical instruction at most low- and middle-income health professional training institutions. Dissection, which is both traumatic and stressful, sets the tone of the students’ responses to later and or similar stressful learning opportunities like the post-mortems or care for terminally ill patients. Partial least squares structural equation modelling was used to determine the effect of the students’: personality, perception of the learning environment, learning approach, and effect of the environment on the student, on undergraduate health professional student’s activity in the human cadaver dissection room. This was a secondary analysis of previously collected data from a cross sectional survey of undergraduate health professional students. We found that personality type and perception of the environment had a positive effect on dissection room activity. Approach to learning and being affected by the dissection room experience (impact), had a negative effect on dissection room activity. All the above effects on dissection room activity were not significant. This study showed that personality, perception of the learning environment, learning approach and effect of the environment on the student, had effects on undergraduate health professional student’s activity in the human cadaver dissection room. The modelled effects are opportunities for educational interventions aimed at increasing student activity in the dissection room.
基金Project(030501801) supported by the Key Laboratory of the State Bureau of Surveying and Mapping in Geographical Space InformationEngineering
文摘Based on the surveying data of strata-moving angle and the ordinary least squares regression, this paper is to construct, a regression model is constructed which is strata-moving parameter β concerning the coal bed obliquity, coal thickness, mining depth, etc. But the regression is unsuccessful. The result is that none of the parameters is suited, this is not up to objective reality. This paper presents a novel method, partial least squares regression (PLS regression), to construct the statistic model of strata-moving parameter β. The experiment shows that the forecasting model is reasonable.
文摘Near-infrared spectroscopy coupled with kernel partial least squares-discriminant analysis was used to rapidly screen water containing malathion. In the wavenumber of 4348 cm-1 to 9091 cm-1, the overall correct classification rate of kernel partial least squares-discriminant analysis was 100% for training set, and 100% for test set, with the lowest concentration detected malathion residues in water being 1 μg·ml-1. Kernel partial least squares-discriminant analysis was able to have a good performance in classifying data in nonlinear systems. It was inferred that Near-infrared spectroscopy coupled with the kernel partial least squares-discriminant analysis had a potential in rapid screening other pesticide residues in water.
文摘Based on continuum power regression(CPR) method, a novel derivation of kernel partial least squares(named CPR-KPLS) regression is proposed for approximating arbitrary nonlinear functions.Kernel function is used to map the input variables(input space) into a Reproducing Kernel Hilbert Space(so called feature space),where a linear CPR-PLS is constructed based on the projection of explanatory variables to latent variables(components). The linear CPR-PLS in the high-dimensional feature space corresponds to a nonlinear CPR-KPLS in the original input space. This method offers a novel extension for kernel partial least squares regression(KPLS),and some numerical simulation results are presented to illustrate the feasibility of the proposed method.