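Objective: To explore the principle and characteristics of the orthogonal projection to latent structures (OPLS) method and its application to the analysis of high-dimensional metabolomics data. Methods: OPLS was implemented in R; simulation experiments were used to explore its properties and the conditions under which it applies, and the method was then validated on real data. Results: A model with a single OPLS predictive component fitted the data as well as a partial least squares (PLS) model with several components, also showed good discriminative ability, and produced a score plot that visualized the data more clearly than the PLS score plot. Conclusion: OPLS effectively removes information in the predictor matrix X that is unrelated to the response Y, which makes the model simpler and easier to interpret while giving good visualization, so it can be used effectively in metabolomics data analysis.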
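The core OPLS idea in this abstract, removing variation in X that is orthogonal to Y, can be sketched for a single response with the NIPALS-style O-PLS step of Trygg and Wold. This is an illustrative plain-Python sketch, not the study's R code; the helper names `center`, `xt_times` and `opls_remove_one` and the toy data are hypothetical. Because the orthogonal weight is built to be perpendicular to the predictive weight (which is proportional to X^T y), the orthogonal score `t_o` has zero covariance with the centred y, so removing it leaves X^T y unchanged.

```python
# One-component O-PLS filter for a single response y (pure Python, no NumPy).
# Hypothetical helper names; a sketch of the technique, not a reference implementation.


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    norm = dot(v, v) ** 0.5
    if norm == 0.0:
        raise ValueError("zero vector: no orthogonal variation to remove")
    return [x / norm for x in v]


def center(X, y):
    """Column-centre X (a list of rows) and centre y."""
    n = len(X)
    means = [sum(row[j] for row in X) / n for j in range(len(X[0]))]
    y_mean = sum(y) / n
    Xc = [[v - m for v, m in zip(row, means)] for row in X]
    return Xc, [v - y_mean for v in y]


def xt_times(X, u):
    """Return X^T u for X given as a list of rows."""
    return [sum(row[j] * ui for row, ui in zip(X, u)) for j in range(len(X[0]))]


def opls_remove_one(Xc, yc):
    """Remove one Y-orthogonal component from centred X; returns (X_filtered, t_o, p_o)."""
    w = normalize(xt_times(Xc, yc))                 # predictive weight, proportional to X^T y
    t = [dot(row, w) for row in Xc]                 # predictive score
    tt = dot(t, t)
    p = [v / tt for v in xt_times(Xc, t)]           # loading p = X^T t / (t^T t)
    wp = dot(w, p)
    w_o = normalize([pj - wp * wj for pj, wj in zip(p, w)])  # part of p orthogonal to w
    t_o = [dot(row, w_o) for row in Xc]             # Y-orthogonal score
    tot = dot(t_o, t_o)
    p_o = [v / tot for v in xt_times(Xc, t_o)]      # orthogonal loading
    X_f = [[x - ti * pj for x, pj in zip(row, p_o)] for row, ti in zip(Xc, t_o)]
    return X_f, t_o, p_o


# Hypothetical data: column 1 carries a zig-zag pattern unrelated to y.
X = [[1.0, 2.0, 0.5], [2.0, 1.0, 1.5], [3.0, 4.0, 2.0],
     [4.0, 3.0, 3.5], [5.0, 6.0, 4.0], [6.0, 5.0, 5.5]]
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
Xc, yc = center(X, y)
X_f, t_o, p_o = opls_remove_one(Xc, yc)
```

In a full OPLS fit this step would be repeated for each orthogonal component (the number usually chosen by cross-validation), and a PLS model with one predictive component would then be fitted to the filtered X.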
The identification of timber properties is important for safe application. Near-infrared spectroscopy (NIRS) is widely used because of its simplicity, efficiency, and environmental friendliness. In practice, however, weak signals must be extracted from complex, overlapping, and changing information. This study focused on the stability of NIR modeling. Orthogonal partial least squares (OPLS) and the successive projections algorithm (SPA) were used to eliminate noise and extract the informative spectral regions, and an ensemble learning method, MIX-PLS, was applied to build the model. The elastic modulus of timber was taken as an example: 201 wood samples of three species, Xylosma congesta (Lour.) Merr., Acer pictum subsp. mono, and Betula pendula, were divided into three groups to investigate modeling performance. The results show that OPLS can preprocess the NIR spectral information with respect to the target property despite systematic error, reducing that error to a minimum. SPA selected 13 spectral bands, which simplified the NIR spectral data and improved model accuracy. For mix partial least squares (MIX-PLS), the Pearson correlation coefficients of calibration (Rc) and prediction (Rp) were 0.95 and 0.90, and the root mean square errors of calibration (RMSEC) and prediction (RMSEP) were 2.075 and 6.001, respectively, which indicates that the model generalizes well.
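The band-selection step mentioned here can be illustrated with the projection loop at the heart of the successive projections algorithm (SPA): starting from one wavelength column, each round projects the remaining columns onto the orthogonal complement of the most recently selected column and keeps the column with the largest residual norm, which discourages picking collinear bands. This is a minimal pure-Python sketch under stated assumptions: `spa_select` and the toy matrix are hypothetical, and the full SPA as usually applied also repeats this loop for every starting column and subset size and keeps the subset with the lowest validation error, which is omitted here.

```python
def spa_select(X, k0, n_vars):
    """SPA projection loop: return n_vars column indices of X, starting at column k0.

    X is a list of rows (samples x wavelengths). Hypothetical helper for illustration.
    """
    n_rows = len(X)
    cols = [[X[i][j] for i in range(n_rows)] for j in range(len(X[0]))]
    selected = [k0]
    for _ in range(n_vars - 1):
        ref = cols[selected[-1]]                     # most recently selected (projected) column
        ref_sq = sum(v * v for v in ref)
        if ref_sq == 0.0:
            raise ValueError("selected column has no remaining variance")
        best_j, best_norm = None, -1.0
        for j, col in enumerate(cols):
            if j in selected:
                continue
            coef = sum(a * b for a, b in zip(col, ref)) / ref_sq
            proj = [a - coef * b for a, b in zip(col, ref)]  # remove the component along ref
            cols[j] = proj                           # later rounds work on projected columns
            norm = sum(v * v for v in proj) ** 0.5
            if norm > best_norm:
                best_j, best_norm = j, norm
        selected.append(best_j)
    return selected


# Hypothetical 3-sample x 4-wavelength matrix; column 1 is nearly collinear with column 0.
X = [[1.0, 0.9, 0.0, 0.0],
     [0.0, 0.1, 0.0, 0.5],
     [0.0, 0.0, 1.0, 0.0]]
print(spa_select(X, k0=0, n_vars=3))  # -> [0, 2, 3]
```

Column 1 is never chosen because almost all of its signal is already explained by column 0; this is the redundancy-reduction behavior that lets SPA shrink a full spectrum to a small set of bands.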
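Objective: To establish a rapid method for determining the curcumin content of the raw-material mixture of Jianghuang pills by combining handheld near-infrared spectroscopy with orthogonal projection partial least squares (OPLS). Methods: Decoction pieces of turmeric (Curcumae Longae Rhizoma) and stir-fried Tribulus fruit (Tribuli Fructus) were separately pulverized and sieved, then mixed uniformly in different proportions to prepare 26 samples. Curcumin content was determined by HPLC, and the measured values were used as the label values (Y). A handheld near-infrared spectrometer recorded each sample batch 5 times, and the averaged spectra served as the baseline NIR data (X, n = 26). Spectra were preprocessed in SIMCA 14.1, and an OPLS quantitative calibration model Y = M(X) for Jianghuang pills was built with k-fold cross-validation. Results: For the regression of predicted against measured Y, the OPLS model had a coefficient of determination R² of 0.9893 and an RMSECV of 0.0751. Internal validation with a permutation test showed no overfitting, and the model predicted well. Conclusion: A handheld near-infrared spectrometer enables rapid, non-destructive testing of the raw-material mixture of Jianghuang pills.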
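The validation scheme in this abstract (k-fold cross-validation, then a permutation test for overfitting) can be sketched as follows. As simplifying assumptions, a one-component PLS1 regression stands in for the SIMCA OPLS model, folds are interleaved (sample i goes to fold i mod k), and the permutation test returns an empirical p-value for RMSECV rather than the R²/Q² summaries SIMCA reports. All function names and data are hypothetical.

```python
import random


def fit_pls1_one(X, y):
    """Fit a one-component PLS1 model and return a predict(row) function."""
    n, m = len(X), len(X[0])
    x_mean = [sum(r[j] for r in X) / n for j in range(m)]
    y_mean = sum(y) / n
    Xc = [[v - mu for v, mu in zip(r, x_mean)] for r in X]
    yc = [v - y_mean for v in y]
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(m)]  # X^T y
    w_norm = sum(v * v for v in w) ** 0.5
    if w_norm == 0.0:                 # no covariance with y: predict the mean
        return lambda row: y_mean
    w = [v / w_norm for v in w]
    t = [sum(a * b for a, b in zip(r, w)) for r in Xc]
    b = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)

    def predict(row):
        score = sum((v - mu) * wj for v, mu, wj in zip(row, x_mean, w))
        return y_mean + b * score

    return predict


def kfold_rmsecv(X, y, k):
    """RMSECV with k interleaved folds (sample i goes to fold i % k)."""
    n = len(y)
    sq_err = 0.0
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        model = fit_pls1_one([X[i] for i in train], [y[i] for i in train])
        sq_err += sum((model(X[i]) - y[i]) ** 2 for i in test)
    return (sq_err / n) ** 0.5


def permutation_test(X, y, k, n_perm=99, seed=0):
    """Return (RMSECV on the real y, empirical p-value from models fitted to shuffled y)."""
    rng = random.Random(seed)
    real = kfold_rmsecv(X, y, k)
    hits = 0
    for _ in range(n_perm):
        y_perm = list(y)
        rng.shuffle(y_perm)
        if kfold_rmsecv(X, y_perm, k) <= real:
            hits += 1
    return real, (hits + 1) / (n_perm + 1)


# Hypothetical data: y depends linearly on the first column.
X = [[float(i), 0.5 * i + 0.1 * ((7 * i) % 5)] for i in range(12)]
y = [2.0 * i + 1.0 for i in range(12)]
rmsecv, p_value = permutation_test(X, y, k=4)
```

A small p-value means that models fitted to scrambled labels almost never predict as well as the real model, which is the sense in which a permutation test argues against overfitting.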
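PCA (principal component analysis) is commonly used to analyze Biolog ECO and DGGE data, but it cannot correctly distinguish the diversity structure of microbial communities from different environments, nor can it discover microbial markers. To achieve both, this study analyzed Biolog ECO and DGGE data with PCA, PLS-DA (partial least squares discriminant analysis), PLS-EDA (partial least squares-enhanced discriminant analysis), PLS (partial least squares), and OPLS (orthogonal partial least squares). For the DGGE data, PLS-EDA distinguished the microbial diversity structures of the different environments (PC1 = 16.8%); with PLS-DA, one sample overlapped between the two environments (PC1 = 33%); and PCA gave the poorest separation (PC1 = 27.1%). For the Biolog ECO data, PLS-EDA likewise distinguished the microbial diversity structures of the different environments (PC1 = 25.5%); with PLS-DA, 4 samples overlapped (PC1 = 36.3%); and PCA performed worst (PC1 = 35.1%). Screening the Biolog ECO and DGGE data with PLS and OPLS identified several potential carbon sources and species that could serve as microbial markers under different environmental conditions. Thus PLS-EDA outperforms PLS-DA and PCA and is an important method for microbial research, and bands and carbon sources with VIP (variable importance in projection) ≥ 1.00 in the PLS and OPLS models can serve as potential microbial markers. 7 figures, 1 table, 24 references.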
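The VIP ≥ 1.00 screening rule can be made concrete. For a PLS1 model with A components, VIP_j = sqrt(p · Σ_a SS_a w_ja² / Σ_a SS_a), where p is the number of variables, w_a is the unit-length weight vector and SS_a is the y sum of squares explained by component a; the squared VIPs therefore average to exactly 1, which is why 1 is the customary cut-off. The sketch below assumes a single 0/1 class response (a two-group PLS-DA) fitted by NIPALS and that every component has nonzero covariance with y; `pls1_vip` and the data are hypothetical.

```python
def pls1_vip(X, y, n_comp):
    """VIP scores for a PLS1 model fitted by NIPALS on centred copies of X and y."""
    n, n_vars = len(X), len(X[0])
    x_mean = [sum(r[j] for r in X) / n for j in range(n_vars)]
    y_mean = sum(y) / n
    E = [[v - mu for v, mu in zip(r, x_mean)] for r in X]   # X residual matrix
    f = [v - y_mean for v in y]                            # y residual vector
    weights, ss = [], []
    for _ in range(n_comp):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(n_vars)]
        w_norm = sum(v * v for v in w) ** 0.5
        w = [v / w_norm for v in w]                        # unit-length weights
        t = [sum(a * b for a, b in zip(row, w)) for row in E]
        tt = sum(v * v for v in t)
        p_load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(n_vars)]
        q = sum(a * b for a, b in zip(f, t)) / tt
        E = [[e - ti * pj for e, pj in zip(row, p_load)] for row, ti in zip(E, t)]  # deflate X
        f = [fi - q * ti for fi, ti in zip(f, t)]                                     # deflate y
        weights.append(w)
        ss.append(q * q * tt)                              # y sum of squares explained
    total = sum(ss)
    return [(n_vars * sum(s * wa[j] ** 2 for s, wa in zip(ss, weights)) / total) ** 0.5
            for j in range(n_vars)]


# Hypothetical two-group data: variable 0 separates the groups, the others mostly do not.
X = [[1.0, 0.3, 2.0, 5.0], [1.2, 0.1, 2.1, 4.0], [0.9, 0.4, 1.8, 6.0], [1.1, 0.2, 2.2, 5.5],
     [3.0, 0.2, 2.6, 5.2], [3.2, 0.5, 2.4, 4.4], [2.9, 0.1, 2.7, 5.9], [3.1, 0.3, 2.5, 4.8]]
y = [0, 0, 0, 0, 1, 1, 1, 1]                               # dummy-coded class membership
vip = pls1_vip(X, y, n_comp=2)
markers = [j for j, v in enumerate(vip) if v >= 1.0]       # VIP >= 1 screening rule
```

For an OPLS model the same idea is usually applied to the predictive component; the exact VIP definition used by a given software package may differ, so this should be read as the standard PLS form only.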
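Using near-infrared spectroscopy, quantitative protein models were built for original and mixed milk-powder samples of different brands, types, and batches. Orthogonal partial least squares (OPLS) was used to build the NIR regression model, which was compared with other preprocessing methods and with conventional partial least squares (PLS). The optimal parameters of the OPLS and PLS models were found by a global search using cross-validation. The OPLS calibration model built with 5 principal components performed best: its correlation coefficient R was 0.9940, the root mean square error of cross-validation (RMSECV) on the calibration set was 1.09, the correlation coefficient R between the reference chemical values and the model predictions for the prediction set reached 0.9767, and the root mean square error of prediction (RMSEP) was 0.905. The results show that OPLS regression simplifies the model while improving its predictive generalization, enabling rapid, non-destructive NIR quantitative modeling of protein in milk powder.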
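The statistics reported here (R, RMSECV, RMSEP) are simple to compute once predictions are available. A minimal sketch, assuming the plain RMSE with divisor n (some software applies a degrees-of-freedom correction, particularly for RMSEC) and the Pearson correlation between reference and predicted values; which error the RMSE represents depends only on which predictions are passed in (calibration fits, cross-validated predictions, or predictions for an independent set). The helper `best_n_components` is a hypothetical illustration of choosing the component count with the lowest RMSECV.

```python
import math


def rmse(y_ref, y_pred):
    """Root mean square error with divisor n (RMSEC, RMSECV or RMSEP,
    depending on which predictions are supplied)."""
    n = len(y_ref)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_ref, y_pred)) / n)


def pearson_r(x, y):
    """Pearson correlation coefficient between reference and predicted values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def best_n_components(rmsecv_by_a):
    """Return the number of components (1-based) whose RMSECV is lowest."""
    return min(range(len(rmsecv_by_a)), key=rmsecv_by_a.__getitem__) + 1
```

Reporting RMSECV and RMSEP side by side, as the abstract does, shows whether the cross-validated error on the calibration set carries over to unseen samples.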
Funding: Supported financially by the China State Forestry Administration "948" Project (2015-4-52) and the Heilongjiang Natural Science Foundation (C2017005).