The relative toxicity of 48 anilines was studied using literature values of the Tetrahymena pyriformis population growth endpoint IGC50 (the concentration causing 50% growth inhibition). The entire data set was first randomly split into a training set (31 chemicals), used to establish the QSAR model, and a test set (17 chemicals) for statistical external validation. A biparametric model was developed using, as independent variables, 3D theoretical descriptors derived from DRAGON software. Genetic algorithm variable subset selection (GA-VSS) was performed on the training set with the MobyDigs software, using ordinary least squares (OLS) multiple linear regression (GA-MLR) and maximising the leave-one-out cross-validated explained variance (Q^2_LOO). The obtained model was examined for robustness (Q^2_LOO cross-validation, Y-scrambling) and for predictive ability through both internal (Q^2_LMO, bootstrap) and external (Q^2_ext) validation. The descriptors included in the QSAR model indicated that the log(IGC50^-1) value was related to molecular size and shape and to the interaction of the molecule with its surrounding medium or its target. The applicability domain of the model was also discussed.
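To make the leave-one-out statistic in this abstract concrete, here is a minimal sketch, assuming the common definition Q^2_LOO = 1 - PRESS/TSS for a two-descriptor OLS model. The function names (`ols_fit`, `predict`, `q2_loo`) and all numbers are hypothetical toy values chosen for illustration; they are not the study's descriptors or data, and the study itself used MobyDigs rather than code like this.

```python
def ols_fit(X, y):
    """Fit y = b0 + b1*x1 + ... by ordinary least squares (normal equations)."""
    rows = [[1.0] + list(x) for x in X]  # prepend the intercept column
    p = len(rows[0])
    # Augmented normal-equation matrix [A^T A | A^T y]
    M = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(p):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][p] / M[i][i] for i in range(p)]


def predict(coef, x):
    """Evaluate the fitted linear model for one compound."""
    return coef[0] + sum(b * v for b, v in zip(coef[1:], x))


def q2_loo(X, y):
    """Q^2_LOO = 1 - PRESS/TSS; each compound is predicted by a model fitted without it."""
    n = len(y)
    y_mean = sum(y) / n
    press = 0.0
    for i in range(n):
        coef = ols_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(coef, X[i])) ** 2
    tss = sum((yi - y_mean) ** 2 for yi in y)
    return 1.0 - press / tss


# Toy training set: two invented descriptors per compound (not real DRAGON values).
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 6.0], [6.0, 5.0]]
# Responses that are exactly linear in the descriptors, then a perturbed copy.
y_exact = [1.0 + 2.0 * a - 0.5 * b for a, b in X]
noise = [0.1, -0.2, 0.15, -0.05, 0.2, -0.1]
y_noisy = [yi + e for yi, e in zip(y_exact, noise)]

print("Q2_LOO (exact linear data):", q2_loo(X, y_exact))
print("Q2_LOO (perturbed data):   ", q2_loo(X, y_noisy))
```

On exactly linear data the statistic equals 1 up to rounding, and it falls below 1 once the responses deviate from the model. Within a genetic algorithm search, a value of this kind would serve as the fitness of each candidate descriptor subset.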
Soil metaproteomics has excellent potential as a tool to elucidate the structural and functional changes in soil microbial communities in response to environmental alterations. However, it is hindered by several challenges and gaps. Soil microbial communities have an extremely complex composition, including many uncultured microorganisms that lack whole-genome sequences, so selecting a suitable protein sequence database remains challenging. In this study, a Public database and a Meta-database were constructed using protein sequences from public databases and from metagenomics, respectively. We comprehensively analyzed and compared the soil metaproteomic results obtained with these two kinds of protein sequence databases for protein identification, based on published soil metaproteomic raw data. The results demonstrated that many more proteins, higher sequence coverage, and more microbial species and functional annotations could be identified using the Meta-database than using the Public database. These findings indicated that the Meta-database was more specific as a protein sequence database. However, the follow-up in-depth metaproteomic analyses gave similar main results regardless of the database used. The microbial community composition at the genus level was similar between the two databases, especially for the species annotations with high peptide-spectrum matches and high abundance. The functional analyses in response to stress, such as the gene ontology enrichment of biological process and molecular function terms and the key functional microorganisms, were also similar regardless of the database. Our analysis revealed that the Public database could also, to some extent, meet the demand to explore the functional responses of microbial proteins. This study provides valuable insights into the choice of protein sequence databases and their impacts on subsequent bioinformatic analysis in soil metaproteomic research, and will facilitate the optimization of experimental design for different purposes.
Funding: This work was supported by the National Key Research and Development Program of China (No. 2016YFD0200-308), the National Key Basic Research Program of China (No. 2015CB150501), and the Project of Priority and Key Areas, Institute of Soil Science, Chinese Academy of Sciences (Nos. ISSASIP1605 and ISSASIP1640).