Although genome-wide association studies have identified more than eighty genetic variants associated with non-small cell lung cancer (NSCLC) risk, the biological mechanisms of these variants remain largely unknown. By integrating large-scale genotype data from 15581 lung adenocarcinoma (AD) cases, 8350 squamous cell carcinoma (SqCC) cases, and 27355 controls with multiple transcriptome and epigenomic databases, we conducted histology-specific meta-analyses and functional annotations of both reported and novel susceptibility variants. We identified 3064 credible risk variants for NSCLC, which were overrepresented in enhancer-like and promoter-like histone modification peaks as well as DNase I hypersensitive sites. Transcription factor enrichment analysis revealed that USF1 was AD-specific while CREB1 was SqCC-specific. Functional annotation and gene-based analysis implicated 894 target genes, including 274 specific to AD and 123 specific to SqCC, which were overrepresented in somatic driver genes (ER = 1.95, P = 0.005). Pathway enrichment analysis and Gene-Set Enrichment Analysis revealed that AD genes were primarily involved in immune-related pathways, while SqCC genes were related to homologous recombination deficiency. Our results illustrate the molecular basis of both well-studied and new susceptibility loci of NSCLC, providing not only novel insights into the genetic heterogeneity between AD and SqCC but also a set of plausible gene targets for post-GWAS functional experiments.
Lentil (Lens culinaris Medik.), a diploid (2n = 14) with a genome size greater than 4000 Mbp, is an important cool-season food legume grown worldwide. The availability of genomic resources is limited in this crop species. The objective of this study was to develop polymorphic markers in lentil using publicly available, curated expressed sequence tags (ESTs). In this study, 9513 ESTs were downloaded from the National Center for Biotechnology Information (NCBI) database to develop unigene-based simple sequence repeat (SSR) markers. The ESTs were assembled into 4053 unigenes and then analyzed to identify 374 SSRs using the MISA microsatellite identification tool. Among the 374 SSRs, 26 compound SSRs were observed. Primer pairs for these SSRs were designed using Primer3 version 1.14. To functionally annotate the ESTs and EST–SSRs, BLASTx searches (E-value 1 × 10^(-5)) against the public UniProt (http://www.uniprot.org/) and NCBI (http://www.ncbi.nlm.nih.gov/) databases were performed. Further functional annotation was performed using PLAZA (version 3.0) comparative genomics, and GO annotation was summarized using the Plant GO slim category. Among the 312 synthesized primers, 219 successfully amplified Lens DNA. A diverse panel of 24 Lens genotypes was used to identify polymorphic markers. A polymorphic set of 57 markers successfully discriminated the test genotypes. This set of polymorphic markers with functional annotation data could be used as molecular tools in lentil breeding.
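The SSR identification step described in this abstract can be illustrated with a minimal sketch. It is not MISA and does not reproduce the study's settings: the function names and the minimum repeat counts below are assumptions chosen for illustration, and compound SSRs (two repeats separated by a short gap) are not handled.

```python
import re

# Minimum number of repeats per motif length (1 = mono-, 2 = di-, ...).
# These thresholds are illustrative assumptions, not the study's settings.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    return motif not in (motif + motif)[1:-1]


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for motif_len, min_count in sorted(min_repeats.items()):
        # One captured motif followed by at least (min_count - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_count - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):  # skip 'AA' runs already reported as 'A'
                hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return sorted(hits)


if __name__ == "__main__":
    example = "GGC" + "AT" * 7 + "CCG" + "A" * 12 + "TTC" + "CAG" * 5 + "T"
    for start, motif, count in find_ssrs(example):
        print(start, motif, count)
```

Start positions are 0-based. A dedicated tool remains the better choice for real data, since it also reports compound repeats and feeds directly into primer design.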
Lonicera japonica Thunb., a traditional Chinese herb, has been used for treating human diseases for thousands of years. Recently, the genome of L. japonica has been decoded, providing valuable information for research into gene function. However, no comprehensive database for gene functional analysis and mining is available for L. japonica. We therefore constructed LjaFGD (www.gzybioinformatics.cn/LjaFGD and bioinformatics.cau.edu.cn/LjaFGD), a database for analyzing and comparing gene function in L. japonica. We constructed a gene co-expression network based on 77 RNA-seq samples, and then annotated genes of L. japonica by alignment against protein sequences from public databases. We also introduced several tools for gene functional analysis, including BLAST, motif analysis, gene set enrichment analysis, heatmap analysis, and JBrowse. Our co-expression network revealed that MYB and WRKY transcription factor family genes were co-expressed with genes encoding key enzymes in the biosynthesis of chlorogenic acid and luteolin in L. japonica. We used flavonol synthase 1 (LjFLS1) as an example to show the reliability and applicability of our database. LjaFGD and its various associated tools will provide researchers with an accessible platform for retrieving functional information on L. japonica genes to further biological discovery.
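The abstract does not state how the co-expression network was built. As a rough illustration of the general idea only, the sketch below links two genes when the Pearson correlation of their expression profiles across samples exceeds a cutoff. The gene names, expression values, and the 0.9 threshold are hypothetical, and this should not be read as the LjaFGD method.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / sqrt(var_x * var_y)


def coexpression_edges(expr, threshold=0.9):
    """Return (gene1, gene2, r) for gene pairs with |r| >= threshold."""
    edges = []
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, round(r, 3)))
    return edges


if __name__ == "__main__":
    # Hypothetical expression values across five samples.
    expr = {
        "geneA": [1, 2, 3, 4, 5],
        "geneB": [2, 4, 6, 8, 10],
        "geneC": [5, 1, 4, 2, 3],
    }
    print(coexpression_edges(expr))
```

Published networks often add a rank-based filter on top of the raw correlation to reduce spurious edges; that refinement is left out here for brevity.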
Soil metaproteomics has excellent potential as a tool to elucidate the structural and functional changes in soil microbial communities in response to environmental alterations. However, soil metaproteomics is hindered by several challenges and gaps. Soil microbial communities have an extremely complex composition, including many uncultured microorganisms without whole-genome sequences. Thus, selecting a suitable protein sequence database remains challenging in soil metaproteomics. In this study, the Public database and the Meta-database were constructed using protein sequences from public databases and metagenomics, respectively. Based on published soil metaproteomic raw data, we comprehensively analyzed and compared the soil metaproteomic results obtained with these two kinds of protein sequence databases for protein identification. The results demonstrated that many more proteins, higher sequence coverage, and even more microbial species and functional annotations could be identified using the Meta-database than using the Public database. These findings indicated that the Meta-database was more specific as a protein sequence database. However, the follow-up in-depth metaproteomic analyses exhibited similar main results regardless of the database used. The microbial community composition at the genus level was similar between the two databases, especially the species annotations with high peptide-spectrum match and high abundance. The functional analyses in response to stress, such as the gene ontology enrichment of biological process and molecular function and the key functional microorganisms, were also similar regardless of the database. Our analysis revealed that the Public database could also meet the demand to explore the functional responses of microbial proteins to some extent. This study provides valuable insights into the choice of protein sequence databases and their impacts on subsequent bioinformatic analysis in soil metaproteomic research and will facilitate the optimization of experimental design for different purposes.
As the most pervasive epigenetic marker present on mRNAs and long non-coding RNAs (lncRNAs), N6-methyladenosine (m^(6)A) RNA methylation has been shown to participate in essential biological processes. Recent studies have revealed the distinct patterns of the m^(6)A methylome across human tissues, and a major challenge remains in elucidating the tissue-specific presence and circuitry of m^(6)A methylation. We present here a comprehensive online platform, m^(6)A-TSHub, for unveiling the context-specific m^(6)A methylation and genetic mutations that potentially regulate the m^(6)A epigenetic mark. m^(6)A-TSHub consists of four core components: (1) m^(6)A-TSDB, a comprehensive database of 184,554 functionally annotated m^(6)A sites derived from 23 human tissues and 499,369 m^(6)A sites from 25 tumor conditions; (2) m^(6)A-TSFinder, a web server for high-accuracy prediction of m^(6)A methylation sites within a specific tissue from RNA sequences, which was constructed using multi-instance deep neural networks with gated attention; (3) m^(6)A-TSVar, a web server for assessing the impact of genetic variants on tissue-specific m^(6)A RNA modifications; and (4) m^(6)A-CAVar, a database of 587,983 The Cancer Genome Atlas (TCGA) cancer mutations (derived from 27 cancer types) that were predicted to affect m^(6)A modifications in the primary tissue of cancers. The database should be a useful resource for studying the m^(6)A methylome and the genetic factors of epitranscriptome disturbance in a specific tissue (or cancer type). m^(6)A-TSHub is accessible at www.xjtlu.edu.cn/biologicalsciences/m6ats.
Mammals have evolved mechanisms to sense hypoxia and induce hypoxic responses. Recently, high-throughput techniques have greatly promoted global studies of protein expression changes during hypoxia and the identification of candidate genes associated with hypoxia-adaptive evolution, which have contributed to the understanding of the complex regulatory networks of hypoxia. In this study, we developed an integrated resource for the expression dynamics of proteins in response to hypoxia (iHypoxia). This database contains 2589 expression events of 1944 proteins identified by low-throughput experiments (LTEs) and 422,553 quantitative expression events of 33,559 proteins identified by high-throughput experiments from five mammals that exhibit a response to hypoxia. Various experimental details, such as the hypoxic experimental conditions, expression patterns, and sample types, were carefully collected and integrated. Furthermore, 8788 candidate genes from diverse species inhabiting low-oxygen environments were also integrated. In addition, we conducted an orthologous search and computationally identified 394,141 proteins that may respond to hypoxia among 48 animals. An enrichment analysis of human proteins identified from LTEs shows that these proteins are enriched in certain drug targets and cancer genes. Annotation of known posttranslational modification (PTM) sites in the proteins identified by LTEs reveals that these proteins undergo extensive PTMs, particularly phosphorylation, ubiquitination, and acetylation. iHypoxia provides a convenient and user-friendly way for users to obtain hypoxia-related information of interest.
Genetic and epigenetic changes after polyploidization events could result in variable gene expression and modified regulatory networks. Here, using large-scale transcriptome data, we constructed co-expression networks for diploid, tetraploid, and hexaploid wheat species, and built a platform for comparing co-expression networks of allohexaploid wheat and its progenitors, named WheatCENet. WheatCENet is a platform for searching and comparing specific functional co-expression networks, as well as identifying the related functions of the genes clustered therein. Functional annotations such as pathways, gene families, protein-protein interactions, microRNAs (miRNAs), and several lines of epigenome data are integrated into this platform, and Gene Ontology (GO) annotation, gene set enrichment analysis (GSEA), motif identification, and other useful tools are also included. Using WheatCENet, we found that the network of WHEAT ABERRANT PANICLE ORGANIZATION 1 (WAPO1) has more co-expressed genes related to spike development in hexaploid wheat than in its progenitors. We also found a novel motif of CCWWWWWWGG (CArG) specifically in the promoter region of WAPO-A1, suggesting that neofunctionalization of the WAPO-A1 gene affects spikelet development in hexaploid wheat. WheatCENet is useful for investigating co-expression networks and conducting other analyses, and thus facilitates comparative and functional genomic studies in wheat.
The Malaysian mahseer (Tor tambroides), one of the most valuable freshwater fish in the world, is mainly targeted for human consumption. Mitogenomic data for this species are available, but genomic information is still lacking. For the first time, we sequenced the whole genome of an adult fish on both Illumina and Nanopore platforms. The hybrid genome assembly resulted in a total of 1.23 Gb of genomic sequence in 44,726 contigs, with an N50 length of 44 kb and a BUSCO genome completeness of 87.6%. Four types of SSRs were detected and identified within the genome, with a greater abundance of AT than of GC. Predicted protein sequences were functionally annotated against public databases, namely GO, KEGG, and COG. A maximum likelihood phylogenomic tree containing 52 Actinopterygii species and one Sarcopterygii species as the outgroup was constructed, providing first insights into the genome-based evolutionary relationship of T. tambroides with other ray-finned fish. These data are crucial in facilitating the study of population genomics, species identification, morphological variations, and evolutionary biology, which are helpful in the conservation of this species.
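The N50 statistic quoted in this abstract is the contig length at which, going from the longest contig downward, the cumulative length first reaches half of the total assembly size. A minimal sketch of that calculation follows; the function name and the contig lengths are illustrative, not values from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    # Illustrative contig lengths (arbitrary units), not data from the study.
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

Comparing `2 * running` with `total` keeps the arithmetic in integers and avoids rounding issues when the total length is odd.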
The development of new biomarkers or therapeutic targets for cancer immunotherapies requires a deep understanding of T cells. To date, the complete landscape and systematic characterization of long noncoding RNAs (lncRNAs) in T cells in cancer immunity are lacking. Here, by systematically analyzing full-length single-cell RNA sequencing (scRNA-seq) data of more than 20,000 libraries of T cells across three cancer types, we provided the first comprehensive catalog and the functional repertoires of lncRNAs in human T cells. Specifically, we developed a custom pipeline for de novo transcriptome assembly and obtained a novel lncRNA catalog containing 9433 genes. This increased the size of the current human lncRNA catalog by 16% and nearly doubled the number of lncRNAs expressed in T cells. We found that a portion of expressed genes in single T cells were lncRNAs that had been overlooked by the majority of previous studies. Based on metacell maps constructed by the MetaCell algorithm, which partitions scRNA-seq datasets into disjoint and homogeneous groups of cells (metacells), 154 signature lncRNA genes were identified. They were associated with effector, exhausted, and regulatory T cell states. Moreover, 84 of them were functionally annotated based on the co-expression networks, indicating that lncRNAs might broadly participate in the regulation of T cell functions. Our findings provide a new point of view and a resource for investigating the mechanisms of T cell regulation in cancer immunity as well as for novel cancer-immune biomarker development and cancer immunotherapies.
Genome-wide association studies with an Illumina Bovine50K chip have detected 105 SNPs associated with one or multiple milk production traits in the Chinese Holstein population. Of these, 38 significant SNPs detected with high confidence by both the L1-TDT and MMRA methods were selected to further mine potential key genes affecting milk yield and milk composition. By BLAST-searching the flanking sequences of these 38 SNPs against the bovine genome sequence, combined with comparative genomics analysis, 26 genes were found to contain or lie near such SNPs. Among them, the C14H8orf33 gene is merely 87 bp away from the significant SNP Hapmap30383-BTC-005848. Hence, we report herein genotype-phenotype associations to further validate the genetic effects of the C14H8orf33 gene. By pooled DNA sequencing of 14 unrelated Holstein sires, a total of 18 SNPs, seven of them novel, were identified. Among them, nine SNPs were in the 5′ regulatory region, one was in exon 6, and the others were in the 3′ UTR and 3′ regulatory region. Nine of the identified SNPs were successfully genotyped by mass spectrometry and analyzed for association with five milk production traits in an independent resource population. The results showed that these SNPs were statistically significant for more than two traits (P < 0.0001–0.0267). In addition, mRNA expression analyses revealed that C14H8orf33 was expressed ubiquitously in eight different tissues, with a relatively higher expression level in the mammary gland than in other tissues. These findings, therefore, provide strong evidence for an association of C14H8orf33 variants with milk yield and milk composition traits, and these variants may be applied in Chinese Holstein breeding programs.
Gene Ontology (GO) has been widely used to annotate functions of genes and gene products. Here, we proposed a new method, TripletGO, to deduce GO terms of protein-coding and noncoding genes through the integration of four complementary pipelines built on transcript expression profile, genetic sequence alignment, protein sequence alignment, and naïve probability. TripletGO was tested on a large set of 5754 genes from 8 species (human, mouse, Arabidopsis, rat, fly, budding yeast, fission yeast, and nematode) and 2433 proteins with available expression data from the third Critical Assessment of Protein Function Annotation challenge (CAFA3). Experimental results show that TripletGO achieves function annotation accuracy significantly beyond the current state-of-the-art approaches. Detailed analyses show that the major advantage of TripletGO lies in the coupling of a new triplet network-based profiling method with the feature space mapping technique, which can accurately recognize function patterns from transcript expression profiles. Meanwhile, the combination of multiple complementary models, especially those from transcript expression and protein-level alignments, improves the coverage and accuracy of the final GO annotation results. The standalone package and an online server of TripletGO are freely available at https://zhanggroup.org/TripletGO/.
Proteins are integral to essential life processes, making protein research a fundamental field with the potential to advance pharmaceuticals and disease investigation. Within protein research, there is a pressing need to uncover protein functions and untangle their underlying mechanisms. Because experimental investigations are costly and limited in throughput, computational models offer a promising alternative for accelerating protein function annotation. In recent years, protein pre-training models have advanced notably across multiple prediction tasks, which highlights their promise for tackling the downstream task of protein function prediction. In this review, we elucidate the historical evolution and research paradigms of computational methods for predicting protein function. Subsequently, we summarize the progress in protein and molecule representation as well as feature extraction techniques. Furthermore, we assess the performance of machine learning-based algorithms across various objectives in protein function prediction, thereby offering a comprehensive perspective on the progress within this field.
Funding (NSCLC susceptibility variant study): the Key International (Regional) Cooperative Research Project (No. 81820108028); the National Natural Science Foundation of China (Nos. 81521004, 81922061, 81973123, and 81803306); the Science Foundation for Distinguished Young Scholars of Jiangsu (No. BK20160046); the Priority Academic Program for the Development of Jiangsu Higher Education Institutions (Public Health and Preventive Medicine); and the National Cancer Institute, National Institutes of Health of USA, through grants U01-CA063673, UM1-CA167462, and U01-CA167462.
Funding (lentil EST-SSR marker study): Financial assistance from ICARDA, Morocco, in the form of a brief project, and grant support from the Northern Pulse Growers Association and the USA Dry Pea and Lentil Council are gratefully acknowledged.
Funding (LjaFGD database study): This work was supported by the Ph.D. Startup Foundation of Guizhou University of Traditional Chinese Medicine (Nos. (2020)32 and (2019)141) and the National Natural Science Foundation of China (No. 31970629).
Funding (soil metaproteomics database study): This work was supported by the National Key Research and Development Program of China (No. 2016YFD0200-308), the National Key Basic Research Program of China (No. 2015CB150501), and the Project of Priority and Key Areas, Institute of Soil Science, Chinese Academy of Sciences (Nos. ISSASIP1605 and ISSASIP1640).
Funding (m^(6)A-TSHub study): This work was supported by the National Natural Science Foundation of China (Grant Nos. 32100519 and 31671373), the Scientific Research Foundation for Advanced Talents of Fujian Medical University (Grant No. XRCZX2021019), and the XJTLU Key Program Special Fund (Grant Nos. KSF-T-01, KSF-E-51, and KSF-P-02), China.
Funding (iHypoxia study): This work was supported by grants from the National Key R&D Program of China (Grant No. 2021YFA1302100 to Ze-Xian Liu), the National Natural Science Foundation of China (Grant No. U2004152 to Zhenlong Wang; Grant Nos. 81972239 and 91953123 to Ze-Xian Liu), the Fostering Fund of Fundamental Research for Young Teachers of Zhengzhou University, China (Grant No. JC21343016 to Han Cheng), the Program for Guangdong Introducing Innovative and Entrepreneurial Teams, China (Grant No. 2017ZT07S096 to Ze-Xian Liu), and the Tip-Top Scientific and Technical Innovative Youth Talents of Guangdong Special Support Program, China (Grant No. 2019TQ05Y351 to Ze-Xian Liu).
Funding: Supported by grants from the National Natural Science Foundation of China (Grant Nos. 31970629 and 31771467 to ZS, and 31870209 to YJ).
Abstract: Genetic and epigenetic changes after polyploidization events could result in variable gene expression and modified regulatory networks. Here, using large-scale transcriptome data, we constructed co-expression networks for diploid, tetraploid, and hexaploid wheat species, and built a platform for comparing co-expression networks of allohexaploid wheat and its progenitors, named WheatCENet. WheatCENet is a platform for searching and comparing specific functional co-expression networks, as well as identifying the related functions of the genes clustered therein. Functional annotations such as pathways, gene families, protein-protein interactions, microRNAs (miRNAs), and several lines of epigenome data are integrated into this platform, and Gene Ontology (GO) annotation, gene set enrichment analysis (GSEA), motif identification, and other useful tools are also included. Using WheatCENet, we found that the network of WHEAT ABERRANT PANICLE ORGANIZATION 1 (WAPO1) has more co-expressed genes related to spike development in hexaploid wheat than in its progenitors. We also found a novel motif, CCWWWWWWGG (CArG), specifically in the promoter region of WAPO-A1, suggesting that neofunctionalization of the WAPO-A1 gene affects spikelet development in hexaploid wheat. WheatCENet is useful for investigating co-expression networks and conducting other analyses, and thus facilitates comparative and functional genomic studies in wheat.
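The CArG consensus CCWWWWWWGG mentioned in this abstract uses the IUPAC code W, which stands for A or T. As a minimal sketch of how such a consensus can be located in a promoter sequence, the Python snippet below expands IUPAC codes into a regular expression and reports every match. The function names and the toy sequence are hypothetical illustrations, not part of WheatCENet, and the sequence is not the WAPO-A1 promoter.

```python
import re

# Standard IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def motif_to_regex(motif):
    """Translate an IUPAC consensus such as CCWWWWWWGG into a regex string."""
    return "".join(IUPAC[base] for base in motif.upper())


def find_motif(sequence, motif):
    """Return (0-based start, matched text) for every occurrence of the motif.

    A lookahead is used so that overlapping occurrences are also reported.
    """
    pattern = re.compile("(?=(" + motif_to_regex(motif) + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]


# Hypothetical toy sequence, made up for illustration only.
toy_promoter = "GGACCATATATGGTTCCAAAAAAGGA"
hits = find_motif(toy_promoter, "CCWWWWWWGG")
print(hits)
```

Because the reverse complement of CCWWWWWWGG is again CCWWWWWWGG, scanning one strand is sufficient for this particular consensus; for a non-palindromic consensus the reverse-complemented sequence would need to be scanned as well.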
Funding: This work was fully funded by the Sarawak Research and Development Council through the Research Initiation Grant Scheme, grant number RDCRG/RIF/2019/13, awarded to H. H. Chung.
Abstract: The Malaysian mahseer (Tor tambroides), one of the most valuable freshwater fish in the world, is mainly targeted for human consumption. Mitogenomic data for this species are available, but genomic information is still lacking. For the first time, we sequenced the whole genome of an adult fish on both Illumina and Nanopore platforms. The hybrid genome assembly yielded 1.23 Gb of genomic sequence in 44,726 contigs, with an N50 length of 44 kb and a BUSCO genome completeness of 87.6%. Four types of SSRs were detected and identified within the genome, with a greater abundance of AT than GC. Predicted protein sequences were functionally annotated against public databases, namely GO, KEGG, and COG. A maximum likelihood phylogenomic tree containing 52 Actinopterygii species and one Sarcopterygii species as outgroup was constructed, providing first insights into the genome-based evolutionary relationship of T. tambroides with other ray-finned fish. These data are crucial in facilitating the study of population genomics, species identification, morphological variations, and evolutionary biology, which are helpful in the conservation of this species.
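The N50 statistic quoted for this assembly is the length of the contig at which the running total of contig lengths, taken from longest to shortest, first reaches half of the total assembly size. The Python sketch below shows the calculation on a hypothetical list of contig lengths; the numbers are made up for illustration and are not taken from the T. tambroides assembly.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contig lengths given")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths (total 300, so half is 150).
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))  # 80 + 70 = 150 reaches the halfway point, so N50 is 70
```

A larger N50 indicates that more of the assembly sits in long contigs, which is why it is commonly reported alongside contig count and BUSCO completeness.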
Funding: This work was supported by the Science and Technology Project of Shenzhen, China (Grant Nos. JCYJ20190807145013281, JHZ20170310090257380, JCYJ20170413092711058, and JCYJ20170307095822325); the China Postdoctoral Science Foundation (Grant No. 2019M663369); and the National Natural Science Foundation of China (Grant No. 31970636).
Abstract: The development of new biomarkers or therapeutic targets for cancer immunotherapies requires a deep understanding of T cells. To date, the complete landscape and systematic characterization of long noncoding RNAs (lncRNAs) in T cells in cancer immunity are lacking. Here, by systematically analyzing full-length single-cell RNA sequencing (scRNA-seq) data of more than 20,000 libraries of T cells across three cancer types, we provided the first comprehensive catalog and the functional repertoires of lncRNAs in human T cells. Specifically, we developed a custom pipeline for de novo transcriptome assembly and obtained a novel lncRNA catalog containing 9433 genes. This expanded the current human lncRNA catalog by 16% and nearly doubled the number of lncRNAs expressed in T cells. We found that a portion of the genes expressed in single T cells were lncRNAs that had been overlooked by the majority of previous studies. Based on metacell maps constructed by the MetaCell algorithm, which partitions scRNA-seq datasets into disjoint and homogeneous groups of cells (metacells), 154 signature lncRNA genes were identified. They were associated with effector, exhausted, and regulatory T cell states. Moreover, 84 of them were functionally annotated based on the co-expression networks, indicating that lncRNAs might broadly participate in the regulation of T cell functions. Our findings provide a new point of view and resource for investigating the mechanisms of T cell regulation in cancer immunity, as well as for novel cancer-immune biomarker development and cancer immunotherapies.
Funding: Supported by the National High-tech R&D Program of China (2013AA102504); the National Key Technology R&D Program (2011BAD28B02); the National Transgenic Major Project (2014ZX08009-053B); the Beijing Innovation Team of Technology System in the National Dairy Industry; the Beijing Research and Technology Program (D121100003312001); the earmarked fund for Modern Agro-industry Technology Research System (CARS-37); and the Program for Changjiang Scholar and Innovation Research Team in University (IRT1191).
Abstract: Genome-wide association studies with an Illumina Bovine50K chip have detected 105 SNPs associated with one or multiple milk production traits in the Chinese Holstein population. Of these, 38 significant SNPs detected with high confidence by both L1-TDT and MMRA methods were selected to further mine potential key genes affecting milk yield and milk composition. By BLAST searching the flanking sequences of these 38 SNPs against the bovine genome sequence, combined with comparative genomics analysis, 26 genes were found to contain or be near such SNPs. Among them, the C14H8orf33 gene is merely 87 bp away from the significant SNP Hapmap30383-BTC-005848. Hence, we report herein genotype-phenotype associations to further validate the genetic effects of the C14H8orf33 gene. By pooled DNA sequencing of 14 unrelated Holstein sires, a total of 18 SNPs, seven of them novel, were identified. Among them, nine SNPs were in the 5′ regulatory region, one in exon 6, and the others in the 3′ UTR and 3′ regulatory region. Nine of the identified SNPs were successfully genotyped and analyzed by mass spectrometry for association with five milk production traits in an independent resource population. The results showed that these SNPs were statistically significant for more than two traits [P < (0.0001–0.0267)]. In addition, mRNA expression analyses revealed that C14H8orf33 was ubiquitously expressed in eight different tissues, with a relatively higher expression level in the mammary gland than in other tissues. These findings therefore provide strong evidence for association of C14H8orf33 variants with milk yield and milk composition traits and may be applied in Chinese Holstein breeding programs.
Funding: Supported in part by the National Natural Science Foundation of China (Grant Nos. 62072243 and 61772273 to Dong-Jun Yu); the Natural Science Foundation of Jiangsu, China (Grant No. BK20201304 to Dong-Jun Yu); the Foundation of National Defense Key Laboratory of Science and Technology, China (Grant No. JZX7Y202001SY000901 to Dong-Jun Yu); the China Scholarship Council (Grant No. 201906840041 to Yi-Heng Zhu); the National Institute of Environmental Health Sciences, USA (Grant No. P30ES017885 to Gilbert S. Omenn); the National Cancer Institute, USA (Grant No. U24CA210967 to Gilbert S. Omenn); the National Institute of General Medical Sciences, USA (Grant Nos. GM136422 and S10OD026825 to Yang Zhang); the National Institute of Allergy and Infectious Diseases, USA (Grant No. AI134678 to Peter L. Freddolino and Yang Zhang); and the National Science Foundation, USA (Grant Nos. IIS1901191, DBI2030790, and MTM2025426 to Yang Zhang). This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by the National Science Foundation, USA (Grant No. ACI1548562).
Abstract: Gene Ontology (GO) has been widely used to annotate functions of genes and gene products. Here, we proposed a new method, TripletGO, to deduce GO terms of protein-coding and noncoding genes through the integration of four complementary pipelines built on transcript expression profile, genetic sequence alignment, protein sequence alignment, and naïve probability. TripletGO was tested on a large set of 5754 genes from 8 species (human, mouse, Arabidopsis, rat, fly, budding yeast, fission yeast, and nematode) and 2433 proteins with available expression data from the third Critical Assessment of Protein Function Annotation challenge (CAFA3). Experimental results show that TripletGO achieves function annotation accuracy significantly beyond the current state-of-the-art approaches. Detailed analyses show that the major advantage of TripletGO lies in the coupling of a new triplet network-based profiling method with the feature space mapping technique, which can accurately recognize function patterns from transcript expression profiles. Meanwhile, the combination of multiple complementary models, especially those from transcript expression and protein-level alignments, improves the coverage and accuracy of the final GO annotation results. The standalone package and an online server of TripletGO are freely available at https://zhanggroup.org/TripletGO/.
Funding: Supported in part by the National Natural Science Foundation of China (22033001); the National Key R&D Program of China (2022YFA1303700); and the Chinese Academy of Medical Sciences (2021-I2M-5-014).
Abstract: Proteins are integral actors in essential life processes, which makes protein research a fundamental domain with the potential to drive advances in pharmaceuticals and disease investigation. Within protein research, there is a pressing demand to uncover protein functions and untangle their intricate underlying mechanisms. Because experimental investigations are costly and limited in throughput, computational models offer a promising alternative for accelerating protein function annotation. In recent years, protein pre-training models have shown noteworthy progress across multiple prediction tasks. This progress highlights a notable prospect for effectively tackling the intricate downstream task of protein function prediction. In this review, we elucidate the historical evolution and research paradigms of computational methods for predicting protein function. Subsequently, we summarize the progress in protein and molecule representation as well as feature extraction techniques. Furthermore, we assess the performance of machine learning-based algorithms across various objectives in protein function prediction, thereby offering a comprehensive perspective on the progress within this field.