Abstract
【Objective】Studying the superior traits, gene functions and gene distribution of Camellia fascicularis in depth can provide a basic bioinformatic reference for exploring functional genes in C. fascicularis and, more broadly, in the genus Camellia, as well as for the genetic conservation of this species.【Method】The leaf transcriptome of C. fascicularis was sequenced on the Illumina HiSeq 2000 platform and the resulting data were analyzed. After the sequencing data were cleaned and assembled de novo, unigene sequences were obtained and aligned against relevant bioinformatic databases.【Results】Assembly yielded 95979 unigenes with an N50 of 1660 nt (the length L such that unigenes of length L or longer together account for at least 50% of the total assembled length) and a mean length of 1124 nt. Q20 and Q30 bases (bases in the cleaned reads with Phred quality scores of at least 20 and 30) accounted for 96.39% and 91.28% of all bases, respectively. Sequence alignment annotated 58830, 43623 and 44315 unigenes against the Nr, Nt and Swiss-Prot databases, respectively. In GO, 41905 unigenes (43.66%) were assigned 224129 GO terms, which fell into 3 major categories and 56 subcategories; biological process was the category with the largest share. In KOG, 23499 unigenes (24.48%) received 26430 KOG annotations spanning 26 functional categories, of which the most represented were general function prediction and posttranslational modification, protein turnover and chaperones. In addition, 23214 unigenes (24.18%) were annotated in KEGG; according to the pathways involved, they were grouped into 19 subclasses, with metabolic pathways the most abundant. Finally, coding sequences (CDSs) of the C. fascicularis unigenes were identified by database alignment and predicted with the ESTScan software, giving 26428 CDSs in total, 82.10% of which were 100-500 nt long.【Conclusion】C. fascicularis contains abundant genetic information, and the annotations obtained in this analysis can be used to explore its genome and gene distribution in greater depth.
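To make the N50 and Q20/Q30 definitions in the abstract concrete, here is a minimal Python sketch. It is not the authors' pipeline: it assumes unigene lengths are available as plain integers and that read quality strings use Phred+33 encoding, and the toy inputs are invented for illustration. The only real figures used are the GO annotation counts reported in the abstract.

```python
"""Sketch of the assembly and read-quality statistics quoted in the abstract.

Illustrative only, not the authors' pipeline. Assumes unigene lengths are
plain integers and FASTQ quality strings use Phred+33 encoding.
"""
from typing import Iterable, Sequence


def n50(lengths: Sequence[int]) -> int:
    """Largest length L such that sequences of length >= L cover >= 50% of the total."""
    total = sum(lengths)
    if total <= 0:
        raise ValueError("n50() needs at least one sequence of positive length")
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise RuntimeError("unreachable for non-negative lengths")


def fraction_at_least_q(quality_strings: Iterable[str], q: int, offset: int = 33) -> float:
    """Fraction of bases with Phred score >= q (q=20 gives Q20, q=30 gives Q30)."""
    passing = total = 0
    for qual in quality_strings:
        for ch in qual:
            total += 1
            # Decode the ASCII character back to a Phred score.
            if ord(ch) - offset >= q:
                passing += 1
    if total == 0:
        raise ValueError("no bases supplied")
    return passing / total


def percent(part: int, whole: int) -> float:
    """Share of `whole` represented by `part`, in percent."""
    return 100 * part / whole


if __name__ == "__main__":
    # Toy inputs invented for illustration; they are not data from the study.
    print(n50([1000, 800, 600, 400, 200]))  # 800: 1000 + 800 = 1800 >= 3000 / 2
    quals = ["IIII5", "+++II"]  # Phred+33: 'I' = Q40, '5' = Q20, '+' = Q10
    print(fraction_at_least_q(quals, 20))  # 0.7 (7 of 10 bases)
    print(fraction_at_least_q(quals, 30))  # 0.6 (6 of 10 bases)
    # GO annotation rate from the counts reported in the abstract.
    print(round(percent(41905, 95979), 2))  # 43.66
```

Because N50 depends only on the length distribution, it summarizes the contiguity of an assembly, not its correctness.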
Authors
XIN Jing (辛静)
LI Bin (李斌)
YE Peng (叶鹏)
LIU Cheng (刘成)
TANG Junrong (唐军荣)
ZHANG Guiliang (张贵良)
XIN Peiyao (辛培尧)
(Key Laboratory for Forest Resources Conservation and Utilization in the Southwest Mountains of China, Ministry of Education, Southwest Forestry University, Kunming 650224, Yunnan, China; Key Laboratory of National Forestry and Grassland Administration on Biodiversity Conservation in Southwest China, Southwest Forestry University, Kunming 650224, Yunnan, China; Bijie Region Forestry Science Research Institute, Bijie 551700, Guizhou, China; Hekou Branch of Administration Bureau of Daweishan National Nature Reserve, Hekou 661399, Yunnan, China)
Source
Non-wood Forest Research (《经济林研究》)
Peking University Core Journal
2020, No. 3, pp. 85-94 (10 pages)
Funding
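Yunnan Provincial Construction Project for the First-Level Discipline of Forestry
National Park Pilot Construction Project of the Yunnan Provincial Forestry Department (2136299)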
Keywords
Camellia fascicularis H. T. Chang
transcriptome
unigenes
function
annotation