Improved Transcriptome Expression Analysis for RNA-Seq Data (cited by: 3)

Abstract: RNA-Seq (RNA sequencing), which is based on high-throughput sequencing, is a new technique for transcriptome research. To address two difficulties in transcript expression analysis with RNA-Seq data, namely the multi-mapping of reads to isoforms and the non-uniform distribution of reads along the reference, this paper proposes an improved method, LDASeqII (improvement of latent Dirichlet allocation for sequencing data). The model uses the splice isoform structure from known gene-isoform annotation to constrain its hyperparameters and normalizes read counts by exon length for each individual exon, which resolves the multi-mapping problem under non-uniform read distribution. A "pseudo-exon" and a "pseudo-transcript" are introduced to handle junction reads and noise reads, respectively. LDASeqII is validated on two real datasets for gene and transcript expression calculation and compared with the original LDASeq (latent Dirichlet allocation for sequencing data) model and with two popular methods, Cufflinks and RSEM (RNA-Seq by expectation maximization). The results show that LDASeqII obtains more accurate transcript and gene expression estimates than the other approaches.
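The per-exon length normalization mentioned in the abstract can be illustrated with a minimal sketch. This is not the LDASeqII model itself: the function name `reads_per_kilobase`, the read counts, and the exon lengths below are all hypothetical, and the sketch only shows the general idea of dividing each exon's read count by its length so that long and short exons become comparable.

```python
def reads_per_kilobase(exon_counts, exon_lengths):
    """Divide each exon's read count by its length in kilobases.

    Hypothetical helper for illustration only; it is not taken from
    LDASeqII, LDASeq, Cufflinks, or RSEM.
    """
    if len(exon_counts) != len(exon_lengths):
        raise ValueError("counts and lengths must have the same size")
    return [
        count / (length / 1000.0)
        for count, length in zip(exon_counts, exon_lengths)
    ]


# Hypothetical data: three exons of one gene.
counts = [200, 50, 300]       # reads mapped to each exon
lengths = [1000, 250, 3000]   # exon lengths in bases

densities = reads_per_kilobase(counts, lengths)
print(densities)
```

In this made-up example the first two exons end up with the same read density even though their raw counts differ by a factor of four, which is the kind of length effect that normalization removes before expression levels are compared.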

Authors: Shi Xinxin (石新新), Liu Xuejun (刘学军), Zhang Li (张礼)

Affiliation: College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics

Source: Journal of Data Acquisition and Processing (《数据采集与处理》), indexed in CSCD and the Peking University Core Journals list, 2015, Issue 5, pp. 1028-1035 (8 pages)

Funding: National Natural Science Foundation of China (61170152); Fundamental Research Funds for the Central Universities (CXZZ11_0217)

Keywords: gene expression; RNA-Seq; transcript expression; multi-mapping; non-uniformity

Classification number: TP391.9 [Automation and Computer Technology: Computer Application Technology]
