RNA-Seq Data Expression Analysis Based on Smoothed LDA

Abstract: RNA-Seq is an important technique for transcriptome research. This paper proposes a new transcriptome expression model, sLDASeq, for calculating gene and transcript expression. It targets four problems in RNA-Seq data analysis:

- reads that map to multiple isoforms (multi-mapping);
- the non-uniform distribution of reads along the reference sequence;
- the sparse distribution of exons in some transcripts;
- the handling of reads that span exon junctions.

The model uses the known gene-isoform annotation to constrain its parameters and allocates junction-spanning reads according to exon length. These two measures address multi-mapping, non-uniform read distribution and junction reads. An additional hyper-parameter addresses exon sparsity. sLDASeq was evaluated on three real datasets and compared with LDASeq and other popular methods. The results show that sLDASeq obtains more accurate transcript and gene expression estimates than the compared methods.
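The abstract only outlines how sLDASeq works. The following is a hypothetical sketch under stated assumptions, not the authors' model. It makes these analogies:

- exons play the role of LDA "words";
- isoforms play the role of "topics";
- each sample's reads for one gene form a "document".

The gene annotation fixes which exons each isoform may emit. A junction-spanning read is split across its exons in proportion to the bases it has in each. The extra hyper-parameter `eta` is modelled as a Dirichlet pseudo-count on the isoform-exon distributions, as in smoothed LDA. For simplicity, inference is EM with pseudo-count smoothing, swapped in for whatever procedure the paper actually uses. The function names, the read format and the default values of `alpha`, `eta` and `n_iter` are all assumptions.

```python
import numpy as np


def exon_read_counts(reads, n_exons):
    """Per-exon read counts. A junction-spanning read is split across the exons
    it touches in proportion to how many of its bases fall in each exon.

    `reads` uses a hypothetical format: one list of (exon_index, n_bases)
    segments per aligned read.
    """
    counts = np.zeros(n_exons)
    for segments in reads:
        read_len = sum(n for _, n in segments)
        for exon, n in segments:
            counts[exon] += n / read_len
    return counts


def smoothed_lda_em(counts, annotation, alpha=1.0, eta=0.1, n_iter=200):
    """EM with Dirichlet pseudo-counts for a smoothed-LDA-style mixture over one gene
    (a stand-in for the paper's unspecified inference procedure).

    counts:     (D samples x V exons) read counts, or a single length-V vector.
    annotation: (K isoforms x V exons) 0/1 matrix; phi[k, v] is forced to 0
                when isoform k does not contain exon v (assumes every isoform
                has at least one exon).
    alpha:      pseudo-count on each sample's isoform proportions theta.
    eta:        smoothing pseudo-count added to every annotated exon of phi, so
                exons with few or no reads keep a nonzero probability.
    Returns theta (D x K) and phi (K x V); every row sums to 1.
    """
    if alpha <= 0 or eta <= 0:
        raise ValueError("alpha and eta must be positive")
    counts = np.atleast_2d(np.asarray(counts, dtype=float))
    mask = np.asarray(annotation, dtype=float)
    n_samples = counts.shape[0]
    n_iso = mask.shape[0]

    theta = np.full((n_samples, n_iso), 1.0 / n_iso)
    phi = mask / mask.sum(axis=1, keepdims=True)  # start uniform over annotated exons

    for _ in range(n_iter):
        # E-step: fraction of each exon's reads attributed to each isoform.
        joint = theta[:, :, None] * phi[None, :, :]      # D x K x V
        denom = joint.sum(axis=1, keepdims=True)          # D x 1 x V
        resp = joint / np.maximum(denom, 1e-300)
        expected = resp * counts[:, None, :]              # D x K x V

        # M-step: add the pseudo-counts (the smoothing) and renormalise.
        theta = expected.sum(axis=2) + alpha
        theta /= theta.sum(axis=1, keepdims=True)
        phi = (expected.sum(axis=0) + eta) * mask
        phi /= phi.sum(axis=1, keepdims=True)
    return theta, phi


if __name__ == "__main__":
    # Third read spans the exon 0 / exon 1 junction: 30 bases vs 70 bases.
    reads = [[(0, 100)], [(0, 30), (1, 70)], [(2, 100)]]
    print(exon_read_counts(reads, 3))  # about [1.3, 0.7, 1.0]
    # Hypothetical gene: isoform 0 uses exons 0-1, isoform 1 uses exon 2 only.
    theta, phi = smoothed_lda_em([[60, 20, 20]], [[1, 1, 0], [0, 0, 1]])
    print(theta, phi)
```

When isoforms share exons, a single sample generally does not determine `theta` and `phi` jointly. A fuller treatment would therefore pool several samples (the `D` axis above) or constrain `phi` further, for example by exon length.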

Authors: Ou Shuhua, Liu Xuejun, Zhang Li

Affiliation: College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics

Source: Journal of Frontiers of Computer Science and Technology (indexed in CSCD and the Peking University Core Journals list), 2016, Issue 3, pp. 381-388 (8 pages)

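Funding: National Natural Science Foundation of China (No. 61170152); Qing Lan Project of Jiangsu Province; Fundamental Research Funds for the Central Universities (No. CXZZ11_0217)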

Keywords: RNA-Seq; gene and transcript expression; smoothed LDA; exon junction; multi-mapping; non-uniformity

Classification (CLC): TP391 [Automation and Computer Technology: Computer Application Technology]
