Abstract
RNA-Seq is an important technique for transcriptome research. RNA-Seq data analysis faces four problems: reads that map to multiple isoforms (multi-mapping), the non-uniform distribution of reads along the reference sequence, reads that span exon-exon junctions, and the sparsity of exons in some transcripts. To address them, this paper proposes sLDASeq, a new model for computing gene and transcript expression. The model uses the known gene-isoform annotation to constrain its hyper-parameters. It also allocates the counts of junction-spanning reads according to exon length. Together these steps address multi-mapping, non-uniform read distribution and junction reads. An additional hyper-parameter addresses the sparsity of exons. sLDASeq was applied to three real datasets and compared with LDASeq and other popular methods. The results show that sLDASeq gives more accurate gene and transcript expression estimates than the other methods compared.
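The two annotation-driven steps described above can be illustrated with a small, hypothetical Python sketch. It is not the sLDASeq implementation, and the function names are invented for illustration.

The sketch makes the following assumptions:
- Each isoform is represented by the set of exon IDs it contains.
- The annotation constraint is modelled as a per-isoform Dirichlet prior over exons that is zero for exons the isoform lacks.
- A junction-spanning read's unit count is split across exons in proportion to the number of bases it aligns to each exon. This is one possible reading of "allocate according to length".

The extra sparsity hyper-parameter is not modelled here.

```python
"""Hypothetical sketch of annotation-constrained priors and length-based
junction-read allocation; NOT the authors' sLDASeq code."""
from typing import Dict, List, Set, Tuple


def constrained_prior(isoform_exons: Dict[str, Set[str]],
                      exons: List[str],
                      beta: float = 0.1) -> Dict[str, List[float]]:
    """Per-isoform Dirichlet prior over exons (assumed form): beta for exons
    the annotation places in the isoform, 0.0 for exons it lacks."""
    return {iso: [beta if e in members else 0.0 for e in exons]
            for iso, members in isoform_exons.items()}


def split_junction_read(overlaps: List[Tuple[str, int]]) -> Dict[str, float]:
    """Split one read's unit count across exons in proportion to the
    number of bases it aligns to each exon (assumed allocation rule)."""
    total = sum(n for _, n in overlaps)
    if total <= 0:
        raise ValueError("read must overlap at least one base")
    shares: Dict[str, float] = {}
    for exon, n in overlaps:
        shares[exon] = shares.get(exon, 0.0) + n / total
    return shares


def exon_counts(reads: List[List[Tuple[str, int]]]) -> Dict[str, float]:
    """Accumulate fractional read counts per exon over all reads."""
    counts: Dict[str, float] = {}
    for overlaps in reads:
        for exon, share in split_junction_read(overlaps).items():
            counts[exon] = counts.get(exon, 0.0) + share
    return counts


if __name__ == "__main__":
    # Toy annotation: iso2 skips exon e2.
    isoforms = {"iso1": {"e1", "e2", "e3"}, "iso2": {"e1", "e3"}}
    exons = ["e1", "e2", "e3"]
    print(constrained_prior(isoforms, exons))
    # Each read is a list of (exon, aligned bases); two reads span junctions.
    reads = [[("e1", 100)], [("e1", 75), ("e3", 25)], [("e2", 50), ("e3", 50)]]
    print(exon_counts(reads))
```

The zero entries in the prior encode the annotation directly: under this assumed form, an isoform can never be credited with reads from an exon it does not contain. Splitting by aligned bases means each read still contributes exactly one count in total, however many exons it spans.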
Source
Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》)
CSCD
Peking University Core Journal (北大核心)
2016, No. 3, pp. 381-388 (8 pages)
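Funding
National Natural Science Foundation of China (No. 61170152)
Qing Lan Project of Jiangsu Province
Fundamental Research Funds for the Central Universities (No. CXZZ11_0217)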