Identification of splice sites is a critical and challenging problem in eukaryotic genome annotation. Here, we present a statistical study of the splicing signals in human hemoglobin (Hb) pre-mRNAs using three approaches: regional pairwise alignment, splicing weight matrix scoring, and dynamic extended folding. First, the regional pairwise alignment results show that the coding regions of the human Hb genes exhibit high levels of both conservation and fluctuation. Second, the weight matrix scoring results indicate that, although the authentic splicing motifs always receive the highest scores in a sequence, the sequence motif alone is insufficient to define the splice sites precisely. Finally, we deduce the RNA frame structures by applying an extended folding approach to analyze the stable folding elements. We find that the splice sequences tend to adopt stretched, partially paired conformations, which favor recognition and competitive binding by splicing factors. These results indicate that precise splicing arises from the integrated effect of multiple signal-recognition mechanisms acting at both the sequence and structure levels.
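The abstract does not specify how its splicing weight matrix is built. As a generic illustration only, the sketch below assumes a standard log-odds position weight matrix (pseudocount 0.5, uniform 0.25 background) trained on a few hypothetical donor-site windows. It then scans a query sequence and ranks every window by score. The names `build_pwm`, `score_window`, and `scan`, and all sequences shown, are invented for this example; they are not the authors' method or the Hb data.

```python
import math

BASES = "ACGT"

# Hypothetical toy donor-site windows (3 exonic nt + GT + 4 intronic nt).
# Invented for illustration; NOT the human Hb sites analyzed in the study.
TOY_DONOR_SITES = [
    "CAGGTAAGT",
    "AAGGTGAGT",
    "CAGGTAAGA",
    "GAGGTAAGG",
    "CAGGTGAGC",
]


def build_pwm(sites, pseudocount=0.5, background=0.25):
    """Return a log2-odds weight matrix: one {base: score} dict per position.

    Assumes equal-length, ungapped, uppercase ACGT sites and a uniform
    background; pseudocounts keep unseen bases from scoring -infinity.
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(BASES)
    pwm = []
    for pos in range(length):
        column = [s[pos] for s in sites]
        pwm.append({
            base: math.log2((column.count(base) + pseudocount) / total / background)
            for base in BASES
        })
    return pwm


def score_window(pwm, window):
    """Sum the per-position log-odds scores of one window."""
    return sum(pwm[i][base] for i, base in enumerate(window))


def scan(pwm, seq):
    """Score every window of len(pwm) along seq; return (start, score) pairs."""
    width = len(pwm)
    return [(i, score_window(pwm, seq[i:i + width]))
            for i in range(len(seq) - width + 1)]


if __name__ == "__main__":
    pwm = build_pwm(TOY_DONOR_SITES)
    query = "TTACCAGGTAAGTCCTTGA"  # hypothetical pre-mRNA fragment (DNA alphabet)
    hits = sorted(scan(pwm, query), key=lambda h: h[1], reverse=True)
    # Print the three best-scoring windows: start index, window, score.
    for start, s in hits[:3]:
        print(start, query[start:start + len(pwm)], round(s, 2))
```

In this framing a weight matrix only ranks candidate windows relative to one another; it sets no decisive cutoff by itself, which is consistent with the abstract's observation that motif scores alone did not suffice to define the splice sites precisely.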
Funding: Supported by the National Natural Science Foundation of China (30971454, 9030318, and 90208018)