Abstract
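Accurately identifying enhancer-promoter interactions (EPIs) is of great significance for tracing the sources of diseases and developing gene therapies. Existing prediction methods pay little attention to information at different granularities of the sequences, although extracting features at different granularities from enhancer and promoter sequences helps analyze EPIs at multiple levels. Therefore, an EPI prediction model, EPI-PBGA (Parallel BiGRU Attention Network), is proposed. It extracts fine-grained and coarse-grained sequence features through a convolutional sub-network and a two-layer bidirectional gated recurrent unit (BiGRU) attention sub-network, respectively. Because EPIs are generally cell-specific, granularity selection is performed separately in each cell line to choose the optimal coarse granularity, and the two-layer BiGRU attention network extracts the various association features present among the component subsequences. Experimental results show that EPI-PBGA performs well on six benchmark datasets and effectively predicts EPIs.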
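The coarse-grained view described above rests on cutting each enhancer or promoter sequence into fixed-length subsequences and choosing the grain size per cell line. A minimal Python sketch of that preprocessing follows, assuming non-overlapping segments, "N" padding of the last piece, one-hot encoding, and a caller-supplied validation score. `segment`, `one_hot` and `select_grain` are hypothetical helpers for illustration, not the authors' code.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical helpers illustrating coarse-grained segmentation; not the authors' code.
ONE_HOT: Dict[str, List[int]] = {
    "A": [1, 0, 0, 0],
    "C": [0, 1, 0, 0],
    "G": [0, 0, 1, 0],
    "T": [0, 0, 0, 1],
}


def segment(seq: str, grain: int, pad: str = "N") -> List[str]:
    """Cut `seq` into consecutive, non-overlapping subsequences of length `grain`.

    The final piece is right-padded with `pad` so all subsequences share one length
    (the padding scheme is an assumption made for this sketch).
    """
    if grain <= 0:
        raise ValueError("grain must be a positive integer")
    pieces = [seq[i:i + grain] for i in range(0, len(seq), grain)]
    if pieces and len(pieces[-1]) < grain:
        pieces[-1] = pieces[-1].ljust(grain, pad)
    return pieces


def one_hot(sub: str) -> List[List[int]]:
    """One-hot encode a subsequence; unknown bases such as 'N' become all-zero rows."""
    return [list(ONE_HOT.get(base, [0, 0, 0, 0])) for base in sub.upper()]


def select_grain(candidates: Sequence[int], score: Callable[[int], float]) -> int:
    """Return the candidate grain size with the highest validation score for one cell line.

    `score` is a stand-in for training and validating the model at that grain size.
    """
    return max(candidates, key=score)


if __name__ == "__main__":
    print(segment("ACGTACGTAC", 4))  # ['ACGT', 'ACGT', 'ACNN']
    print(one_hot("AN"))             # [[1, 0, 0, 0], [0, 0, 0, 0]]
    # Made-up validation scores for illustration only, not results from the paper.
    toy_scores = {100: 0.80, 200: 0.85, 500: 0.83}
    print(select_grain(list(toy_scores), toy_scores.__getitem__))  # 200
```

Because the grain size is chosen per cell line, `select_grain` would be called once for each cell line's validation split.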
Accurate identification of enhancer-promoter interactions (EPIs) is important for tracing the sources of diseases and developing gene therapies. Existing EPI prediction methods mainly focus on extracting sequence features at a specific level and pay little attention to fusing multi-level features of enhancer and promoter sequences. By introducing both fine-grained and coarse-grained views, this paper proposes EPI-PBGA, an EPI prediction model based on a parallel bidirectional gated recurrent unit (BiGRU) attention network, which extracts features at different levels and exploits the complementarity between them. Through two sub-networks, a two-layer BiGRU attention (TBGA) sub-network and a convolutional neural network (CNN) sub-network, EPI-PBGA learns the multi-granularity features of the sequences separately. Because EPIs are generally cell-specific, the optimal coarse grain size is determined separately for each cell line using a sequence segmentation strategy. At the coarse granularity, TBGA processes the component subsequences with a component-to-global progressive strategy and obtains multiple component-level feature vectors, so that the model can capture potential association information among these vectors, including promoter-promoter associations, enhancer-enhancer associations, and the often-ignored enhancer-promoter associations. At the fine granularity, a CNN with only a small number of filters is still used, because CNNs have shown good feature-extraction performance in previous studies. Multi-granularity information is obtained by fusing the high-dimensional features extracted by the two sub-networks. Together, the CNN and TBGA sub-networks allow the model both to exploit the complementarity between features of different granularities and to compensate for the feature loss caused by sequence segmentation. Experimental results show that EPI-PBGA effectively combines information at different granularities. Compared with previous EPI prediction methods, EPI-PBGA performs better on the datasets of six cell lines and can effectively predict EPIs.
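The abstract does not specify how the component-level vectors are pooled or how the two branches are fused. As a rough illustration only, the Python sketch below assumes dot-product attention pooling over component vectors and fusion by concatenation. The BiGRU layers that would produce those component vectors are omitted, and `attention_pool`, `fuse` and the toy inputs are hypothetical, not taken from EPI-PBGA.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(components: List[Vector], query: Vector) -> Vector:
    """Weight component-level vectors by dot-product attention and sum them.

    In a trained network `query` would be a learned parameter; here it is passed in.
    """
    scores = [sum(c * q for c, q in zip(vec, query)) for vec in components]
    weights = softmax(scores)
    dim = len(components[0])
    return [sum(w * vec[d] for w, vec in zip(weights, components)) for d in range(dim)]


def fuse(coarse: Vector, fine: Vector) -> Vector:
    """Fuse the two branches by concatenation (an assumed fusion operator)."""
    return coarse + fine


if __name__ == "__main__":
    # Toy component vectors, e.g. two enhancer pieces followed by one promoter piece.
    comps = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    coarse = attention_pool(comps, query=[0.0, 0.0])  # zero query gives uniform weights
    fine = [0.5]  # stand-in for a CNN-branch feature vector
    print(fuse(coarse, fine))
```

Concatenation keeps the coarse- and fine-grained features side by side for a downstream classifier; under the abstract's wording, other fusion operators would be equally plausible.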
Authors
刘志豪
王会青
李浩琳
韩家乐
LIU Zhihao; WANG Huiqing; LI Haolin; HAN Jiale (School of Information and Computer, Taiyuan University of Technology, Taiyuan 030600, China)
Source
Journal of East China University of Science and Technology (Natural Science Edition) (《华东理工大学学报(自然科学版)》)
CAS
CSCD
Peking University Core Journal (北大核心)
2024, No. 1, pp. 106-113 (8 pages)
Journal of East China University of Science and Technology
Funding
Natural Science Foundation of Shanxi Province (202203021211121).