Abstract
DNA N4-methylcytosine (4mC) is an important epigenetic modification that plays a role in gene expression, cell repair, DNA replication and protection. Feature extraction is a key step when machine learning algorithms are used to predict 4mC sites. To extract data features more fully and further improve the prediction accuracy of 4mC sites, a 4mC site prediction model based on a two-layer convolutional neural network is proposed. First, the sequence data are feature-encoded and a convolutional neural network with two convolutional layers and two pooling layers is built; L2 regularization is used to avoid overfitting, and 10-fold cross-validation is used to ensure the stability of the model's predictions. Second, the model parameters are tuned and the parameter combination with the higher predictive ability is selected for training. Finally, the model's 4mC site prediction ability is compared with that of several existing algorithms. The results show that the two-layer convolutional neural network model has good prediction performance and robustness, outperforms 4mC site prediction algorithms based on general machine learning and on single-layer convolutional neural networks, and effectively improves the prediction of 4mC sites.
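The pipeline outlined in the abstract (feature encoding, two convolution and pooling stages, L2 regularization, 10-fold cross-validation) can be illustrated with a minimal, untrained sketch in plain Python. Everything specific here is an assumption made for illustration and is not taken from the paper: one-hot encoding as the feature coding, a 41-nucleotide input window, the kernel counts and widths, the pooling size, the regularization strength, and all function names.

```python
import random

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as rows of 4 values (A, C, G, T). Assumed encoding."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]

def conv1d_relu(x, kernels, bias):
    """Valid 1D convolution along positions, followed by ReLU.
    x: [length][in_ch]; kernels: [out_ch][width][in_ch]; bias: [out_ch]."""
    width = len(kernels[0])
    out = []
    for start in range(len(x) - width + 1):
        row = []
        for k, b in zip(kernels, bias):
            s = b
            for offset in range(width):
                for c, w in enumerate(k[offset]):
                    s += w * x[start + offset][c]
            row.append(max(0.0, s))
        out.append(row)
    return out

def max_pool1d(x, size):
    """Non-overlapping max pooling along the position axis."""
    channels = len(x[0])
    return [
        [max(x[i + j][c] for j in range(size)) for c in range(channels)]
        for i in range(0, len(x) - size + 1, size)
    ]

def l2_penalty(kernel_sets, lam):
    """L2 regularization term: lam times the sum of squared kernel weights."""
    return lam * sum(w * w for ks in kernel_sets for k in ks for row in k for w in row)

def k_fold_indices(n, k=10, seed=0):
    """Shuffle sample indices and return k (train, test) index pairs."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [
        ([j for f in folds[:i] + folds[i + 1:] for j in f], folds[i])
        for i in range(k)
    ]

def random_kernels(rng, out_ch, width, in_ch):
    """Random (untrained) kernels; a real model would learn these weights."""
    return [[[rng.uniform(-0.5, 0.5) for _ in range(in_ch)]
             for _ in range(width)] for _ in range(out_ch)]

rng = random.Random(42)
k1 = random_kernels(rng, out_ch=8, width=3, in_ch=4)    # hypothetical sizes
k2 = random_kernels(rng, out_ch=16, width=3, in_ch=8)   # hypothetical sizes

seq = "ACGT" * 10 + "A"  # made-up 41-nt sequence, not real data
x = one_hot(seq)
h = max_pool1d(conv1d_relu(x, k1, [0.0] * 8), size=2)          # conv + pool, stage 1
features = max_pool1d(conv1d_relu(h, k2, [0.0] * 16), size=2)  # conv + pool, stage 2

penalty = l2_penalty([k1, k2], lam=0.01)  # would be added to the training loss
splits = k_fold_indices(100, k=10)        # 10-fold split over 100 hypothetical samples
```

The sketch covers only the forward pass and the data split. Training, the classification layer, and the parameter search described in the abstract are left out, and a practical implementation would normally rely on a deep learning framework rather than nested Python loops.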
Authors
陈鹏辉
徐权峰
李荣庭
王煜
胡梦
喻文霞
李慧敏
唐轶
CHEN Peng-hui; XU Quan-feng; LI Rong-ting; WANG Yu; HU Meng; YU Wen-xia; LI Hui-min; TANG Yi (School of Mathematics and Computer Science, Yunnan Minzu University, Kunming 650500, China)
Source
《云南民族大学学报(自然科学版)》
CAS
2022, No. 4, pp. 450-457, 472 (9 pages in total)
Journal of Yunnan Minzu University:Natural Sciences Edition
Funding
National Natural Science Foundation of China (61866040)
Yunnan Minzu University Graduate Research Project (SJXY2020-108).