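Systemic lupus erythematosus (SLE) is an autoimmune disease in which epigenetic variation plays an important role in pathogenesis. Previous studies have shown that aberrant DNA methylation occurs throughout the development of SLE and regulates the expression of related genes. Identifying the key affected genes may therefore aid the diagnosis and treatment of SLE. First, we downloaded gene expression data and DNA methylation data from the Gene Expression Omnibus (GEO) database and, using bioinformatics methods, performed differential analysis of gene expression and DNA methylation in peripheral blood mononuclear cells (PBMCs). Methylation-affected differentially expressed genes (mDEGs) were defined as the overlap between differentially methylated genes (DMGs) and differentially expressed genes (DEGs). Functional enrichment analysis of the mDEGs was performed with the DAVID database. A protein-protein interaction (PPI) network was then constructed with the STRING database to obtain the key genes involved in SLE. Next, receiver operating characteristic (ROC) curves were used to evaluate the hub genes and verify their ability to distinguish SLE patients from healthy controls. Finally, we constructed a hub gene-miRNA network and performed functional enrichment analysis of the shared genes.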
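Two steps of this pipeline, taking the DMG/DEG overlap and scoring a hub gene with a ROC curve, can be sketched as follows. This is only a minimal illustration in plain Python: the gene symbols, expression values, and function names are hypothetical placeholders, not data or code from the study, and the AUC is computed with the standard pairwise-ranking definition rather than with any particular package.

```python
def methylation_affected_degs(dmgs, degs):
    """Return genes that are both differentially methylated and differentially expressed."""
    return sorted(set(dmgs) & set(degs))


def roc_auc(scores, labels):
    """Area under the ROC curve.

    Computed as the probability that a randomly chosen positive sample
    scores higher than a randomly chosen negative one (ties count 0.5).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical gene symbols, used only as placeholders.
dmgs = ["GENE_A", "GENE_B", "GENE_C"]
degs = ["GENE_B", "GENE_C", "GENE_D"]
mdegs = methylation_affected_degs(dmgs, degs)

# Made-up expression values of one candidate hub gene (1 = SLE, 0 = control).
expression = [2.9, 3.4, 1.1, 1.3, 0.8, 1.5]
status = [1, 1, 0, 1, 0, 0]
auc = roc_auc(expression, status)

print(mdegs, round(auc, 3))
```

An AUC close to 1 would indicate that the gene separates the two groups well, while a value near 0.5 would indicate no discriminative power.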
Allostery is an important mechanism for regulating protein function and is essential to many biological processes. Compared with orthosteric regulators, allosteric regulators have higher specificity and lower toxicity, which gives allosteric drug design advantages over orthosteric drug design. The discovery of allosteric sites is a prerequisite for allosteric drug design. Most experimentally determined allosteric sites have been found by chance, so effective theoretical methods for predicting protein allosteric sites are urgently needed. Here, we present AllosEC, an ensemble machine learning method for protein allosteric pocket prediction that considers, in addition to the physicochemical properties of the pockets, their secondary structure information, depth index (DPX), and protrusion index (CX). To overcome the extreme imbalance between positive and negative samples, this work uses an undersampling method to balance the training dataset. On the independent test set, AllosEC outperforms other existing methods on multiple evaluation metrics, with SEN, SPE, PRE, and MCC of 0.708, 0.915, 0.405, and 0.486, respectively. This work thus provides AllosEC as a well-performing method for protein allosteric site prediction.
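The class-balancing step and the four reported metrics can be sketched as follows. This is a minimal illustration, assuming plain random undersampling of the majority class to a 1:1 ratio; the abstract does not state which undersampling scheme AllosEC uses, and the function names, sample data, and confusion-matrix counts below are hypothetical.

```python
import math
import random


def undersample(samples, labels, seed=0):
    """Randomly drop majority-class samples until both classes have equal size."""
    rng = random.Random(seed)
    pos = [s for s, y in zip(samples, labels) if y == 1]
    neg = [s for s, y in zip(samples, labels) if y == 0]
    if len(pos) <= len(neg):
        minority, majority, minority_label = pos, neg, 1
    else:
        minority, majority, minority_label = neg, pos, 0
    kept = rng.sample(majority, len(minority))
    balanced = [(s, minority_label) for s in minority]
    balanced += [(s, 1 - minority_label) for s in kept]
    rng.shuffle(balanced)
    return balanced


def binary_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, precision and Matthews correlation coefficient.

    Assumes tp + fn, tn + fp and tp + fp are all non-zero.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "SEN": tp / (tp + fn),
        "SPE": tn / (tn + fp),
        "PRE": tp / (tp + fp),
        "MCC": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Hypothetical pockets: 2 allosteric (label 1) among 12 candidates.
pockets = list(range(12))
labels = [1, 1] + [0] * 10
balanced = undersample(pockets, labels)

# Hypothetical confusion-matrix counts, not the AllosEC results.
metrics = binary_metrics(tp=8, fp=2, tn=18, fn=2)
print(len(balanced), metrics)
```

MCC is reported alongside SEN, SPE, and PRE because it remains informative when the test set itself is imbalanced, as it typically is for allosteric versus non-allosteric pockets.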
The dynamics of RNA molecules are closely related to their functions. The flexibility of RNA molecules, one of the most fundamental characteristics of their dynamics, has been widely used to study their folding properties, structural stability, ligand-binding ability, and more. Experimental methods for measuring RNA flexibility are often time-consuming and labor-intensive, so a fast and accurate theoretical method for predicting RNA flexibility is urgently needed. To this end, we propose RNAfwe, a machine learning method for predicting RNA flexibility that uses word embedding to extract RNA sequence features. RNAfwe was compared with the similar sequence-based method RNAflex. Compared with RNAflex (One-Hot), which uses one-hot encoding, RNAfwe obtains higher Pearson correlation coefficients (PCCs), 0.5017 and 0.4704, on the training and test sets, indicating that word embedding can extract features more relevant to flexibility from RNA sequences than one-hot encoding can. Compared with RNAflex (PSSM), which uses evolutionary information, RNAfwe performs slightly worse, but RNAflex (PSSM) requires a sufficient number of known homologous sequences. This work contributes to the study of RNA dynamic properties and supports the wider use of word embedding techniques in bioinformatics research.
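The contrast between the two sequence representations, and the PCC used to score predictions, can be sketched as follows. This is a minimal illustration under stated assumptions: splitting a sequence into overlapping k-mer "words" is one common way to prepare input for a word embedding model and is assumed here, since the abstract does not say how RNAfwe tokenizes sequences; the function names, example sequence, and flexibility values are hypothetical.

```python
import math

NUCLEOTIDES = "ACGU"


def one_hot(seq):
    """Encode each nucleotide as a 4-dimensional indicator vector."""
    return [[1 if base == n else 0 for n in NUCLEOTIDES] for base in seq]


def kmer_words(seq, k=3):
    """Split a sequence into overlapping k-mers, the 'words' an embedding model would see."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


seq = "GGCAUC"                   # hypothetical RNA fragment
encoded = one_hot(seq)           # context-free: one fixed vector per nucleotide
words = kmer_words(seq)          # context-aware tokens for an embedding model

# Hypothetical predicted vs. measured per-nucleotide flexibility values.
predicted = [0.2, 0.5, 0.9, 0.4]
observed = [0.1, 0.6, 0.8, 0.5]
pcc = pearson(predicted, observed)
print(words, round(pcc, 4))
```

A one-hot vector carries no information about neighboring nucleotides, whereas each k-mer token encodes local sequence context before any embedding is learned, which is one plausible reason an embedding-based representation could correlate better with flexibility.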