Abstract: Drug development takes a long time because it requires sifting through a large collection of candidate compounds, most of them inactive, to select only the most pertinent compounds that can bind to a disease protein. Virtual screening is increasingly popular in pharmaceutical research and is crucial in the early phases of drug research and development, because it allows chemical compound searches to be more narrowly targeted. As databases contain more and more ligands, this method needs to be both fast and accurate. Neural network fingerprints have proven more effective than the well-known Extended Connectivity Fingerprint (ECFP). However, although the conventional graph network generates a better-encoded fingerprint, only the largest sub-graph is taken into consideration when learning the representation, and the average or maximum pooling layer also carries unrelated information. To address these problems, this article proposes the Graph Convolutional Attention Network (GCAN), a graph neural network with an attention mechanism that gives more weight to the nodes or sub-graphs used to create the molecular fingerprint. The generated fingerprint is then used to classify drugs with ensemble learning: stacking is applied with Support Vector Machines (SVM), Random Forest, Naive Bayes, Decision Trees, AdaBoost, and Gradient Boosting as base classifiers. Compared with existing models, the proposed GCAN fingerprint with the ensemble model achieves relatively high accuracy, sensitivity, specificity, and area under the curve, and the ensemble learning with the generated molecular fingerprint reaches 91% accuracy, outperforming earlier approaches.
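The abstract's argument is that average or max pooling lets unrelated nodes leak into the fingerprint, while an attention mechanism weights the informative nodes more heavily. A minimal sketch of that contrast, assuming a generic dot-product softmax readout: GCAN's actual attention formulation is not given in the abstract, and the `query` vector and node embeddings below are hypothetical stand-ins for learned values.

```python
import math
from typing import List

Vector = List[float]


def mean_pool(nodes: List[Vector]) -> Vector:
    """Average all node embeddings with equal weight."""
    n = len(nodes)
    return [sum(column) / n for column in zip(*nodes)]


def attention_pool(nodes: List[Vector], query: Vector) -> Vector:
    """Softmax-weighted sum of node embeddings.

    Each node is scored by its dot product with `query`, a stand-in for a
    learned attention vector (hypothetical here, not a trained GCAN weight).
    """
    scores = [sum(a * b for a, b in zip(node, query)) for node in nodes]
    top = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(nodes[0])
    return [sum(w * node[d] for w, node in zip(weights, nodes)) for d in range(dim)]


if __name__ == "__main__":
    # Hypothetical embeddings: node 0 carries the signal, nodes 1-2 are unrelated.
    nodes = [[4.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
    query = [1.0, 0.0]
    print(mean_pool(nodes))              # relevant node diluted to ~1.33
    print(attention_pool(nodes, query))  # relevant node dominates, ~3.86
```

With mean pooling the two unrelated nodes pull the first component down to a third of its value; the softmax weights keep it close to the original 4.0.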
Abstract: Medulloblastoma is the most common malignant pediatric brain tumor. In mice, Ptc1 haploinsufficiency and disruption of DNA repair (DNA ligase IV inactivation) or cell cycle regulation (Kip1, Ink4d, or Ink4c inactivation), in conjunction with p53 dysfunction, predispose to medulloblastoma. To identify genes important for this tumor, we evaluated gene expression profiles in medulloblastomas from these mice. Unexpectedly, medulloblastoma
Funding: Financial support was provided by the National Natural Science Foundation of China (Nos. 21978058 and 21676094), the Pearl River Talent Recruitment Program, China (No. 2019QN01L255), the Natural Science Foundation of Guangdong Province, China (No. 2020A1515010800), and the Guangzhou Municipal Science and Technology Project, China (No. 202102020875).
Abstract: Because volatile organic compounds are toxic, capturing trace amounts of non-methane hydrocarbons (NMHCs) from air is a significant challenge. A total of 31,399 hydrophobic metal–organic frameworks (MOFs) were first screened from 137,953 hypothetical MOFs using high-throughput computational screening (HTCS), and their performance indices (adsorption capacity and selectivity) for adsorbing NMHCs (C3–C6) were obtained by molecular simulations. The discovery of a "second peak" near twice the kinetic diameter of the corresponding NMHC provided more options for finding MOFs that adsorb NMHCs well. Four machine learning (ML) classification and regression algorithms were used to predict the performance of the MOFs, and the relative importance of the six descriptors was determined. The combination of the Random Forest algorithm and the Molecular ACCess System (MACCS) molecular fingerprint (MF) showed excellent predictive ability for MOFs. Based on performance, the fingerprint features shared by the 100 top-performing MOFs were counted, and the excellent bits (EBs) that promote performance were defined. Finally, new substructures containing all of the EBs were designed for each NMHC to build a new MOF database. This work combines HTCS, ML, and MF to provide detailed insight into the design of efficient MOFs for adsorbing NMHCs.
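The step of counting fingerprint commonalities among the top-performing MOFs and defining "excellent bits" can be sketched as a bit-frequency comparison. This is a minimal sketch under stated assumptions: fingerprints are represented as sets of on-bit indices, and a bit counts as "excellent" if it is frequent among top performers (`min_top`) and enriched relative to the remaining MOFs (`min_gain`). Both thresholds and the enrichment criterion are hypothetical; the abstract does not say how the EBs were defined numerically.

```python
from collections import Counter
from typing import Dict, List, Set


def bit_frequencies(fingerprints: List[Set[int]]) -> Dict[int, float]:
    """Fraction of fingerprints in which each on-bit appears."""
    counts = Counter(bit for fp in fingerprints for bit in fp)
    n = len(fingerprints)
    return {bit: c / n for bit, c in counts.items()}


def excellent_bits(top: List[Set[int]], rest: List[Set[int]],
                   min_top: float = 0.8, min_gain: float = 0.3) -> List[int]:
    """Bits that are common among top performers and enriched versus the rest.

    `min_top` and `min_gain` are hypothetical thresholds.
    """
    f_top = bit_frequencies(top)
    f_rest = bit_frequencies(rest)
    return sorted(bit for bit, f in f_top.items()
                  if f >= min_top and f - f_rest.get(bit, 0.0) >= min_gain)


if __name__ == "__main__":
    # Toy fingerprints as sets of on-bit indices (real MACCS keys number 166).
    top = [{1, 5, 9}, {1, 5}, {1, 5, 7}, {1, 2, 5}]
    rest = [{1, 3}, {1}, {1, 7}]
    print(excellent_bits(top, rest))  # [5]: bit 1 is on everywhere, so it is not enriched
```

Comparing against the rest of the database keeps bits that are simply ubiquitous (like bit 1 in the toy data) from being mistaken for performance-promoting features.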
Funding: Lanzhou Talent Innovation and Entrepreneurship Project (No. 2020-RC-14).
Abstract: Single nucleotide polymorphisms (SNPs) are important for studying genetic variation in human families and in animal and plant strains, and are therefore widely used in population genetics and in research on disease-related genes. In pharmacogenomics, identifying associations between SNP sites and drugs is key to clinical precision medication. This work therefore proposes a model for predicting SNP site–drug associations based on a denoising variational auto-encoder (DVAE-SVM). First, a k-mer algorithm is used to construct the initial SNP site feature vector, and the MACCS molecular fingerprint is used to generate the feature vector of the drug module. Then, the DVAE extracts effective features from the initial SNP site feature vector. Finally, the effective SNP site feature vector and the drug module feature vector are fused and input to a support vector machine (SVM) to predict the association between the SNP site and the drug module. Five-fold cross-validation experiments indicate that the proposed algorithm performs better than random forest (RF) and logistic regression (LR) classifiers. Further experiments show that the proposed algorithm also gives better predictions than the feature extraction algorithms principal component analysis (PCA), denoising auto-encoder (DAE), and variational auto-encoder (VAE).
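The first step, building an initial SNP site feature vector with a k-mer algorithm, can be sketched as a normalized k-mer frequency count over a DNA sequence. This is a minimal sketch, assuming the input is a sequence window around the SNP site and that counts are normalized to frequencies; the abstract gives neither the window nor the value of k, so `k=3` and the example sequence are hypothetical.

```python
from itertools import product
from typing import List


def kmer_vector(seq: str, k: int = 3, alphabet: str = "ACGT") -> List[float]:
    """Normalized k-mer frequency vector, one entry per possible k-mer.

    Entries follow the lexicographic order of itertools.product over `alphabet`.
    Windows containing symbols outside the alphabet (e.g. N) are skipped.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(kmers)


if __name__ == "__main__":
    # Hypothetical sequence window around an SNP site.
    vec = kmer_vector("acgta", k=2)
    print(len(vec), vec[1], vec[6], vec[11], vec[12])  # 16 0.25 0.25 0.25 0.25
```

The vector length is 4^k regardless of sequence length, which gives every SNP site a fixed-size input for the downstream auto-encoder.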
Funding: Supported by the National Natural Science Foundation of China (No. 41703101) and the Beijing Outstanding Young Scientist Program (No. BJJWZYJH01201910004016).
Abstract: The carbonate radical is among the most important environmentally relevant reactive species governing the transformation and fate of pharmaceutical contaminants (PCs). However, the rate constants of reactions between the carbonate radical and most PCs have not been determined experimentally, and no quantitative structure–activity relationships (QSARs) have been established to estimate them. This study applied the MaxMin data processing method and used molecular fingerprints (MF) as the input of a deep neural network (DNN) to predict the rate constants between the carbonate radical and organic compounds. The MF parameters and the DNN architecture were tuned to achieve satisfactory prediction accuracy; an MF vector length of 512 bits with a radius of 1, combined with 5 hidden layers, gave the best performance. The optimized MaxMin-MF-DNN model was compared with some of the most commonly used QSAR and machine learning approaches, including random data splitting, molecular descriptors, support vector machines, and decision trees. The MF-DNN model outperformed the other methods, increasing prediction accuracy by more than 10%. Applying this MF-DNN model, we estimated the reaction rates between the carbonate radical and pharmaceuticals used in human medicine (1,576) and veterinary practice (390). Among them, 46 drugs were identified as fast-reacting compounds, suggesting that their environmental fate is closely related to the carbonate radical.
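The abstract contrasts MaxMin data processing with random data splitting. A minimal sketch of greedy MaxMin diversity picking on bit fingerprints, assuming distance is 1 minus Tanimoto similarity and fingerprints are stored as sets of on-bits (for example from a Morgan/ECFP-style fingerprint, which is itself an assumption about what the 512-bit, radius-1 MF is). How the authors turned the picks into training and test sets is not stated; one common option is to put the picked, maximally diverse compounds in the training set.

```python
from typing import List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 1.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def maxmin_pick(fps: List[Set[int]], n_pick: int, first: int = 0) -> List[int]:
    """Greedy MaxMin diversity picking with distance = 1 - Tanimoto.

    Starting from `first`, repeatedly pick the compound whose distance to its
    nearest already-picked compound is largest.
    """
    picked = [first]
    nearest = [1.0 - tanimoto(fp, fps[first]) for fp in fps]
    while len(picked) < min(n_pick, len(fps)):
        remaining = [i for i in range(len(fps)) if i not in picked]
        nxt = max(remaining, key=lambda i: nearest[i])
        picked.append(nxt)
        for i, fp in enumerate(fps):
            nearest[i] = min(nearest[i], 1.0 - tanimoto(fp, fps[nxt]))
    return picked


if __name__ == "__main__":
    fps = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}, {1, 2, 3, 4}]
    print(maxmin_pick(fps, 3))  # [0, 2, 1]
```

Keeping a running "distance to nearest picked compound" for every candidate means each new pick only costs one pass of Tanimoto comparisons rather than recomputing all pairwise distances.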
Abstract: Identifying associations between metabolites and diseases helps us understand disease pathogenesis, which is of great significance for diagnosing and treating diseases. However, traditional biological methods are time-consuming and expensive. Accordingly, we propose a new metabolite–disease association prediction algorithm based on DeepWalk and random forest (DWRF), which consists of the following key steps. First, the semantic similarity and information entropy similarity of diseases are integrated into the final disease similarity; likewise, the molecular fingerprint similarity and information entropy similarity of metabolites are integrated into the final metabolite similarity. Then, DeepWalk is used to extract metabolite features from the metabolite–gene association network. Finally, a random forest algorithm is employed to infer metabolite–disease associations. Experimental results show that DWRF performs well in terms of area under the curve under both leave-one-out cross-validation and five-fold cross-validation. Case studies also indicate that DWRF performs reliably in metabolite–disease association prediction.
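DeepWalk extracts node features by generating truncated random walks over a graph and then training a skip-gram model (word2vec) on those walks as if they were sentences. A minimal sketch of the walk-generation stage only, on a hypothetical metabolite–gene adjacency list; the skip-gram training, the walk parameters DWRF actually used, and the node names are not from the source and are assumptions here.

```python
import random
from typing import Dict, List


def random_walks(graph: Dict[str, List[str]], walks_per_node: int,
                 walk_length: int, seed: int = 42) -> List[List[str]]:
    """DeepWalk-style truncated uniform random walks over an adjacency list.

    Every node starts `walks_per_node` walks; each walk stops at `walk_length`
    nodes or at a node with no neighbours.
    """
    rng = random.Random(seed)
    order = list(graph)
    walks: List[List[str]] = []
    for _ in range(walks_per_node):
        rng.shuffle(order)  # visit start nodes in a fresh random order each pass
        for start in order:
            walk = [start]
            while len(walk) < walk_length:
                neighbours = graph[walk[-1]]
                if not neighbours:
                    break
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks


if __name__ == "__main__":
    # Hypothetical bipartite metabolite-gene network (undirected, both directions listed).
    graph = {"M1": ["G1", "G2"], "M2": ["G2"], "G1": ["M1"], "G2": ["M1", "M2"]}
    for walk in random_walks(graph, walks_per_node=1, walk_length=5):
        print(" ".join(walk))
```

Metabolites that share genes co-occur in the same walks, so the skip-gram embeddings learned from these walks place them close together, which is what makes them useful features for the random forest.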
Funding: This preliminary work was supported in part by a seed grant from the Sinai Hospital of Detroit Medical Staff Foundation.
Abstract: Aim: The aim of this study was to test visible resonance Raman (VRR) spectroscopy for rapid skin cancer diagnosis and to evaluate its effectiveness as a new optical biopsy method for distinguishing basal cell carcinoma (BCC) from normal skin tissue. Methods: VRR spectroscopy was performed using 532 nm excitation. Normal and BCC human skin tissue samples were measured within seconds. The molecular fingerprints of various native biomolecules were analyzed as biomarkers, and a principal component analysis–support vector machine (PCA-SVM) statistical method based on these molecular fingerprints was developed to differentiate BCC from normal skin tissue. Results: Compared with near-infrared excitation, VRR provides a rapid method with enhanced Raman signals from biomolecules that have resonant and near-resonant absorption bands. The VRR technique revealed changes in the chemical composition of native biomarkers such as tryptophan, carotenoids, lipids, and proteins. The VRR spectra of BCC samples showed a strong enhancement of proteins, including collagen type I combined with amide I and amino acids, and a decrease in carotenoids and lipids. Compared with histopathology reports, the PCA-SVM analysis based on the molecular fingerprints of the biomarkers yielded 93.0% diagnostic sensitivity, 100% specificity, and 94.5% accuracy. Conclusion: VRR can enhance the molecular vibrational modes of various native biomarkers, allowing Raman modes to be displayed within seconds. It may serve as a label-free molecular pathology method for diagnosing skin cancer and other diseases, and may be combined with Mohs surgery in the treatment of BCC.
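The reported sensitivity, specificity, and accuracy all derive from a confusion matrix against histopathology. A minimal sketch of those definitions, assuming BCC is the positive class; the counts in the usage example are hypothetical and are not the study's data. Accuracy is the prevalence-weighted average of sensitivity and specificity, which is why the reported 94.5% accuracy falls between the 93.0% sensitivity and the 100% specificity.

```python
from typing import Dict


def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Sensitivity, specificity and accuracy with BCC as the positive class."""
    return {
        "sensitivity": tp / (tp + fn),  # share of BCC samples classified as BCC
        "specificity": tn / (tn + fp),  # share of normal samples classified as normal
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only, not the study's data.
    print(diagnostic_metrics(tp=9, fn=1, tn=5, fp=0))
    # sensitivity 0.9, specificity 1.0, accuracy 14/15 (about 0.933)
```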