Abstract
In this paper, we describe the application of the Kaphta architecture, a resource for text mining of the anticancer activity of polyphenols. The anticancer activity of these compounds against different types of cancer has been widely reported in the literature, and they are among the most promising molecules for the development of anticancer drugs. The architecture, which comprises four sequential and well-defined steps, uses a hybrid approach combining a dictionary, rules and machine learning to identify abstracts containing sentences with associations between polyphenol, cancer and gene entities. Applying the architecture to 23826 PubMed abstracts generated a knowledge base of indexed abstracts with 172169 sentences containing polyphenol-cancer and polyphenol-gene associations. A web tool was implemented that allows the user to search for information on the 2006 polyphenol, 240 cancer and 3121 gene entities, and on the 11750 polyphenol-cancer and 9160 polyphenol-gene associations, indexed in the knowledge base. A ranking algorithm calculates a score for each indexed abstract, considering the number and type of sentences with recognized entities and rules. A test with users demonstrated that the visualization resources of the web tool contribute to the understanding of the associations between polyphenols, genes and cancers, compared with the PubMed tool. The Kaphta architecture and web tool make it possible to extract knowledge on the anticancer activity of polyphenols and can thus contribute to the exploration of these molecules in the development of anticancer therapies.
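As an illustration of the ranking idea only, the following minimal Python sketch scores an abstract by summing a weight for each recognized sentence according to its association type, plus a bonus when a rule was also matched. The abstract does not give the actual scoring formula, so the weights, the bonus, and the names `SENTENCE_WEIGHTS`, `RULE_BONUS`, `Sentence`, `score_abstract` and `rank_abstracts` are all hypothetical and should not be read as the Kaphta implementation.

```python
from dataclasses import dataclass

# Hypothetical weights per sentence type; the source does not state real values.
SENTENCE_WEIGHTS = {
    "polyphenol-cancer": 2.0,
    "polyphenol-gene": 1.0,
}
# Assumed extra weight when a sentence also matches an extraction rule.
RULE_BONUS = 0.5


@dataclass
class Sentence:
    kind: str           # association type, e.g. "polyphenol-cancer"
    rule_matched: bool  # whether a rule was recognized in the sentence


def score_abstract(sentences):
    """Sum a weight for each sentence by type, plus a bonus per rule match."""
    total = 0.0
    for s in sentences:
        total += SENTENCE_WEIGHTS.get(s.kind, 0.0)  # unknown types add nothing
        if s.rule_matched:
            total += RULE_BONUS
    return total


def rank_abstracts(abstracts):
    """Return abstract ids ordered from highest to lowest score."""
    scores = {aid: score_abstract(sents) for aid, sents in abstracts.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

Under these assumed weights, an abstract with more sentences, or with sentences of a more heavily weighted type, ranks higher; any real deployment would need the weighting scheme actually used by the authors.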
Funding
Supported by the Biotechnology Unit, Universidade de Ribeirão Preto, Brazil;
the Federal Institute of Education, Science and Technology of South of Minas Gerais (IFSULDEMINAS), Brazil;
and the São Paulo Research Foundation (FAPESP) [grant no. 17/03237-2].