Funding: Supported by the National Natural Science Foundation of China (NSFC, Grant Nos. 12273077, 72101068, 12373110, and 12103070); the National Key Research and Development Program of China (Grant Nos. 2022YFF0712400 and 2022YFF0711500); the 14th Five-year Informatization Plan of the Chinese Academy of Sciences (CAS-WX2021SF-0204); and the Astronomical Big Data Joint Research Center, co-founded by the National Astronomical Observatories, Chinese Academy of Sciences, and Alibaba Cloud.
Abstract: Astronomical knowledge entities, such as celestial object identifiers, are crucial for literature retrieval, knowledge graph construction, and other research and applications in astronomy. Traditional methods of extracting knowledge entities from texts face numerous obstacles that are difficult to overcome. Consequently, there is a pressing need for improved methods to extract them efficiently. This study explores the potential of pre-trained Large Language Models (LLMs) to perform the astronomical knowledge entity extraction (KEE) task on astrophysical journal articles using prompts. We propose a prompting strategy called PromptKEE, which includes five prompt elements, and design eight combination prompts based on them. We select four representative LLMs (Llama-2-70B, GPT-3.5, GPT-4, and Claude 2) and attempt to extract the most typical astronomical knowledge entities, celestial object identifiers and telescope names, from astronomical journal articles using these eight combination prompts. To accommodate the models' token limitations, we construct two data sets: the full texts and the paragraph collections of 30 articles. Using the eight prompts, we test on full texts with GPT-4 and Claude 2, and on paragraph collections with all four LLMs. The experimental results demonstrate that pre-trained LLMs show significant potential in performing KEE tasks, but their performance varies between the two data sets. Furthermore, we analyze some important factors that influence the performance of LLMs in entity extraction and provide insights for future KEE tasks on astrophysical articles using LLMs. Finally, compared to other KEE methods, LLMs are strongly competitive in multiple aspects.
Funding: This research was supported by the Natural Science Foundation of China (31671965) and the project of the Key Laboratory of Agricultural Internet of Things, Ministry of Agriculture, China (2017001).
Abstract: Potato late blight, which is caused by Phytophthora infestans (Mont.) de Bary, is a devastating disease of potato worldwide. It decreases potato yields and causes unpredictable losses all over the world. Various simple statistical methods and forecasting models have been developed to predict and manage potato late blight. Meanwhile, there is a rising need to develop prediction models reflecting peroxidase (POD) activity, which is an important health index that varies with infection and correlates with stress resistance in plants. Thus, the aim of this research was to develop kinetic models to predict POD activity. Infection-induced changes in potato leaves stored in an artificial climate chest at 25°C were analyzed using hyperspectroscopy. Four prediction models were developed using the linear partial least squares (PLS) and nonlinear support vector machine (SVM) methods, based on the full spectrum and on effective wavelengths. The effective wavelengths were selected by the successive projection algorithm (SPA). In this study, the prediction model developed with the SPA-SVM method achieved the best performance, with an Rp (correlation coefficient of prediction) value of 0.923 and an RMSEp (root mean square error of prediction) value of 24.326. Five-order kinetic models were developed from the prediction model, and late blight disease can be predicted using them. This study provides a theoretical basis for predicting the latencies of late blight.
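The abstract above reports Rp and RMSEp without giving their formulas. As a minimal sketch, assuming the conventional definitions (Pearson correlation and root mean square error between measured and predicted values on the prediction set), the two metrics could be computed as follows. The function names and the toy POD values are hypothetical and are not taken from the study.

```python
import math

def rmse(measured, predicted):
    """Root mean square error between measured and predicted values."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)

def pearson_r(measured, predicted):
    """Pearson correlation coefficient between measured and predicted values."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_m * var_p)

# Hypothetical POD activity values for a small prediction set (illustration only).
measured = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 210.0, 240.0]

rp = pearson_r(measured, predicted)
rmsep = rmse(measured, predicted)
print(f"Rp = {rp:.3f}, RMSEp = {rmsep:.3f}")
```

Under these definitions, an Rp close to 1 together with a small RMSEp (in the units of the measured quantity) indicates that predictions track the measurements closely.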
Funding: Our work was supported by a Direct Grant for Research from The Chinese University of Hong Kong, Hong Kong SAR, China (No. 4053150) to JQ; research grants from the Research Grants Council, Hong Kong SAR, China (No. 17121414M); the National Natural Science Foundation of China (Nos. 81572786 and 91529303); startup funds from Mayo Clinic (Mayo Clinic Arizona and Center for Individualized Medicine) to JW; and the National Natural Science Foundation of China (No. 11526144) and the Natural Science Foundation of Guangdong (No. 2016A030310038) to YH.
Abstract: Functional genomics employs dozens of OMICs technologies to explore the functions of DNA, RNA and protein regulators in gene regulation processes. Although each of these technologies is a powerful tool on its own, like the parable of the blind men and an elephant, any single technology has a limited ability to depict the complex regulatory system. Integrative OMICs approaches have emerged and become an important area in biology and medicine. They provide a precise and effective way to study gene regulation. Results: This article reviews current popular OMICs technologies, OMICs data integration strategies, and bioinformatics tools used for multi-dimensional data integration. We highlight the advantages of these methods, particularly in elucidating the molecular basis of biological regulatory mechanisms. Conclusions: To better understand the complexity of biological processes, we need powerful bioinformatics tools to integrate these OMICs data. Integrating multi-dimensional OMICs data will generate novel insights into system-level gene regulation and serve as a foundation for further hypothesis-driven research.