Funding: This work was supported in part by the European Union's Horizon 2020 programme under grant agreements 777523 FREYA ("Connected Open Identifiers for Discovery, Access and Use of Research Resources"), 654248 CORBEL ("Coordinated Research Infrastructures Building Enduring Life-science Services"), and 823830 BioExcel-2 ("BioExcel-2 Centre of Excellence for Computational Biomolecular Research"). Many thanks to Paul Groth for his helpful comments on the manuscript.
Abstract: The FAIR principles describe characteristics intended to support access to and reuse of digital artifacts in the scientific research ecosystem. Persistent, globally unique identifiers that are resolvable on the Web and associated with a set of additional descriptive metadata are foundational to FAIR data. Here we describe some basic principles and exemplars for their design, use, and orchestration with other system elements to achieve FAIRness for digital research objects.
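To make the identifier pattern concrete, here is a minimal sketch, assuming Python with the `requests` library, of resolving a persistent identifier to its descriptive metadata via HTTP content negotiation against the doi.org resolver. The DOI shown is a placeholder, and whether JSON-LD is actually returned depends on the registration agency behind a given identifier.

```python
import requests

PID = "https://doi.org/10.1234/example"  # hypothetical placeholder identifier

def resolve_metadata(pid: str) -> dict:
    # Ask the resolver for machine-readable JSON-LD metadata
    # instead of the human-oriented landing page.
    response = requests.get(
        pid,
        headers={"Accept": "application/ld+json"},
        allow_redirects=True,
        timeout=30,
    )
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    metadata = resolve_metadata(PID)
    # Descriptive metadata travels with the identifier: title, type, creators, ...
    print(metadata.get("name"), metadata.get("@type"))
```

The point of the pattern is that the same globally unique identifier serves both humans (via redirection to a landing page) and machines (via negotiated metadata), which is what makes the referenced object findable and reusable.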
Funding: Carole Goble acknowledges funding by BioExcel-2 (H2020 823830), IBISBA 1.0 (H2020 730976), and EOSC-Life (H2020 824087). Daniel Schober's work was financed by PhenoMeNal (H2020 654241) at the initiation phase of this effort; current work is an in-kind contribution. Kristian Peters is funded by the German Network for Bioinformatics Infrastructure (de.NBI) and acknowledges BMBF funding under grant number 031L0107. Stian Soiland-Reyes is funded by BioExcel-2 (H2020 823830). Daniel Garijo and Yolanda Gil gratefully acknowledge support from DARPA award W911NF-18-1-0027, NIH award 1R01AG059874-01, and NSF award ICER-1740683.
Abstract: Computational workflows describe the complex multi-step methods that are used for data collection, data preparation, analytics, predictive modelling, and simulation, and that lead to new data products. They can inherently contribute to the FAIR data principles: by processing data according to established metadata; by creating metadata themselves during the processing of data; and by tracking and recording data provenance. These properties aid data quality assessment and contribute to secondary data usage. Moreover, workflows are digital objects in their own right. This paper argues that FAIR principles for workflows need to address their specific nature in terms of their composition of executable software steps, their provenance, and their development.
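As an illustration of a workflow step that both creates metadata and records provenance while processing data, here is a hypothetical Python sketch; none of the names come from the paper, and the JSON layout is an assumption chosen for brevity.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def sha256(path: Path) -> str:
    """Checksum used to identify each data artefact in the provenance log."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def run_step(step_name, tool, inputs, output, process,
             log=Path("provenance.json")):
    """Run one workflow step and append a provenance record for it."""
    started = datetime.now(timezone.utc).isoformat()
    process(inputs, output)  # the actual data transformation
    record = {
        "step": step_name,
        "tool": tool,
        "started": started,
        "inputs": [{"path": str(p), "sha256": sha256(p)} for p in inputs],
        "output": {"path": str(output), "sha256": sha256(output)},
    }
    history = json.loads(log.read_text()) if log.exists() else []
    history.append(record)
    # Metadata created by the workflow itself, as the abstract describes.
    log.write_text(json.dumps(history, indent=2))
```

Because every step appends checksummed inputs and outputs, the resulting data product carries machine-readable lineage that supports quality assessment and secondary use.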
Funding: A project funded by European Union contracts H2020-INFRAEDI-02-2018 823830 and H2020-EINFRA-2015-1 675728, and funded through EOSC-Life (https://www.eosc-life.eu), contract H2020-INFRAEOSC-2018-2 824087, and ELIXIR-CONVERGE (https://elixir-europe.org), contract H2020-INFRADEV-2019-2 871075.
Abstract: We introduce the concept of Canonical Workflow Building Blocks (CWBB), a methodology for describing and wrapping computational tools so that they can be utilised in a reproducible manner from multiple workflow languages and execution platforms. The concept is implemented and demonstrated with the BioExcel Building Blocks library (BioBB), a collection of tool wrappers in the field of computational biomolecular simulation. Interoperability across different workflow languages is showcased through a transversal protein Molecular Dynamics setup workflow, built using this library and run with five different Workflow Management Systems (WfMS). We argue that such practice is a necessary requirement for FAIR Computational Workflows and an element of Canonical Workflow Frameworks for Research (CWFR), needed to improve widespread adoption and reuse of computational methods across workflow language barriers.
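For readers unfamiliar with BioBB, the call pattern looks roughly like the sketch below, based on the published BioBB tutorials: every building block is invoked with explicit input/output file paths plus a `properties` dictionary. The module paths and parameter names should be verified against the current biobb documentation, and the PDB code '1aki' is only an example.

```python
# Sketch of the uniform BioBB call pattern (verify against the biobb docs).
from biobb_io.api.pdb import pdb
from biobb_model.model.fix_side_chain import fix_side_chain

# Step 1: fetch an example structure from the Protein Data Bank.
pdb(output_pdb_path="1aki.pdb", properties={"pdb_code": "1aki"})

# Step 2: model missing side-chain atoms. Note the identical shape of the
# call: files in, files out, properties dict.
fix_side_chain(input_pdb_path="1aki.pdb",
               output_pdb_path="1aki_fixed.pdb",
               properties={})
```

Because every block exposes the same files-in/files-out contract, each workflow system (CWL, Galaxy, Nextflow, and so on) only needs a thin adapter per block rather than a bespoke integration per tool, which is what allows one canonical library to serve many workflow languages.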
Funding: Funding from the European Union's Horizon 2020 research and innovation programme under grant agreement numbers 823827 (SYNTHESYS Plus), 871043 (DiSSCo Prepare), 823830 (BioExcel-2), and 824087 (EOSC-Life).
Abstract: A key limiting factor in organising and using information from physical specimens curated in natural science collections is making that information computable; institutional digitization tends to focus more on imaging the specimens themselves than on efficiently capturing computable data about them. Label data are still largely transcribed manually, at high cost and low throughput, which puts such work out of reach for many collection-holding institutions at current funding levels. We show how computer vision, optical character recognition, handwriting recognition, named entity recognition, and language translation technologies can be implemented as canonical workflow component libraries with findable, accessible, interoperable, and reusable (FAIR) characteristics. These libraries are being developed in a cloud-based workflow platform, the 'Specimen Data Refinery' (SDR), founded on the Galaxy workflow engine, Common Workflow Language, Research Object Crates (RO-Crate), and WorkflowHub technologies. The SDR can be applied to specimens' labels and other artefacts, offering the prospect of greatly accelerated and more accurate data capture in computable form. Two kinds of FAIR Digital Objects (FDO) are created by packaging the outputs of SDR workflows and workflow components as digital objects with metadata, a persistent identifier, and a specific type definition. The first kind of FDO are computable Digital Specimen (DS) objects that can be consumed and produced by workflows and other applications. A single DS is the input data structure submitted to a workflow; each workflow component modifies it in turn to produce a refined DS at the end. The Specimen Data Refinery provides a library of such components that can be used individually or in series; to co-function, each library component describes the fields it requires from the DS and the fields it will in turn populate or enrich. The second kind of FDO, RO-Crates, gather and archive the diverse set of digital and real-world resources, configurations, and actions (the provenance) contributing to a unit of research work, allowing that work to be faithfully recorded and reproduced. Here we describe the Specimen Data Refinery and its motivating requirements, focusing on what is essential in the creation of canonical workflow component libraries and on conformance with the requirements of the emerging FDO Core Specification being developed by the FDO Forum.
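The requires/provides contract between components and the Digital Specimen can be pictured with the following hypothetical Python sketch; the class, field, and method names are ours, not the SDR's, and the OCR step is a stub.

```python
from dataclasses import dataclass, field

@dataclass
class DigitalSpecimen:
    """Hypothetical stand-in for a computable Digital Specimen (DS)."""
    pid: str                                   # persistent identifier
    data: dict = field(default_factory=dict)   # named fields of the DS

class OCRComponent:
    """One workflow component: declares what it reads and what it enriches."""
    requires = {"label_image"}   # fields the component needs from the DS
    provides = {"label_text"}    # fields the component populates/enriches

    def run(self, ds: DigitalSpecimen) -> DigitalSpecimen:
        missing = self.requires - ds.data.keys()
        if missing:
            raise ValueError(f"DS {ds.pid} lacks required fields: {missing}")
        ds.data["label_text"] = self._ocr(ds.data["label_image"])
        return ds

    def _ocr(self, image_path: str) -> str:
        return "transcribed label text"  # placeholder for a real OCR engine

def refine(ds: DigitalSpecimen, components) -> DigitalSpecimen:
    """Apply components in series; each one refines the DS in turn."""
    for component in components:
        ds = component.run(ds)
    return ds
```

Declaring `requires` and `provides` up front is what lets components co-function: a pipeline can be validated before execution by checking that every field a component needs is either present in the submitted DS or provided by an earlier component in the series.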