FAIR registry
NFDI4DS
Reproducibility
DOI: 10.5281/zenodo.7129715
Publication Date: 2022-09-30
AUTHORS (6)
ABSTRACT
NFDI4DataScience registry for reproducible Data Science and Artificial Intelligence
Scientific advances build on previous work, which requires that all the pieces involved be available: goals, methods, results (usually reported as a scholarly publication), data, software and any other Digital Object (DO) used in the research process. The Findable, Accessible, Interoperable and Reusable (FAIR) Principles provide a metadata-based approach for research to improve on these four dimensions. Although they aim to cover all sorts of (research) DOs, the FAIR principles mainly focus on data. Additional efforts to adjust and extend their coverage to software, workflows and machine learning have been established in the last few years. Despite the improvements brought by FAIRification efforts (e.g., DOs are more findable nowadays), there are scientifically desirable aspects, such as reproducibility, that are beyond the scope of FAIR and still pose a challenge. In the case of Data Science (DS) and Artificial Intelligence (AI), the discussion around FAIR, reproducibility and other *ilities is a recent, ongoing effort.
The (German) National Research Data Infrastructure for Data Science consortium (NFDI4DS) brings together 16 partners aiming to create an infrastructure that supports interdisciplinary research involving DS and AI. In particular, the consortium will create a registry encompassing metadata for a variety of DOs involved in AI approaches, including the data and its configuration for a particular approach, the software used to process the data together with its hyperparameter settings, the underlying AI model, the evaluation process and additional elements supporting benchmarking, comparability and reproducibility. The registry will build on FAIR metadata models for research data and software and on the notion of FAIR DOs (FDOs).
FDOs have been proposed to improve access to DOs through the formalization of their metadata, types and identifiers and the explicit declaration of their computational operations, making them actionable FAIR objects. We plan to use Research Object Crates (RO-Crates) to package research outputs along with their metadata and turn them into FDOs; the RO-Crate implementation allows for a broad range of use cases across scientific domains. As for the AI-tailored metadata, the registry will formalize and extend the Data, Optimization, Model, and Evaluation (DOME) recommendations for reporting supervised machine learning in computational biology to cover further cases and disciplines. As efforts around FAIR and reproducibility for AI are still recent, and NFDI4DS started only this year (2022), open questions remain, including which aspects of reproducibility can be tackled with metadata, which dimensions can be used to compare AI approaches (possibly) across disciplines, and how much metadata can be obtained by automatic means. In this workshop, we want to promote an open discussion with the community around these topics.
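The packaging idea described above can be illustrated with a minimal sketch, assuming RO-Crate 1.1 as the crate format and a separate JSON file holding a DOME-style report. The context URL and the metadata descriptor follow the RO-Crate 1.1 specification; everything else (the file name `dome-report.json`, the helper names, and the keys inside the report) is hypothetical and is not the registry's actual schema, which the abstract does not define.

```python
import json

# The four DOME sections named in the abstract, used here as top-level keys.
# The key spelling is an assumption made for this sketch.
DOME_SECTIONS = ("data", "optimization", "model", "evaluation")


def missing_dome_sections(report):
    """Return the DOME sections that are absent or empty in a report dict."""
    return [section for section in DOME_SECTIONS if not report.get(section)]


def build_crate(name, date_published, report_file="dome-report.json"):
    """Build an RO-Crate 1.1 metadata document (as a dict) for one experiment.

    `report_file` is a hypothetical file carrying the DOME-style report.
    """
    descriptor = {
        "@id": "ro-crate-metadata.json",
        "@type": "CreativeWork",
        "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
        "about": {"@id": "./"},
    }
    root = {
        "@id": "./",
        "@type": "Dataset",
        "name": name,
        "datePublished": date_published,
        "hasPart": [{"@id": report_file}],
    }
    report_entity = {
        "@id": report_file,
        "@type": "File",
        "name": "DOME-style report (hypothetical layout)",
        "encodingFormat": "application/json",
    }
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [descriptor, root, report_entity],
    }


if __name__ == "__main__":
    # Placeholder values for illustration only, not real experiment metadata.
    report = {
        "data": {"source": "placeholder-dataset", "splits": "train/test"},
        "optimization": {"algorithm": "placeholder", "hyperparameters": {}},
        "model": {"type": "placeholder"},
    }
    print("Missing DOME sections:", missing_dome_sections(report))
    print(json.dumps(build_crate("Example AI experiment", "2022-09-30"), indent=2))
```

A check such as `missing_dome_sections` hints at one of the open questions raised above: completeness of reporting can be tested mechanically from metadata, whereas judging whether the reported content is sufficient to reproduce a result likely cannot.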