Conceptual design blueprint for the DiSSCo digitization infrastructure - DELIVERABLE D8.1

Digitization Petabyte Blueprint Cyberinfrastructure Terabyte
DOI: 10.3897/rio.6.e54280 Publication Date: 2020-05-18T10:00:15Z
ABSTRACT
DiSSCo, the Distributed System of Scientific Collections, is a pan-European Research Infrastructure (RI) mobilising and unifying bio- and geo-diversity information connected to the specimens held in natural science collections, and delivering it to scientific communities and beyond. Bringing together 120 institutions across 21 countries, and combining earlier investments in data interoperability practices with technological advancements in digitisation, cloud services and semantic linking, DiSSCo makes the data from natural science collections available as one virtual data cloud, connected with data emerging from new techniques and not already linked to specimens. These new data include DNA barcodes, whole genome sequences, proteomics and metabolomics data, chemical data, trait data and imaging data (Computer-assisted Tomography (CT), Synchrotron, etc.), to name but a few; they will lead to a wide range of end-user services that begin with finding, accessing, using and improving data. DiSSCo will deliver the diagnostic information required for novel approaches and new services that will transform the landscape of what is possible in ways that are hard to imagine today. With approximately 1.5 billion objects to be digitised, bringing natural science collections to the information age is expected to result in many tens of petabytes of new data over the next decades, used on average by 5,000 – 15,000 unique users every day. This requires new skills, clear policies, robust procedures and new technologies to create, work with and manage large digital datasets over their entire research data lifecycle, including their long-term storage, preservation and open access. Such processes and procedures must match and be derived from the latest thinking in open science and data management, realising the core principles of 'findable, accessible, interoperable and reusable' (FAIR). Synthesised from the results of the ICEDIG project ("Innovation and Consolidation for Large Scale Digitisation of Natural Heritage", EU Horizon 2020 grant agreement No. 777483), the DiSSCo Conceptual Design Blueprint covers the organisational arrangements, processes and practices, the architecture, tools and technologies, the culture, skills and capacity building, and the governance and business model proposals for constructing the digitisation infrastructure of DiSSCo.
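In this context, the digitisation infrastructure of DiSSCo must be interpreted as that infrastructure (machinery, processing, procedures, personnel, organisation) offering Europe-wide capabilities for mass digitisation and digitisation-on-demand, and for the subsequent management (i.e., curation, publication, processing) and use of the resulting data. The blueprint constitutes the essential background needed to continue work to raise the overall maturity of the DiSSCo Programme across multiple dimensions (organisational, technical, scientific, data, financial) and to achieve readiness to begin construction. Today, collection digitisation efforts have reached most collection-holding institutions across Europe. Much of the leadership, and many of the people involved in digitisation and working with digital collections, wish to take steps forward and expand the benefit of digitisation further to achieve noticeable positive effects. The collective results of examining technical, financial, policy and governance aspects show the way forward to operating a large distributed initiative, i.e., the Distributed System of Scientific Collections (DiSSCo), for natural science collections across Europe. Ample examples of, opportunities for, and need for innovation and consolidation for large scale digitisation of natural heritage have been described. The blueprint makes one hundred and four (104) recommendations to be considered by other elements of the DiSSCo Programme of linked projects (i.e., SYNTHESYS+, COST MOBILISE, DiSSCo Prepare, and others to follow) and by the DiSSCo Programme leadership as the journey towards organisational, technical, scientific, data and financial readiness continues. Nevertheless, significant obstacles must be overcome as a matter of priority if DiSSCo is to move beyond its Design and Preparatory Phases during 2024. Specifically, these include: Organisational: Strengthen common purpose by adopting a common framework for policy harmonisation and capability enhancement across broad areas, especially in respect of digitisation strategy and prioritisation, digitisation processes and techniques, data and digital media publication and open access, protection of and access to sensitive data, and administration of access and benefit sharing. Pursue the joint ventures and other relationships necessary to the successful delivery of the DiSSCo mission, especially with GBIF and other international and regional digitisation and data aggregation organisations, in the context of infrastructure policy frameworks such as EOSC. Proceed with the explicit aim of avoiding divergences of approach in global natural science collections data management and research. Technical: Adopt and enhance the DiSSCo Digital Specimen Architecture and, specifically as a matter of urgency, establish the persistent identifier scheme to be used by DiSSCo and (ideally) other comparable regional initiatives. Establish the (software) engineering development and (infrastructure) operations team and direction essential to the delivery of the services and functionalities expected from DiSSCo, such that engineering in earnest can lead to an early start of DiSSCo operations.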
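Scientific: Establish a common digital research agenda leveraging Digital (extended) Specimens as anchoring points for all specimen-associated and -derived information, demonstrating to research policy/decision-makers the new possibilities, opportunities and value of participating in the DiSSCo research infrastructure. Data: Adopt the FAIR Digital Object Framework and the International Image Interoperability Framework as the low entropy means of achieving uniform access to rich data (image and non-image) that is findable, accessible, interoperable and reusable (FAIR). Develop and promote best practice approaches towards achieving the best digitisation results in terms of quality (best, according to agreed minimum information and other specifications), time (highest throughput, fast) and cost (lowest, minimal per specimen). Financial: Broaden the attractiveness (i.e., improve the bankability) of DiSSCo as an infrastructure to invest in. Plan for finding ways to bridge the funding gap, to avoid disruptions in the critical funding path that risk interrupting core operations; especially when the gap opens between the end of preparations and the beginning of implementation due to unsolved political difficulties. Strategically, it is vital to balance the multiple factors addressed by the blueprint against one another to achieve the desired goals of the DiSSCo programme. Decisions cannot be taken on one aspect alone without considering the other aspects, and here the various governance structures of DiSSCo (General Assembly, advisory boards and stakeholder forums) play a critical role over the coming years.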
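As a rough illustration of the scale figures quoted in the abstract (approximately 1.5 billion objects and "many tens of petabytes"), the following sketch computes the mean data volume per object that different totals would imply. The abstract does not quantify "many tens", so the totals of 30, 60 and 90 PB are illustrative assumptions, as is the use of decimal petabytes (10^15 bytes); the function names are hypothetical.

```python
# Back-of-envelope check of the scale figures quoted in the abstract.
# Assumptions (not from the source): decimal units, and the example totals.

OBJECTS = 1.5e9          # approximate number of objects to digitise (abstract)
BYTES_PER_PB = 10**15    # decimal petabyte (assumption)
BYTES_PER_MB = 10**6     # decimal megabyte (assumption)


def mean_bytes_per_object(total_petabytes: float, objects: float = OBJECTS) -> float:
    """Mean number of bytes per object for a given total volume in PB."""
    return total_petabytes * BYTES_PER_PB / objects


def to_megabytes(n_bytes: float) -> float:
    """Convert bytes to decimal megabytes."""
    return n_bytes / BYTES_PER_MB


if __name__ == "__main__":
    # Illustrative totals only; the abstract says "many tens of petabytes".
    for total_pb in (30, 60, 90):
        mb = to_megabytes(mean_bytes_per_object(total_pb))
        print(f"{total_pb} PB over {OBJECTS:.1e} objects: about {mb:.0f} MB per object")
```

Under these assumptions a total of 30 PB corresponds to about 20 MB per object on average, which is the order of magnitude of one or a few high-resolution images per specimen; richer data such as CT scans would raise the mean considerably.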