20 GB in 10 minutes: a case for linking major biodiversity databases using an open socio-technical infrastructure and a pragmatic, cross-institutional collaboration
Biological database
DOI: 10.7717/peerj-cs.164
Publication Date: 2018-09-17T08:43:59Z
AUTHORS (4)
ABSTRACT
Biodiversity information is made available through numerous databases that each have their own data models, web services, and data types. Combining data across databases leads to new insights, but is not easy because each database uses its own system of identifiers. In the absence of stable and interoperable identifiers, databases are often linked using taxonomic names. This labor intensive, error prone, and lengthy process relies on accessible versions of nomenclatural authorities and fuzzy-matching algorithms. To approach the challenge of linking diverse data, more than technology is needed. New social collaborations like the Global Unified Open Data Architecture (GUODA), which combines skills from diverse groups of computer engineers from iDigBio, server resources from the Advanced Computing and Information Systems (ACIS) Lab, global-scale data presentation from EOL, and independent developers and researchers, are what is needed to make concrete progress on finding relationships between biodiversity datasets. This paper will discuss a technical solution developed by the GUODA collaboration for faster linking across databases, with a use case linking Wikidata and the Global Biotic Interactions database (GloBI). The GUODA infrastructure is a 12-node, high performance computing cluster made up of about 192 threads with 12 TB of storage and 288 GB of memory. Using GUODA, 20 GB of compressed JSON from Wikidata was processed and linked to GloBI in 10–11 min. Instead of comparing name strings or relying on a single identifier, Wikidata and GloBI were linked by comparing graphs of biodiversity identifiers external to each system. This method resulted in adding 119,957 Wikidata links in GloBI, an increase of 13.7% of all outgoing name links in GloBI. Wikidata and GloBI were compared to the Open Tree of Life Reference Taxonomy to examine consistency and coverage. The process of parsing Wikidata, Open Tree of Life Reference Taxonomy, and GloBI archives and calculating consistency metrics was done in minutes on the GUODA platform. As a model collaboration, GUODA has the potential to revolutionize biodiversity science by bringing diverse, technically minded people together with high performance computing resources that are accessible from a laptop or desktop. However, participating in such a collaboration still requires basic programming skills.
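The abstract describes linking two databases by comparing identifiers external to each system rather than by matching name strings. The following is a minimal sketch of that general idea, not the GUODA implementation: the record keys, the external identifier strings, and the function name link_by_shared_identifiers are all made up for illustration, and the data are toy values rather than real Wikidata or GloBI content.

```python
from collections import defaultdict

# Hypothetical toy records: each local record maps to the set of external
# identifiers it cites. None of these values come from the real databases.
wikidata_records = {
    "wikidata:A": {"NCBI:1", "GBIF:10"},
    "wikidata:B": {"ITIS:7"},
    "wikidata:C": {"NCBI:99"},
}
globi_records = {
    "globi:x": {"GBIF:10", "EOL:5"},
    "globi:y": {"ITIS:7"},
    "globi:z": {"EOL:42"},
}


def link_by_shared_identifiers(left, right):
    """Return sorted (left_id, right_id) pairs that share an external identifier."""
    # Index the right-hand records by each external identifier they cite.
    index = defaultdict(set)
    for right_id, external_ids in right.items():
        for ext in external_ids:
            index[ext].add(right_id)

    # A left-hand record links to every right-hand record citing the same identifier.
    links = set()
    for left_id, external_ids in left.items():
        for ext in external_ids:
            for right_id in index.get(ext, ()):
                links.add((left_id, right_id))
    return sorted(links)


links = link_by_shared_identifiers(wikidata_records, globi_records)
print(links)
```

In this toy data, two record pairs are linked because they cite the same external identifier, and no name strings are compared at any point. A production pipeline over 20 GB of compressed JSON would presumably express the same join on a distributed engine rather than in in-memory dictionaries.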