Integrating and visualizing primary data from prospective and legacy taxonomic literature

0106 biological sciences; 070; QH301-705.5; XML markup; Spiders; Open access; XML; 15. Life on land; 01 natural sciences; Biodiversity informatics; markup; Araneae; General Research Article; Biology (General); Data mining; Taxonomy

DOI: 10.3897/bdj.3.e5063 Publication Date: 2015-05-12T16:08:49Z

AUTHORS (11)

Jeremy Miller

Donat Agosti

Lyubomir Penev

Guido Sautter

Teodor Georgiev

Terry Catapano

David Patterson

David King

Serrano Pereira

Rutger Vos

Soraya Sierra

ABSTRACT

Specimen data in taxonomic literature are among the highest quality primary biodiversity data. Innovative cybertaxonomic journals are using workflows that maintain data structure and disseminate electronic content to aggregators and other users; such structure is lost in traditional taxonomic publishing. Legacy taxonomic literature is a vast repository of knowledge about biodiversity. Currently, access to that resource is cumbersome, especially for non-specialist data consumers. Markup is a mechanism that makes this content more accessible, and is especially suited to machine analysis. Fine-grained XML (Extensible Markup Language) markup was applied to all (37) open-access articles published in the journal Zootaxa containing treatments on spiders (Order: Araneae). The markup approach was optimized to extract primary specimen data from legacy publications. These data were combined with data from articles containing treatments on spiders published in Biodiversity Data Journal where XML structure is part of the routine publication process. A series of charts was developed to visualize the content of specimen data in XML-tagged taxonomic treatments, either singly or in aggregate. The data can be filtered by several fields (including journal, taxon, institutional collection, collecting country, collector, author, article and treatment) to query particular aspects of the data. We demonstrate here that XML markup using GoldenGATE can address the challenge presented by unstructured legacy data, can extract structured primary biodiversity data which can be aggregated with and jointly queried with data from other Darwin Core-compatible sources, and show how visualization of these data can communicate key information contained in biodiversity literature. We complement recent studies on aspects of biodiversity knowledge using XML-structured data to explore 1) the time lag between species discovery and description, and 2) the prevalence of rarity in species descriptions.
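
The extract-then-filter idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' GoldenGATE workflow: the element and attribute names (`treatment`, `materialsCitation`, `country`, `collectorName`, `collectionCode`, `specimenCount`), the helper functions, and the sample records are all assumptions invented for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Invented sample markup. Element names, attribute names, and values are
# assumptions for illustration, not data from the article.
SAMPLE_XML = """
<document journal="JournalA">
  <treatment taxon="Aus bus">
    <materialsCitation country="CountryA" collectorName="Collector 1"
                       collectionCode="COLL-A" specimenCount="2"/>
    <materialsCitation country="CountryB" collectorName="Collector 2"
                       collectionCode="COLL-A" specimenCount="1"/>
  </treatment>
  <treatment taxon="Aus cus">
    <materialsCitation country="CountryA" collectorName="Collector 1"
                       collectionCode="COLL-B" specimenCount="5"/>
  </treatment>
</document>
"""


def extract_records(xml_text):
    """Flatten every specimen citation into a dict, one per citation."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for treatment in root.iter("treatment"):
        for citation in treatment.iter("materialsCitation"):
            records.append({
                "journal": root.get("journal"),
                "taxon": treatment.get("taxon"),
                "country": citation.get("country"),
                "collector": citation.get("collectorName"),
                "collection": citation.get("collectionCode"),
                # Treat a missing count as a single specimen.
                "count": int(citation.get("specimenCount", "1")),
            })
    return records


def filter_records(records, **criteria):
    """Keep records whose fields equal every given criterion."""
    return [
        r for r in records
        if all(r.get(field) == value for field, value in criteria.items())
    ]


def specimens_by(records, field):
    """Sum specimen counts per distinct value of one field."""
    totals = Counter()
    for r in records:
        totals[r[field]] += r["count"]
    return totals


records = extract_records(SAMPLE_XML)
print(specimens_by(records, "country"))
print(filter_records(records, collection="COLL-A"))
```

Flattening each citation into a plain record keeps the filter step independent of the XML layout, so records from differently structured sources could be pooled before querying, provided they are mapped to the same field names.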
