
Using Machine-Learning to Facilitate Data Extraction for Human Health Chemical Assessments: Protocol for a case application

Data extraction

DOI: 10.5281/zenodo.8418719

Publication Date: 2023-10-08


AUTHORS (16)

Michelle Angrish

Kristina A. Thayer

Brittany Schulz

Artur Nowak

Amanda S. Persad

Allison L. Phillips

Glenn Rice

Teresa Shannon

A. Amina Wilkins

Krista Christensen

Elizabeth G. Radke

A. M. James Shapiro

Michele Taylor

Vickie R. Walker

Andrew A. Rooney

Sean Watford

ABSTRACT

Artificial intelligence (AI) methods, including natural language processing, active learning, and large language models, are expected to provide workflow advances that reduce risk assessors' time and effort while maintaining the accuracy necessary to meet demand for chemical assessments. A growing suite of modular software applications that integrate AI methods and leverage human-in-the-loop workflows is making operationalization of these advancements feasible. The case application in this protocol supports development of a Provisional Peer-Reviewed Toxicity Value (PPRTV) assessment for 1,3-dinitrobenzene (1,3-DNB). The protocol describes methods to develop a literature inventory and systematic evidence map (SEM) for 1,3-DNB. Along with typical systematic review methods, the protocol applies an active learning approach to screen records at the title and abstract level using AI methods. While active learning has become a routine method for reducing the resources required to screen records at the title and abstract level, automated processes for data extraction with user verification have evolved slowly. This slow evolution of AI for data extraction remains a challenge primarily because the resources required to develop appropriate training datasets for model development are limited, leading either to immature models with poor performance or to a lack of models for many domain-specific data extraction fields. This protocol showcases how software applications like Dextr can be used to address both challenges, with the potential to make progress toward a modern workflow stack that includes data extraction.
