Using Machine-Learning to Facilitate Data Extraction for Human Health Chemical Assessments: Protocol for a case application
Data extraction
DOI: 10.5281/zenodo.8418719
Publication Date: 2023-10-08
AUTHORS (16)
ABSTRACT
Artificial intelligence (AI) methods, including natural language processing, active learning, and large language models, are expected to advance assessment workflows. They should reduce the time and effort risk assessors spend while maintaining the accuracy needed to meet the demand for chemical assessments. A growing suite of modular software applications that integrate AI methods with human-in-the-loop workflows is making these advances feasible to put into operation. The case application in this protocol supports the development of a Provisional Peer-Reviewed Toxicity Value (PPRTV) assessment for 1,3-dinitrobenzene (1,3-DNB). The protocol describes the methods used to develop a literature inventory and a systematic evidence map (SEM) for 1,3-DNB. In addition to standard systematic review methods, the protocol uses an active learning approach to screen records at the title and abstract level. Active learning has become a routine way to reduce the resources needed for title and abstract screening. In contrast, automated data extraction with user verification has developed slowly. Progress remains slow mainly because few resources are available to build adequate training datasets. This leads to two challenges: models for some data extraction fields are immature and perform poorly, and many domain-specific fields have no models at all. This protocol shows how software applications such as Dextr can be used to address both challenges, with the potential to advance toward a modern workflow stack that includes data extraction.