A new semi-automated workflow for chemical data retrieval and quality checking for modeling applications

Cheminformatics Robustness
DOI: 10.1186/s13321-018-0315-6 Publication Date: 2018-12-10T14:35:53Z
ABSTRACT
The quality of data used for QSAR model derivation is extremely important, as it strongly affects the final robustness and predictive power of the model. Ambiguous or wrong structures need to be carefully checked, because they lead to errors in the calculation of descriptors and hence to meaningless results. The increasing amounts of data, however, have often made it hard to check very large databases manually. In light of this, we designed and implemented a semi-automated workflow integrating structural data retrieval from several web-based databases, automated comparison of these data, chemical structure cleaning, and selection and standardization of data into a consistent, ready-to-use format that can be employed for modeling. The workflow integrates best practices for data curation that have been suggested in the recent literature. It has been implemented with the freely available KNIME software and is available to the cheminformatics community for improvement and application to a broad range of chemical datasets.
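The automated comparison step can be pictured with a small sketch. The Python below is not the authors' KNIME workflow; it only illustrates one plausible way to reconcile structure identifiers retrieved for the same compound from several databases. The function name `consensus_structure`, the `min_agreement` threshold, the majority rule, and the source labels are all assumptions made for this example, and the records are toy data (the InChIKeys of aspirin and caffeine).

```python
from collections import Counter

# Toy identifiers for illustration only (standard InChIKeys).
ASPIRIN = "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"
CAFFEINE = "RYYVLZVUVIJVGH-UHFFFAOYSA-N"


def consensus_structure(records, min_agreement=2):
    """Pick a structure identifier that the sources agree on.

    records: dict mapping a (hypothetical) source name to the identifier
             it returned, or None if that source had no hit.
    Returns (identifier, status), where status is one of
    "consensus", "manual_check", or "not_found".
    """
    # Ignore sources that returned nothing.
    found = [key for key in records.values() if key]
    if not found:
        return None, "not_found"

    # Most frequent identifier and the number of sources reporting it.
    key, votes = Counter(found).most_common(1)[0]

    # Assumed rule: accept only if at least `min_agreement` sources agree
    # and they form a strict majority of the sources that answered.
    if votes >= min_agreement and votes > len(found) / 2:
        return key, "consensus"

    # Anything else is flagged for a human to inspect.
    return None, "manual_check"


if __name__ == "__main__":
    hits = {"source_a": ASPIRIN, "source_b": ASPIRIN, "source_c": CAFFEINE}
    print(consensus_structure(hits))
```

Records flagged `manual_check` correspond to the "semi-automated" part of the idea: clear agreements pass through automatically, while conflicts are set aside for expert review instead of being resolved silently.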