A certified de-identification system for all clinical text documents for information extraction at scale
Identification
Redaction
Protected health information
DOI:
10.1093/jamiaopen/ooad045
Publication Date:
2023-07-05T13:15:50Z
AUTHORS (8)
ABSTRACT
Clinical notes are a veritable treasure trove of information on a patient's disease progression, medical history, and treatment plans, yet they are locked in secured databases accessible for research only after extensive ethics review. Removing personally identifying and protected health information (PII/PHI) from the records can reduce the need for additional Institutional Review Board (IRB) reviews. In this project, our goals were to: (1) develop a robust and scalable clinical text de-identification pipeline that is compliant with the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule de-identification standards and (2) share routinely updated de-identified clinical notes with researchers. Building on our open-source de-identification software called Philter, we added features to: (1) make the algorithm and the de-identified data HIPAA compliant, which also implies type 2 error-free redaction, as certified via external audit; (2) reduce over-redaction errors; and (3) normalize and shift date PHI. We established a streamlined de-identification pipeline using MongoDB to automatically extract clinical notes and provide truly de-identified notes to researchers, with periodic monthly refreshes at our institution. To the best of our knowledge, the Philter V1.0 pipeline is currently the first certified, de-identified redaction pipeline that makes clinical notes available to researchers for nonhuman subjects' research, without further IRB approval needed. To date, we have made over 130 million de-identified clinical notes available to 600 UCSF researchers. These notes were collected over the past 40 years and represent data from 2,757,016 patients.
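The abstract mentions normalizing and shifting date PHI but does not describe how this is done. The following is a minimal sketch of one common approach, not Philter's actual implementation: each patient receives a stable offset derived from a keyed hash, and every recognized date in that patient's notes is rewritten in ISO format and moved back by that offset, so intervals between events are preserved. The function names, the `secret` parameter, the 365-day window, and the single `MM/DD/YYYY` pattern are all assumptions made for illustration.

```python
import hashlib
import re
from datetime import datetime, timedelta

# Hypothetical pattern: only MM/DD/YYYY dates are handled in this sketch.
DATE_PATTERN = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")


def patient_offset_days(patient_id: str, secret: str, max_days: int = 365) -> int:
    """Derive a stable per-patient shift of 1..max_days days from a keyed hash."""
    digest = hashlib.sha256(f"{secret}:{patient_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % max_days + 1


def shift_dates(text: str, offset_days: int) -> str:
    """Normalize MM/DD/YYYY dates to YYYY-MM-DD and move them back by offset_days."""

    def _replace(match: re.Match) -> str:
        month, day, year = (int(group) for group in match.groups())
        try:
            shifted = datetime(year, month, day) - timedelta(days=offset_days)
        except (ValueError, OverflowError):
            # Not a usable calendar date: mask it instead of passing it through.
            return "*" * len(match.group(0))
        return shifted.strftime("%Y-%m-%d")

    return DATE_PATTERN.sub(_replace, text)


if __name__ == "__main__":
    offset = patient_offset_days("patient-001", secret="example-secret")
    print(shift_dates("Admitted 03/15/2020, discharged 3/20/2020.", offset))
```

Because the same offset is applied to every date for a given patient, the five-day gap between admission and discharge in the usage example survives the shift, while the true calendar dates do not.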
REFERENCES (17)
CITATIONS (29)