NFDI4DS | UHH-SEMS - Publication Details

Son Doan

ORCID: 0000-0002-7284-1306

About

Contact & Profiles

A5038899864

Research Areas

Biomedical Text Mining and Ontologies
Data-Driven Disease Surveillance
Topic Modeling
Text and Document Classification Technologies
Advanced Text Analysis Techniques
Misinformation and Its Impacts
Semantic Web and Ontologies
Image Retrieval and Classification Techniques
Natural Language Processing Techniques
Wikis in Education and Collaboration
Sentiment Analysis and Opinion Mining
Electronic Health Records Systems
Rough Sets and Fuzzy Logic
Data Management and Algorithms
Mental Health via Writing
Data Mining Algorithms and Applications
Geographic Information Systems Studies
Algorithms and Data Compression
Zoonotic diseases and public health
Spam and Phishing Detection
Public Relations and Crisis Communication
Academic integrity and plagiarism
Ophthalmology and Visual Health Research
Pharmacovigilance and Adverse Drug Reactions
Pericarditis and Cardiac Tamponade

Société Française d'Allergologie
2020

National Institute of Informatics
2007-2019

Kaiser Permanente San Diego Medical Center
2019

Kaiser Permanente
2015-2018

University of California, San Diego
2012-2017

Southern California University for Professional Studies
2014

University of Southern California
2014

Vanderbilt University
2010

Vanderbilt University Medical Center
2009

Research Organization of Information and Systems
2008-2009

MedEx: a medication information extraction system for clinical narratives

OPENALEX - Publications

Hanzhang Xu, Shane P. Stenner, Son Doan, K. Brandon Johnson, Lemuel R. Waitman, and 1 more

Medication information is one of the most important types of clinical data in electronic medical records. It is critical for healthcare safety and quality, as well as for research that uses electronic medical record data. However, medication data are often recorded in clinical notes as free-text. As such, they are not accessible to other computerized applications that rely on coded data. We describe a new natural language processing system (MedEx), which extracts medication information from clinical notes. MedEx was initially developed using discharge summaries. An evaluation using a data set of 50...

10.1197/jamia.m3378 article EN Journal of the American Medical Informatics Association 2010-01-01

BioCaster: detecting public health rumors with a Web-based text mining system

Nigel Collier, Son Doan, Ai Kawazoe, Reiko Matsuda Goodwin, Mike Conway, and 7 more

Summary: BioCaster is an ontology-based text mining system for detecting and tracking the distribution of infectious disease outbreaks from linguistic signals on the Web. The system continuously analyzes documents reported from over 1700 RSS feeds, classifies them for topical relevance and plots them onto a Google map using geocoded information. The background knowledge for bridging the gap between Layman's terms and formal-coding systems is contained in the freely available BioCaster ontology which includes information in eight languages focused...

10.1093/bioinformatics/btn534 article EN Bioinformatics 2008-10-15

pSCANNER: patient-centered Scalable National Network for Effectiveness Research

Lucila Ohno‐Machado, Zia Agha, Douglas S. Bell, Lisa Dahm, Michele E. Day, and 58 more

This article describes the patient-centered Scalable National Network for Effectiveness Research (pSCANNER), which is part of the recently formed PCORnet, a national network composed of learning healthcare systems and patient-powered research networks funded by the Patient Centered Outcomes Research Institute (PCORI). It is designed to be a stakeholder-governed federated network that uses a distributed architecture to integrate data from three existing networks covering over 21 million patients in all 50 states: (1) VA Informatics...

10.1136/amiajnl-2014-002751 article EN cc-by-nc Journal of the American Medical Informatics Association 2014-04-30

Crowdsourcing Twitter annotations to identify first-hand experiences of prescription drug use

Nestor Alvaro, Mike Conway, Son Doan, Christoph Lofi, John P. Overington, and 1 more

Self-reported patient data has been shown to be a valuable knowledge source for post-market pharmacovigilance. In this paper we propose using the popular micro-blogging service Twitter to gather evidence about adverse drug reactions (ADRs) after firstly having identified micro-blog messages (also known as "tweets") that report first-hand experience. In order to achieve this goal we explore machine learning with data crowdsourced from laymen annotators. With the help of lay annotators recruited from CrowdFlower we manually...

10.1016/j.jbi.2015.11.004 article EN cc-by-nc-nd Journal of Biomedical Informatics 2015-11-07

Extracting health-related causality from twitter messages using natural language processing

Son Doan, Elly W. Yang, Sameer Tilak, Peter W. Li, Daniel S. Zisook, and 1 more

Twitter messages (tweets) contain various types of topics in our daily life, which include health-related topics. Analysis of health-related tweets would help us understand health conditions and concerns encountered in our daily lives. In this paper we evaluate an approach to extracting causalities from tweets using natural language processing (NLP) techniques. Lexico-syntactic patterns based on dependency parser outputs are used for causality extraction. We focused on three health-related topics: “stress”, “insomnia”, and “headache.” A large dataset...

10.1186/s12911-019-0785-0 article EN cc-by BMC Medical Informatics and Decision Making 2019-04-01

Integrating existing natural language processing tools for medication extraction from discharge summaries

Son Doan, Lisa Bastarache, Sergio Klimkowski, Joshua C. Denny, Hua Xu

Objective: To develop an automated system to extract medications and related information from discharge summaries as part of the 2009 i2b2 natural language processing (NLP) challenge. This task required accurate recognition of medication name, dosage, mode, frequency, duration, and reason for drug administration. Design: We developed an integrated system using several existing NLP components developed at Vanderbilt University Medical Center, which included MedEx (to extract medication information), SecTag (a section identification...

10.1136/jamia.2010.003855 article EN Journal of the American Medical Informatics Association 2010-09-01

Building a Natural Language Processing Tool to Identify Patients With High Clinical Suspicion for Kawasaki Disease from Emergency Department Notes

Son Doan, Cleo K. Maehara, Juan D. Chaparro, Sisi Lu, Ruiling Liu, and 8 more

Delayed diagnosis of Kawasaki disease (KD) may lead to serious cardiac complications. We sought to create and test the performance of a natural language processing (NLP) tool, KD-NLP, in the identification of emergency department (ED) patients for whom the diagnosis of KD should be considered. We developed an NLP tool that recognizes the KD diagnostic criteria based on standard clinical terms and medical word usage using 22 pediatric ED notes augmented by Unified Medical Language System vocabulary. With high suspicion for KD defined as...

10.1111/acem.12925 article EN Academic Emergency Medicine 2016-01-30

Classifying disease outbreak reports using n-grams and semantic features

Mike Conway, Son Doan, Ai Kawazoe, Nigel Collier

10.1016/j.ijmedinf.2009.03.010 article EN International Journal of Medical Informatics 2009-05-16

Recognition of medication information from discharge summaries using ensembles of classifiers

Son Doan, Nigel Collier, Hua Xu, Pham Hoang Duy, Từ Minh Phương

Extraction of clinical information such as medications or problems from clinical text is an important task of clinical natural language processing (NLP). Rule-based methods are often used in clinical NLP systems because they are easy to adapt and customize. Recently, supervised machine learning methods have proven to be effective in clinical NLP as well. However, combining different classifiers to further improve the performance of clinical entity recognition systems has not been investigated extensively. Combining classifiers into an ensemble classifier presents both challenges...

10.1186/1472-6947-12-36 article EN cc-by BMC Medical Informatics and Decision Making 2012-05-07

Enhancing Twitter Data Analysis with Simple Semantic Filtering: Example in Tracking Influenza-Like Illnesses

Son Doan, Lucila Ohno‐Machado, Nigel Collier

Systems that exploit publicly available user generated content such as Twitter messages have been successful in tracking seasonal influenza. We developed a novel filtering method for Influenza-Like-Illnesses (ILI)-related messages using 587 million messages from Twitter micro-blogs. We first filtered messages based on syndrome keywords from the BioCaster Ontology, an extant knowledge model of laymen's terms. We then filtered the messages according to semantic features such as negation, hashtags, emoticons, humor and geography. The data covered 36 weeks for the US 2009...

10.1109/hisb.2012.21 preprint EN 2012-09-01

How Do You #relax When You’re #stressed? A Content Analysis and Infodemiology Study of Stress-Related Tweets

Son Doan, Amanda Ritchart, Nicholas S. Perry, Juan D. Chaparro, Mike Conway

Background: Stress is a contributing factor to many major health problems in the United States, such as heart disease, depression, and autoimmune diseases. Relaxation is often recommended in mental health treatment as a frontline strategy to reduce stress, thereby improving health conditions. Objective: The objective of our study was to understand how people express their feelings of stress and relaxation through Twitter messages. Methods: We first performed a qualitative content analysis of 1326 and 781 tweets containing the keywords...

10.2196/publichealth.5939 article EN cc-by JMIR Public Health and Surveillance 2017-06-13

Mining Health-Related Issues in Consumer Product Reviews by Using Scalable Text Analytics

Manabu Torii, Sameer Tilak, Son Doan, Daniel S. Zisook, Jungwei Fan

In an era when most of our life activities are digitized and recorded, opportunities abound to gain insights about population health. Online product reviews present a unique data source that is currently underexplored. Health-related information, although scarce, can be systematically mined in online product reviews. Leveraging natural language processing and machine learning tools, we were able to mine 1.3 million grocery product reviews for health-related information. The objectives of the study were as follows: (1) conduct...

10.4137/bii.s37791 article EN Biomedical Informatics Insights 2016-01-01

Towards role-based filtering of disease outbreak reports

Son Doan, Ai Kawazoe, Mike Conway, Nigel Collier

10.1016/j.jbi.2008.12.009 article EN publisher-specific-oa Journal of Biomedical Informatics 2009-01-02

Recent trends in biomedical informatics: a study based on JAMIA articles

Xiaoqian Jiang, Krystal Tse, Shuang Wang, Son Doan, Hyeoneui Kim, and 1 more

In a growing interdisciplinary field like biomedical informatics, information dissemination and citation trends are changing rapidly due to many factors. To understand these factors better, we analyzed the evolution of the number of articles per major biomedical informatics topic, download/online view frequencies, and citation patterns (using Web of Science) for articles published from 2009 to 2012 in JAMIA. The number of articles published in JAMIA increased significantly from 2009 to 2012, and there were some topic differences in the last 4 years. Medical Record Systems, Algorithms,...

10.1136/amiajnl-2013-002429 article EN cc-by-nc-nd Journal of the American Medical Informatics Association 2013-11-08

Task formulation for Extracting Social Determinants of Health from Clinical Narratives

Manabu Torii, Ian M. Finn, Son Doan, Paul P. Wang, Elly W. Yang, and 1 more

Objective: The 2022 n2c2 NLP Challenge posed identification of social determinants of health (SDOH) in clinical narratives. We present three systems that we developed for the Challenge and discuss the distinctive task formulation used in each of the three systems. Materials and Methods: The first system identifies target pieces of information independently using machine learning classifiers. The second system uses a large language model (LLM) to extract complete structured outputs per document. The third system extracts candidate phrases and relations with...

10.48550/arxiv.2301.11386 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Using hedges to enhance a disease outbreak report text mining system

Mike Conway, Nigel Collier, Son Doan

Identifying serious infectious disease outbreaks in their early stages is an important task, both for national governments and international organizations like the World Health Organization. Text mining and information extraction systems can provide an important, low cost and timely early warning system in these circumstances by identifying the first signs of an outbreak automatically from online textual news. One interesting characteristic of disease outbreak reports --- which to the best of our knowledge has not been studied before --- is the use...

10.3115/1572364.1572384 article EN 2009-01-01

PhenDisco: phenotype discovery system for the database of genotypes and phenotypes

Son Doan, Ko‐Wei Lin, Mike Conway, Lucila Ohno‐Machado, Alex Hsieh, and 10 more

The database of genotypes and phenotypes (dbGaP) developed by the National Center for Biotechnology Information (NCBI) is a resource that contains information on various genome-wide association studies (GWAS) and is currently available via NCBI's dbGaP Entrez interface. The database is an important resource, providing GWAS data that can be used for new exploratory research or cross-study validation by authorized users. However, finding studies relevant to a particular phenotype of interest is challenging, as phenotype information is presented in a non-standardized...

10.1136/amiajnl-2013-001882 article EN cc-by-nc-nd Journal of the American Medical Informatics Association 2013-08-30
