NFDI4DS | UHH-SEMS - Publication Details

Hong-Jie Dai

ORCID: 0000-0002-1516-7255

About

Contact & Profiles

A5037362113

Research Areas

Biomedical Text Mining and Ontologies
Topic Modeling
Machine Learning in Healthcare
Natural Language Processing Techniques
Semantic Web and Ontologies
Bioinformatics and Genomic Networks
Genetics, Bioinformatics, and Biomedical Research
Artificial Intelligence in Healthcare
Genomics and Phylogenetic Studies
Mental Health via Writing
Pharmacovigilance and Adverse Drug Reactions
Machine Learning in Bioinformatics
Medical Coding and Health Information
Genomics and Rare Diseases
Artificial Intelligence in Healthcare and Education
Mental Health Research Topics
Spam and Phishing Detection
Computational Drug Discovery Methods
Misinformation and Its Impacts
Text Readability and Simplification
Sentiment Analysis and Opinion Mining
AI in cancer detection
Imbalanced Data Classification Techniques
Text and Document Classification Technologies
Advanced Text Analysis Techniques

National Kaohsiung University of Science and Technology
2019-2024

Kaohsiung Medical University
2019-2024

National Health Research Institutes
2020-2024

The University of Queensland
2023

Intelligent Systems Research (United States)
2020

National Taitung University
2015-2019

National University of Kaohsiung
2018

Institute of Information Science, Academia Sinica
2008-2016

Taipei Medical University
2013-2015

UNSW Sydney
2015

Overview of BioCreative II gene mention recognition

OPENALEX - Publications

Larry Smith, Lorraine Tanabe, R. Ando, Cheng-Ju Kuo, I‐Fang Chung, and 29 more

Nineteen teams presented results for the Gene Mention Task at the BioCreative II Workshop. In this task, participants designed systems to identify substrings in sentences corresponding to gene name mentions. A variety of different methods were used, and the results varied, with a highest achieved F1 score of 0.8721. Here we present brief descriptions of all the methods used and a statistical analysis of the results. We also demonstrate that, by combining the results from all submissions, an F score of 0.9066 is feasible, and furthermore that the best result makes use...

10.1186/gb-2008-9-s2-s2 article EN cc-by Genome Biology 2008-09-01

The gene normalization task in BioCreative III

Zhiyong Lu, Hung‐Yu Kao, Chih-Hsuan Wei, Minlie Huang, Jingchen Liu, and 23 more

We report the Gene Normalization (GN) challenge in BioCreative III, where participating teams were asked to return a ranked list of identifiers of the genes detected in full-text articles. For training, 32 fully and 500 partially annotated articles were prepared. A total of 507 articles were selected as the test set. Due to the high annotation cost, it was not feasible to obtain gold-standard human annotations for all test articles. Instead, we developed an Expectation Maximization (EM) algorithm approach for choosing a small number of test articles for manual annotation that were most capable...

10.1186/1471-2105-12-s8-s2 article EN cc-by BMC Bioinformatics 2011-10-03

Using text mining to extract depressive symptoms and to validate the diagnosis of major depressive disorder from electronic health records

Chi‐Shin Wu, Chian‐Jue Kuo, Chu-Hsien Su, Shi‐Heng Wang, Hong-Jie Dai

10.1016/j.jad.2019.09.044 article EN Journal of Affective Disorders 2019-09-11

Overview of the interactive task in BioCreative V

Wang Qing-hua, Shabbir Syed-Abdul, Lara Almeida, Sophia Ananiadou, Yalbi Itzel Balderas-Martínez, and 51 more

Fully automated text mining (TM) systems promote efficient literature searching, retrieval, and review but are not sufficient to produce ready-to-consume curated documents. These systems are not meant to replace biocurators, but instead to assist them in one or more literature curation steps. To do so, the user interface is an important aspect that needs to be considered for tool adoption. The BioCreative Interactive task (IAT) is a track designed for exploring user-system interactions, promoting development of useful TM tools, and providing...

10.1093/database/baw119 article EN cc-by Database 2016-01-01

Coronary artery disease risk assessment from unstructured electronic health records using text mining

Jitendra Jonnagaddala, Siaw‐Teng Liaw, Pradeep Ray, Manish Kumar, Nai‐Wen Chang, and 1 more

Coronary artery disease (CAD) often leads to myocardial infarction, which may be fatal. Risk factors can be used to predict CAD, which may subsequently lead to prevention or early intervention. Patient data such as co-morbidities, medication history, social history and family history are required to determine the risk factors for a disease. However, risk factor data are usually embedded in unstructured clinical narratives if the data is not collected specifically for risk assessment purposes. Clinical text mining can be used to extract data related to risk factors from unstructured clinical notes. This study presents...

10.1016/j.jbi.2015.08.003 article EN cc-by-nc-nd Journal of Biomedical Informatics 2015-08-28

Overview of the BioCreative VI Precision Medicine Track: mining protein interactions and mutations for precision medicine

Rezarta Islamaj, Sun Kim, Andrew Chatr‐aryamontri, Chih-Hsuan Wei, Donald C. Comeau, and 22 more

The Precision Medicine Initiative is a multicenter effort aiming at formulating personalized treatments leveraging on individual patient data (clinical, genome sequence and functional genomic data) together with the information in large knowledge bases (KBs) that integrate genome annotation, disease association studies, electronic health records and other data types. The biomedical literature provides a rich foundation for populating these KBs, reporting genetic and molecular interactions that provide the scaffold for cellular...

10.1093/database/bay147 article EN cc-by Database 2018-12-21

OpenDeID Pipeline for Unstructured Electronic Health Record Text Notes Based on Rules and Transformers: Deidentification Algorithm Development and Validation Study

Jiaxing Liu, Shalini Gupta, Aipeng Chen, Chen-Kai Wang, Pratik Mishra, and 3 more

Background: Electronic health records (EHRs) in unstructured formats are valuable sources of information for research in both the clinical and biomedical domains. However, before such records can be used for research purposes, sensitive health information (SHI) must be removed in several cases to protect patient privacy. Rule-based and machine learning–based methods have been shown to be effective in deidentification. However, very few studies investigated the combination of transformer-based language models and rules. Objective: The objective of this study is to develop a hybrid...

10.2196/48145 article EN cc-by Journal of Medical Internet Research 2023-12-06

Deep Learning-Based Natural Language Processing for Screening Psychiatric Patients

Hong-Jie Dai, Chu-Hsien Su, You-Qian Lee, You-Chen Zhang, Chen-Kai Wang, and 2 more

The introduction of pre-trained language models in natural language processing (NLP) based on deep learning and the availability of electronic health records (EHRs) presents a great opportunity to transfer the “knowledge” learned from data in the general domain to enable the analysis of unstructured textual data in clinical domains. This study explored the feasibility of applying NLP to a small EHR dataset to investigate the power of transfer learning to facilitate the process of patient screening in psychiatry. A total of 500 patients were randomly selected from a medical center database....

10.3389/fpsyt.2020.533949 article EN cc-by Frontiers in Psychiatry 2021-01-15

Vickers Hardness Value Test via Multi-Task Learning Convolutional Neural Networks and Image Augmentation

Wan-Shu Cheng, Guanying Chen, Xin-Yen Shih, Mahmoud Elsisi, Meng-Hsiu Tsai, and 1 more

Hardness testing is an essential test in the metal manufacturing industry, and Vickers hardness is one of the most widely used hardness measurements today. The computer-assisted Vickers hardness test requires manually generating indentations for measurement, but the process is tedious and the measured results may depend on the operator’s experience. In light of this, this paper proposes a data-driven approach based on convolutional neural networks to measure the Vickers hardness value directly from the image of the specimen to get rid of the aforementioned limitations. Multi-task learning...

10.3390/app122110820 article EN cc-by Applied Sciences 2022-10-25

Recognition and Evaluation of Clinical Section Headings in Clinical Documents Using Token-Based Formulation with Conditional Random Fields

Hong-Jie Dai, Shabbir Syed-Abdul, Chih‐Wei Chen, Chieh-Chen Wu

An electronic health record (EHR) is a digital data format that collects electronic health information about an individual patient or population. To enhance the meaningful use of EHRs, information extraction techniques have been developed to recognize clinical concepts mentioned in EHRs. Nevertheless, the clinical judgment of an EHR cannot be known solely based on the recognized concepts without considering its contextual information. In order to improve the readability and accessibility of EHRs, this work developed a section heading recognition system for clinical documents....

10.1155/2015/873012 article EN cc-by BioMed Research International 2015-01-01

A context-aware approach for progression tracking of medical concepts in electronic medical records

Nai‐Wen Chang, Hong-Jie Dai, Jitendra Jonnagaddala, Chih‐Wei Chen, Richard Tzong‐Han Tsai, and 1 more

Electronic medical records (EMRs) for diabetic patients contain information about heart disease risk factors such as high blood pressure, cholesterol levels, and smoking status. Discovering the described risk factors and tracking their progression over time may support medical personnel in making clinical decisions, as well as facilitate data modeling and biomedical research. Such highly patient-specific knowledge is essential to driving the advancement of evidence-based practice, and can also help improve personalized medicine and care....

10.1016/j.jbi.2015.09.013 article EN cc-by-nc-nd Journal of Biomedical Informatics 2015-10-03

Adverse drug event and medication extraction in electronic health records via a cascading architecture with different sequence labeling models and word embeddings

Hong-Jie Dai, Chu-Hsien Su, Chi‐Shin Wu

An adverse drug event (ADE) refers to an injury resulting from medical intervention related to a drug, including harm caused by drugs or from the usage of drugs. Extracting ADEs from clinical records can help physicians associate adverse events to targeted drugs. We proposed a cascading architecture to recognize medical concepts including ADEs, drug names, and entities related to drugs. The architecture includes a preprocessing method and an ensemble of conditional random fields (CRFs) and neural network-based models to respectively address the challenges of surrogate string and overlapping annotation...

10.1093/jamia/ocz120 article EN Journal of the American Medical Informatics Association 2019-06-14

BIOSMILE: A semantic role labeling system for biomedical verbs using a maximum-entropy model with automatically generated template features

Richard Tzong‐Han Tsai, Wen‐Chi Chou, Ying-Shan Su, Yu‐Chun Lin, Cheng-Lung Sung, and 5 more

Background: Bioinformatics tools for automatic processing of biomedical literature are invaluable for both the design and interpretation of large-scale experiments. Many information extraction (IE) systems that incorporate natural language processing (NLP) techniques have thus been developed for use in the biomedical field. A key IE task in this field is the extraction of biomedical relations, such as protein-protein and gene-disease interactions. However, most biomedical relation extraction systems usually ignore adverbial and prepositional phrases and words identifying location, manner,...

10.1186/1471-2105-8-325 article EN cc-by BMC Bioinformatics 2007-09-01

TEMPTING system: A hybrid method of rule and machine learning for temporal relation extraction in patient discharge summaries

Yung‐Chun Chang, Hong-Jie Dai, Johnny Chi-Yang Wu, Jianming Chen, Richard Tzong‐Han Tsai, and 1 more

10.1016/j.jbi.2013.09.007 article EN publisher-specific-oa Journal of Biomedical Informatics 2013-09-20

Feature Engineering for Recognizing Adverse Drug Reactions from Twitter Posts

Hong-Jie Dai, Musa Touray, Jitendra Jonnagaddala, Shabbir Syed-Abdul

Social media platforms are emerging digital communication channels that provide an easy way for common people to share their health and medication experiences online. With more people discussing their health information online publicly, social media platforms present a rich source of information for exploring adverse drug reactions (ADRs). ADRs are major public health problems that result in deaths and hospitalizations of millions of people. Unfortunately, not all ADRs are identified before a drug is made available in the market. In this study, an ADR event monitoring system is developed...

10.3390/info7020027 article EN cc-by Information 2016-05-25

Classifying adverse drug reactions from imbalanced twitter data

Hong-Jie Dai, Chen-Kai Wang

10.1016/j.ijmedinf.2019.05.017 article EN International Journal of Medical Informatics 2019-05-30

Cohort selection for construction of a clinical natural language processing corpus

Naga Lalitha Valli Alla, Aipeng Chen, Sean Batongbacal, Chandini Nekkantti, Hong-Jie Dai, and 1 more

In Electronic Health Record (EHR) systems, key patient information is often captured in the form of unstructured clinical notes. The information from these notes can be extracted using Clinical Natural Language Processing (NLP). A training corpus is a key factor in the development of efficient clinical NLP models. The corpus construction is complex and multifaceted. There are several challenges in corpus construction, but one challenge that is not researched well is the cohort selection aspect. In this study, we present the methods employed and the challenges encountered in cohort selection for a corpus. The specific...

10.1016/j.cmpbup.2021.100024 article EN cc-by-nc-nd Computer Methods and Programs in Biomedicine Update 2021-01-01

MeInfoText 2.0: gene methylation and cancer relation extraction from biomedical literature

Yu‐Ching Fang, Po‐Ting Lai, Hong-Jie Dai, Wen‐Lian Hsu

Background: DNA methylation is regarded as a potential biomarker in the diagnosis and treatment of cancer. The relations between aberrant gene methylation and cancer development have been identified by a number of recent scientific studies. In a previous work, we used co-occurrences to mine those associations and compiled the MeInfoText 1.0 database. To reduce the amount of manual curation and improve the accuracy of relation extraction, we have now developed MeInfoText 2.0, which uses a machine learning-based approach to extract gene methylation-cancer...

10.1186/1471-2105-12-471 article EN cc-by BMC Bioinformatics 2011-12-01

Evaluating a Natural Language Processing–Driven, AI-Assisted International Classification of Diseases, 10th Revision, Clinical Modification, Coding System for Diagnosis Related Groups in a Real Hospital Environment: Algorithm Development and Validation Study

Hong-Jie Dai, Chen-Kai Wang, Chien-Chang Chen, Chong-Sin Liou, An-Tai Lu, and 9 more

Background: International Classification of Diseases codes are widely used to describe diagnosis information, but manual coding relies heavily on human interpretation, which can be expensive, time consuming, and prone to errors. With the transition from the International Classification of Diseases, Ninth Revision, to the International Classification of Diseases, Tenth Revision (ICD-10), the coding process has become more complex, highlighting the need for automated approaches to enhance coding efficiency and accuracy. Inaccurate coding can result in substantial financial losses for hospitals, and a precise assessment of outcomes...

10.2196/58278 article EN cc-by Journal of Medical Internet Research 2024-09-20

Exploiting likely-positive and unlabeled data to improve the identification of protein-protein interaction articles

Richard Tzong‐Han Tsai, Hsi-Chuan Hung, Hong-Jie Dai, Yi-Wen Lin, Wen‐Lian Hsu

Experimentally verified protein-protein interactions (PPI) cannot be easily retrieved by researchers unless they are stored in PPI databases. The curation of such databases can be made faster by ranking newly-published articles' relevance to PPI, a task which we approach here by designing a machine-learning-based PPI classifier. All classifiers require labeled data, and the more labeled data available, the more reliable they become. Although many PPI databases with large numbers of labeled articles are available, incorporating these data into the base training data may actually...

10.1186/1471-2105-9-s1-s3 article EN cc-by BMC Bioinformatics 2008-02-01

Unlocking the Secrets Behind Advanced Artificial Intelligence Language Models in Deidentifying Chinese-English Mixed Clinical Text: Development and Validation Study

You-Qian Lee, Ching-Tai Chen, Chien-Chang Chen, Chung-Hong Lee, Pei-Tsz Chen, and 2 more

The widespread use of electronic health records in the clinical and biomedical fields makes the removal of protected health information (PHI) essential to maintain privacy. However, a significant portion of information is recorded in unstructured textual forms, posing a challenge for deidentification. In multilingual countries, medical records could be written in a mixture of more than one language, referred to as code mixing. Most current natural language processing techniques are designed for monolingual text, and there is a need to address the deidentification...

10.2196/48443 article EN cc-by Journal of Medical Internet Research 2023-12-05
