Hong-Jie Dai

ORCID: 0000-0002-1516-7255
Research Areas
  • Biomedical Text Mining and Ontologies
  • Topic Modeling
  • Machine Learning in Healthcare
  • Natural Language Processing Techniques
  • Semantic Web and Ontologies
  • Bioinformatics and Genomic Networks
  • Genetics, Bioinformatics, and Biomedical Research
  • Artificial Intelligence in Healthcare
  • Genomics and Phylogenetic Studies
  • Mental Health via Writing
  • Pharmacovigilance and Adverse Drug Reactions
  • Machine Learning in Bioinformatics
  • Medical Coding and Health Information
  • Genomics and Rare Diseases
  • Artificial Intelligence in Healthcare and Education
  • Mental Health Research Topics
  • Spam and Phishing Detection
  • Computational Drug Discovery Methods
  • Misinformation and Its Impacts
  • Text Readability and Simplification
  • Sentiment Analysis and Opinion Mining
  • AI in cancer detection
  • Imbalanced Data Classification Techniques
  • Text and Document Classification Technologies
  • Advanced Text Analysis Techniques

National Kaohsiung University of Science and Technology
2019-2024

Kaohsiung Medical University
2019-2024

National Health Research Institutes
2020-2024

The University of Queensland
2023

Intelligent Systems Research (United States)
2020

National Taitung University
2015-2019

National University of Kaohsiung
2018

Institute of Information Science, Academia Sinica
2008-2016

Taipei Medical University
2013-2015

UNSW Sydney
2015

Nineteen teams presented results for the Gene Mention Task at the BioCreative II Workshop. In this task participants designed systems to identify substrings in sentences corresponding to gene name mentions. A variety of different methods were used and the results varied with a highest achieved F1 score of 0.8721. Here we present brief descriptions of all the methods used and a statistical analysis of the results. We also demonstrate that, by combining the results from all submissions, an F score of 0.9066 is feasible, and furthermore that the best result makes use...

10.1186/gb-2008-9-s2-s2 article EN cc-by Genome biology 2008-09-01

We report the Gene Normalization (GN) challenge in BioCreative III where participating teams were asked to return a ranked list of identifiers of the genes detected in full-text articles. For training, 32 fully and 500 partially annotated articles were prepared. A total of 507 articles were selected as the test set. Due to the high annotation cost, it was not feasible to obtain gold-standard human annotations for all test articles. Instead, we developed an Expectation Maximization (EM) algorithm approach for choosing a small number of test articles for manual annotation that were most capable...

10.1186/1471-2105-12-s8-s2 article EN cc-by BMC Bioinformatics 2011-10-03

Fully automated text mining (TM) systems promote efficient literature searching, retrieval, and review but are not sufficient to produce ready-to-consume curated documents. These systems are not meant to replace biocurators, but instead to assist them in one or more literature curation steps. To do so, the user interface is an important aspect that needs to be considered for tool adoption. The BioCreative Interactive task (IAT) is a track designed for exploring user-system interactions, promoting development of useful TM tools, and providing...

10.1093/database/baw119 article EN cc-by Database 2016-01-01

Coronary artery disease (CAD) often leads to myocardial infarction, which may be fatal. Risk factors can be used to predict CAD, which may subsequently lead to prevention or early intervention. Patient data such as co-morbidities, medication history, social history and family history are required to determine the risk factors for a disease. However, risk factor data are usually embedded in unstructured clinical narratives if the data is not collected specifically for risk assessment purposes. Clinical text mining can be used to extract data related to risk factors from unstructured clinical notes. This study presents...

10.1016/j.jbi.2015.08.003 article EN cc-by-nc-nd Journal of Biomedical Informatics 2015-08-28

The Precision Medicine Initiative is a multicenter effort aiming at formulating personalized treatments leveraging on individual patient data (clinical, genome sequence and functional genomic data) together with the information in large knowledge bases (KBs) that integrate genome annotation, disease association studies, electronic health records and other data types. The biomedical literature provides a rich foundation for populating these KBs, reporting genetic and molecular interactions that provide the scaffold for the cellular...

10.1093/database/bay147 article EN cc-by Database 2018-12-21

Background: Electronic health records (EHRs) in unstructured formats are valuable sources of information for research in both the clinical and biomedical domains. However, before such records can be used for research purposes, sensitive health information (SHI) must be removed in several cases to protect patient privacy. Rule-based and machine learning–based methods have been shown to be effective in deidentification. However, very few studies investigated the combination of transformer-based language models and rules. Objective: The objective of this study is to develop a hybrid...

10.2196/48145 article EN cc-by Journal of Medical Internet Research 2023-12-06

The introduction of pre-trained language models in natural language processing (NLP) based on deep learning and the availability of electronic health records (EHRs) presents a great opportunity to transfer the “knowledge” learned from data in the general domain to enable the analysis of unstructured textual data in clinical domains. This study explored the feasibility of applying NLP to a small EHR dataset to investigate the power of transfer learning to facilitate the process of patient screening in psychiatry. A total of 500 patients were randomly selected from a medical center database....

10.3389/fpsyt.2020.533949 article EN cc-by Frontiers in Psychiatry 2021-01-15

Hardness testing is an essential test in the metal manufacturing industry, and Vickers hardness testing is one of the most widely used hardness measurements today. The computer-assisted Vickers hardness test requires manually generating indentations for measurement, but the process is tedious and the measured results may depend on the operator’s experience. In light of this, this paper proposes a data-driven approach based on convolutional neural networks to measure the Vickers hardness value directly from the image of the specimen to get rid of the aforementioned limitations. Multi-task learning...

10.3390/app122110820 article EN cc-by Applied Sciences 2022-10-25

Electronic health record (EHR) is a digital data format that collects electronic health information about an individual patient or population. To enhance the meaningful use of EHRs, information extraction techniques have been developed to recognize clinical concepts mentioned in EHRs. Nevertheless, the clinical judgment of an EHR cannot be known solely based on the recognized concepts without considering its contextual information. In order to improve the readability and accessibility of EHRs, this work developed a section heading recognition system for clinical documents....

10.1155/2015/873012 article EN cc-by BioMed Research International 2015-01-01

Electronic medical records (EMRs) for diabetic patients contain information about heart disease risk factors such as high blood pressure, cholesterol levels, and smoking status. Discovering the described risk factors and tracking their progression over time may support medical personnel in making clinical decisions, as well as facilitate data modeling and biomedical research. Such highly patient-specific knowledge is essential to driving the advancement of evidence-based practice, and can also help improve personalized medicine and care....

10.1016/j.jbi.2015.09.013 article EN cc-by-nc-nd Journal of Biomedical Informatics 2015-10-03

An adverse drug event (ADE) refers to an injury resulting from medical intervention related to a drug including harm caused by drugs or from the usage of drugs. Extracting ADEs from clinical records can help physicians associate adverse events to targeted drugs. We proposed a cascading architecture to recognize medical concepts including ADEs, drug names, and entities related to drugs. The architecture includes a preprocessing method and an ensemble of conditional random fields (CRFs) and neural network-based models to respectively address the challenges of surrogate string and overlapping annotation...

10.1093/jamia/ocz120 article EN Journal of the American Medical Informatics Association 2019-06-14

Background: Bioinformatics tools for automatic processing of biomedical literature are invaluable for both the design and interpretation of large-scale experiments. Many information extraction (IE) systems that incorporate natural language processing (NLP) techniques have thus been developed for use in the biomedical field. A key IE task in this field is the extraction of biomedical relations, such as protein-protein and gene-disease interactions. However, most biomedical relation extraction systems usually ignore adverbial and prepositional phrases and words identifying location, manner,...

10.1186/1471-2105-8-325 article EN cc-by BMC Bioinformatics 2007-09-01

Social media platforms are emerging digital communication channels that provide an easy way for common people to share their health and medication experiences online. With more people discussing their health information online publicly, social media platforms present a rich source of information for exploring adverse drug reactions (ADRs). ADRs are major public health problems that result in deaths and hospitalizations of millions of people. Unfortunately, not all ADRs are identified before a drug is made available in the market. In this study, an ADR event monitoring system is developed...

10.3390/info7020027 article EN cc-by Information 2016-05-25

10.1016/j.ijmedinf.2019.05.017 article EN International Journal of Medical Informatics 2019-05-30

In Electronic Health Record (EHR) systems, key patient information is often captured in the form of unstructured clinical notes. The information from these notes can be extracted using Clinical Natural Language Processing (NLP). Training corpus is a key factor in the development of efficient clinical NLP models. The construction of a clinical corpus is complex and multifaceted. There are several challenges in corpus construction, but one challenge that is not researched well is the cohort selection aspect. In this study, we present the methods employed and the challenges encountered in cohort selection for a clinical corpus. The specific...

10.1016/j.cmpbup.2021.100024 article EN cc-by-nc-nd Computer Methods and Programs in Biomedicine Update 2021-01-01

Background: DNA methylation is regarded as a potential biomarker in the diagnosis and treatment of cancer. The relations between aberrant gene methylation and cancer development have been identified by a number of recent scientific studies. In a previous work, we used co-occurrences to mine those associations and compiled the MeInfoText 1.0 database. To reduce the amount of manual curation and improve the accuracy of relation extraction, we have now developed MeInfoText 2.0, which uses a machine learning-based approach to extract gene methylation-cancer...

10.1186/1471-2105-12-471 article EN cc-by BMC Bioinformatics 2011-12-01

Background: The International Classification of Diseases codes are widely used to describe diagnosis information, but manual coding relies heavily on human interpretation, which can be expensive, time consuming, and prone to errors. With the transition from the International Classification of Diseases, Ninth Revision, to the International Classification of Diseases, Tenth Revision (ICD-10), the coding process has become more complex, highlighting the need for automated approaches to enhance coding efficiency and accuracy. Inaccurate coding can result in substantial financial losses for hospitals, and a precise assessment of outcomes...

10.2196/58278 article EN cc-by Journal of Medical Internet Research 2024-09-20

Experimentally verified protein-protein interactions (PPI) cannot be easily retrieved by researchers unless they are stored in PPI databases. The curation of such databases can be made faster by ranking newly-published articles' relevance to PPI, a task which we approach here by designing a machine-learning-based PPI classifier. All classifiers require labeled data, and the more labeled data available, the more reliable they become. Although many PPI databases with large numbers of labeled articles are available, incorporating these databases into the base training data may actually...

10.1186/1471-2105-9-s1-s3 article EN cc-by BMC Bioinformatics 2008-02-01

The widespread use of electronic health records in the clinical and biomedical fields makes the removal of protected health information (PHI) essential to maintain privacy. However, a significant portion of information is recorded in unstructured textual forms, posing a challenge for deidentification. In multilingual countries, medical records could be written in a mixture of more than one language, referred to as code mixing. Most current natural language processing techniques are designed for monolingual text, and there is a need to address the challenge of deidentification...

10.2196/48443 article EN cc-by Journal of Medical Internet Research 2023-12-05