- Biomedical Text Mining and Ontologies
- Topic Modeling
- Machine Learning in Healthcare
- Natural Language Processing Techniques
- Semantic Web and Ontologies
- Bioinformatics and Genomic Networks
- Genetics, Bioinformatics, and Biomedical Research
- Artificial Intelligence in Healthcare
- Genomics and Phylogenetic Studies
- Mental Health via Writing
- Pharmacovigilance and Adverse Drug Reactions
- Machine Learning in Bioinformatics
- Medical Coding and Health Information
- Genomics and Rare Diseases
- Artificial Intelligence in Healthcare and Education
- Mental Health Research Topics
- Spam and Phishing Detection
- Computational Drug Discovery Methods
- Misinformation and Its Impacts
- Text Readability and Simplification
- Sentiment Analysis and Opinion Mining
- AI in Cancer Detection
- Imbalanced Data Classification Techniques
- Text and Document Classification Technologies
- Advanced Text Analysis Techniques
National Kaohsiung University of Science and Technology
2019-2024
Kaohsiung Medical University
2019-2024
National Health Research Institutes
2020-2024
The University of Queensland
2023
Intelligent Systems Research (United States)
2020
National Taitung University
2015-2019
National University of Kaohsiung
2018
Institute of Information Science, Academia Sinica
2008-2016
Taipei Medical University
2013-2015
UNSW Sydney
2015
Nineteen teams presented results for the Gene Mention Task at the BioCreative II Workshop. In this task, participants designed systems to identify substrings in sentences corresponding to gene name mentions. A variety of different methods were used, and the results varied, with a highest achieved F1 score of 0.8721. Here we present brief descriptions of all the methods used and a statistical analysis of the results. We also demonstrate that, by combining the results from all submissions, an F score of 0.9066 is feasible, and furthermore that the best result makes use...
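To make the reported figures concrete: F1 is the harmonic mean of precision and recall over predicted gene mentions. The sketch below assumes exact-match scoring of (start, end) character spans and combines submissions by simple majority voting. Both are illustrative assumptions: the official Gene Mention evaluation also accepted alternative gold boundaries, which this sketch ignores, and the truncated abstract does not say how the 0.9066 combination was built. The `Span` alias, `prf1`, and `majority_vote` are hypothetical names.

```python
from collections import Counter

# Hypothetical representation: a gene mention is a (start, end) character span.
Span = tuple[int, int]


def prf1(predicted: set[Span], gold: set[Span]) -> tuple[float, float, float]:
    """Exact-match precision, recall, and F1 (harmonic mean of P and R)."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def majority_vote(submissions: list[set[Span]], min_votes: int) -> set[Span]:
    """Keep a span only if at least `min_votes` submissions proposed it."""
    votes = Counter(span for sub in submissions for span in sub)
    return {span for span, n in votes.items() if n >= min_votes}


if __name__ == "__main__":
    gold = {(0, 4), (10, 15), (20, 25)}
    # Made-up submissions: each system finds some true mentions plus noise.
    systems = [
        {(0, 4), (10, 15), (30, 35)},
        {(0, 4), (20, 25)},
        {(10, 15), (20, 25), (40, 44)},
    ]
    for i, sub in enumerate(systems):
        print(f"system {i}: P/R/F1 = {prf1(sub, gold)}")
    print("majority vote:", prf1(majority_vote(systems, min_votes=2), gold))
```

In this toy case each system misses a mention or adds a false one, while the vote keeps only spans that at least two systems agree on, which is one simple way individually weaker outputs can combine into a stronger one.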
We report on the Gene Normalization (GN) challenge in BioCreative III, where participating teams were asked to return a ranked list of identifiers of the genes detected in full-text articles. For training, 32 fully and 500 partially annotated articles were prepared. A total of 507 articles were selected as the test set. Due to the high annotation cost, it was not feasible to obtain gold-standard human annotations for all test articles. Instead, we developed an Expectation Maximization (EM) algorithm approach for choosing a small number of test articles for manual annotation that were most capable...
Fully automated text mining (TM) systems promote efficient literature searching, retrieval, and review but are not sufficient to produce ready-to-consume curated documents. These systems are not meant to replace biocurators, but instead to assist them in one or more curation steps. To do so, the user interface is an important aspect that needs to be considered for tool adoption. The BioCreative Interactive task (IAT) is a track designed for exploring user-system interactions, promoting the development of useful TM tools, and providing...
Coronary artery disease (CAD) often leads to myocardial infarction, which may be fatal. Risk factors can be used to predict CAD, which may subsequently lead to prevention or early intervention. Patient data such as co-morbidities, medication history, social history, and family history are required to determine the risk for a disease. However, risk factor data are usually embedded in unstructured clinical narratives if the data are not collected specifically for risk assessment purposes. Clinical text mining can be used to extract data related to risk factors from clinical notes. This study presents...
The Precision Medicine Initiative is a multicenter effort aiming at formulating personalized treatments by leveraging individual patient data (clinical, genome sequence, and functional genomic data) together with the information in large knowledge bases (KBs) that integrate annotation, disease association studies, electronic health records, and other types of data. The biomedical literature provides a rich foundation for populating these KBs, reporting genetic and molecular interactions that provide the scaffold for the cellular...
Background: Electronic health records (EHRs) in unstructured formats are valuable sources of information for research in both the clinical and biomedical domains. However, before such records can be used for research purposes, sensitive health information (SHI) must be removed in several cases to protect patient privacy. Rule-based and machine learning–based methods have been shown to be effective in deidentification. However, very few studies have investigated the combination of transformer-based language models and rules. Objective: The objective of this study is to develop a hybrid...
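A minimal sketch of one way rule-based and model-based outputs can be combined for deidentification, assuming SHI is represented as labeled character spans. The regex patterns, the `Entity` type, the merge policy, and the stub model output are all hypothetical; the study's actual rules and transformer model are not described in the truncated abstract. Here overlaps are resolved by keeping the longer span, with ties going to the rule.

```python
import re
from typing import NamedTuple


class Entity(NamedTuple):
    start: int
    end: int
    label: str
    source: str  # "rule" or "model"


# Hypothetical rules; a real system would use patterns tuned to its own corpus.
RULES = {
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "PHONE": re.compile(r"\b\d{2,4}-\d{3,4}-\d{4}\b"),
}


def rule_entities(text: str) -> list[Entity]:
    """Run every regex rule and return its matches as labeled spans."""
    return [
        Entity(m.start(), m.end(), label, "rule")
        for label, pattern in RULES.items()
        for m in pattern.finditer(text)
    ]


def merge(rule_ents: list[Entity], model_ents: list[Entity]) -> list[Entity]:
    """Union of both sources; on overlap keep the longer span, ties go to the rule."""
    candidates = sorted(
        rule_ents + model_ents,
        key=lambda e: (-(e.end - e.start), e.source != "rule"),
    )
    kept: list[Entity] = []
    for e in candidates:
        if all(e.end <= k.start or e.start >= k.end for k in kept):
            kept.append(e)
    return sorted(kept, key=lambda e: e.start)


def mask(text: str, entities: list[Entity]) -> str:
    """Replace each (non-overlapping, sorted) span with its label in brackets."""
    parts, pos = [], 0
    for e in entities:
        parts.append(text[pos:e.start])
        parts.append(f"[{e.label}]")
        pos = e.end
    parts.append(text[pos:])
    return "".join(parts)


if __name__ == "__main__":
    note = "Seen by Dr. Lin on 2021-03-05, call 07-312-1101."
    # Stand-in for a transformer tagger's output (hypothetical): it finds the
    # name but only part of the date.
    model_out = [Entity(12, 15, "DOCTOR", "model"), Entity(19, 26, "DATE", "model")]
    print(mask(note, merge(rule_entities(note), model_out)))
```

The design choice illustrated is complementarity: rules reliably catch rigidly formatted SHI such as dates and phone numbers, while a learned model covers free-form mentions such as names that no simple pattern captures.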
The introduction of pre-trained language models in natural language processing (NLP) based on deep learning and the availability of electronic health records (EHRs) present a great opportunity to transfer “knowledge” learned from data in the general domain to enable the analysis of unstructured textual data in clinical domains. This study explored the feasibility of applying NLP to a small EHR dataset to investigate the power of NLP to facilitate the process of patient screening in psychiatry. A total of 500 patients were randomly selected from a medical center database....
Hardness testing is an essential test in the metal manufacturing industry, and Vickers hardness is one of the most widely used measurements today. The computer-assisted Vickers hardness test requires manually generating indentations for measurement, but the process is tedious and the measured results may depend on the operator’s experience. In light of this, this paper proposes a data-driven approach based on convolutional neural networks to measure the hardness value directly from the image of a specimen, to get rid of the aforementioned limitations. Multi-task learning...
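For context on what the image-based model replaces: in a conventional Vickers test, hardness is computed from the two measured indentation diagonals with the standard formula HV = 2F·sin(136°/2)/d² ≈ 1.8544·F/d², where F is the load in kgf and d is the mean diagonal in mm. The sketch below implements only that formula; it is not the paper's CNN approach, and `vickers_hardness` and the example readings are hypothetical.

```python
import math

# Vickers indenter: square-based diamond pyramid, 136 degrees between opposite faces.
FACE_ANGLE_DEG = 136.0


def vickers_hardness(force_kgf: float, d1_mm: float, d2_mm: float) -> float:
    """HV = 2 F sin(136°/2) / d**2, with F in kgf and d the mean diagonal in mm."""
    if force_kgf <= 0 or d1_mm <= 0 or d2_mm <= 0:
        raise ValueError("force and diagonals must be positive")
    d = (d1_mm + d2_mm) / 2.0  # mean of the two measured diagonals
    return 2.0 * force_kgf * math.sin(math.radians(FACE_ANGLE_DEG / 2.0)) / d**2


if __name__ == "__main__":
    # Made-up reading: 10 kgf load, diagonals of 0.30 mm and 0.31 mm.
    print(round(vickers_hardness(10.0, 0.30, 0.31), 1))
```

Because HV scales with 1/d², a small error in locating the indentation corners produces a noticeably larger error in hardness, which is why operator-dependent diagonal measurement is a weak point of the manual process.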
An electronic health record (EHR) is a digital data format that collects electronic information about an individual patient or a population. To enhance the meaningful use of EHRs, information extraction techniques have been developed to recognize clinical concepts mentioned in EHRs. Nevertheless, the clinical judgment in an EHR cannot be known solely from the recognized concepts without considering their contextual information. In order to improve the readability and accessibility of EHRs, this work developed a section heading recognition system for clinical documents....
Electronic medical records (EMRs) for diabetic patients contain information about heart disease risk factors such as high blood pressure, cholesterol levels, and smoking status. Discovering the described risk factors and tracking their progression over time may support medical personnel in making clinical decisions, as well as facilitate data modeling and biomedical research. Such highly patient-specific knowledge is essential to driving the advancement of evidence-based practice, and can also help improve personalized medicine and care....
An adverse drug event (ADE) refers to an injury resulting from a medical intervention related to a drug, including harm caused by drugs or by the usage of drugs. Extracting ADEs from clinical records can help physicians associate adverse events with targeted drugs. We proposed a cascading architecture to recognize medical concepts including ADEs, drug names, and entities related to drugs. The architecture includes a preprocessing method and an ensemble of conditional random fields (CRFs) and neural network-based models to respectively address the challenges of surrogate strings and overlapping annotation...
Background: Bioinformatics tools for automatic processing of biomedical literature are invaluable for both the design and interpretation of large-scale experiments. Many information extraction (IE) systems that incorporate natural language processing (NLP) techniques have thus been developed for use in this field. A key IE task in this field is the extraction of relations, such as protein-protein and gene-disease interactions. However, most relation extraction systems usually ignore adverbial and prepositional phrases and words identifying location, manner,...
Social media platforms are emerging digital communication channels that provide an easy way for common people to share their health and medication experiences online. With more people discussing health information online publicly, social media platforms present a rich source for exploring adverse drug reactions (ADRs). ADRs are major public health problems that result in deaths and hospitalizations of millions of people. Unfortunately, not all ADRs are identified before a drug is made available on the market. In this study, an ADR event monitoring system was developed...
In Electronic Health Record (EHR) systems, key patient information is often captured in the form of unstructured clinical notes. The information in these notes can be extracted using Clinical Natural Language Processing (NLP). The training corpus is a key factor in the development of efficient NLP models. Corpus construction is complex and multifaceted. There are several challenges in corpus construction, but one challenge that has not been researched well is the cohort selection aspect. In this study, we present the methods employed and the challenges encountered in cohort selection for corpus construction. The specific...
Background: DNA methylation is regarded as a potential biomarker in the diagnosis and treatment of cancer. The relations between aberrant gene methylation and cancer development have been identified by a number of recent scientific studies. In previous work, we used co-occurrences to mine those associations and compiled the MeInfoText 1.0 database. To reduce the amount of manual curation and improve the accuracy of relation extraction, we have now developed MeInfoText 2.0, which uses a machine learning-based approach to extract methylation-cancer...
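A sketch of sentence-level co-occurrence mining of the kind MeInfoText 1.0 is described as using. The gene and cancer lexicons, the naive sentence splitter, and the rule of counting only sentences that mention methylation are hypothetical simplifications, not the database's actual pipeline.

```python
import re
from collections import Counter
from itertools import product

# Hypothetical lexicons; a real system would use curated gene and disease dictionaries.
GENES = {"MLH1", "MGMT", "CDKN2A"}
CANCERS = {"colorectal cancer", "glioblastoma"}


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def methylation_cooccurrences(text: str) -> Counter:
    """Count (gene, cancer) pairs that co-occur in a sentence mentioning methylation."""
    counts: Counter = Counter()
    for sentence in split_sentences(text):
        lowered = sentence.lower()
        if "methylat" not in lowered:  # hypothetical filter: methylation, methylated, ...
            continue
        genes = sorted(g for g in GENES if re.search(rf"\b{re.escape(g)}\b", sentence))
        cancers = sorted(c for c in CANCERS if c in lowered)
        counts.update(product(genes, cancers))  # every gene-cancer pair in the sentence
    return counts


if __name__ == "__main__":
    abstract = (
        "Promoter methylation of MLH1 is frequent in colorectal cancer. "
        "MGMT methylation predicts response in glioblastoma. "
        "MLH1 expression was measured in glioblastoma."
    )
    for (gene, cancer), n in methylation_cooccurrences(abstract).items():
        print(gene, cancer, n)
```

Co-occurrence counting like this is cheap but cannot tell an asserted association from a negated or unrelated mention, which is the precision gap a machine learning-based relation extractor is meant to close.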
Background: International Classification of Diseases codes are widely used to describe diagnosis information, but manual coding relies heavily on human interpretation, which can be expensive, time-consuming, and prone to errors. With the transition from the International Classification of Diseases, Ninth Revision, to the International Classification of Diseases, Tenth Revision (ICD-10), the coding process has become more complex, highlighting the need for automated approaches to enhance coding efficiency and accuracy. Inaccurate coding can result in substantial financial losses for hospitals, and a precise assessment of outcomes...
Experimentally verified protein-protein interactions (PPI) cannot be easily retrieved by researchers unless they are stored in PPI databases. The curation of such databases can be made faster by ranking newly published articles' relevance to PPI, a task which we approach here by designing a machine-learning-based classifier. All classifiers require labeled data, and the more labeled data available, the more reliable they become. Although many databases with large numbers of curated articles are available, incorporating these into the base training data may actually...
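A sketch of ranking new articles by a classifier's relevance score. The abstract does not name the classifier, so this swaps in a multinomial naive Bayes model with add-one smoothing purely as an illustration; `NaiveBayesRanker`, the tokenizer, and the training snippets are hypothetical, and a real system would train on curated PPI abstracts.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9-]+", text.lower())


class NaiveBayesRanker:
    """Multinomial naive Bayes (add-one smoothing) scoring PPI relevance as log-odds.

    Assumes the training set contains both relevant (1) and irrelevant (0) articles.
    """

    def fit(self, docs: list[str], labels: list[int]) -> "NaiveBayesRanker":
        self.doc_counts = Counter(labels)
        self.token_counts = {0: Counter(), 1: Counter()}
        for doc, label in zip(docs, labels):
            self.token_counts[label].update(tokenize(doc))
        self.vocab = set(self.token_counts[0]) | set(self.token_counts[1])
        self.token_totals = {y: sum(c.values()) for y, c in self.token_counts.items()}
        return self

    def score(self, doc: str) -> float:
        """Approximate log P(relevant | doc) - log P(irrelevant | doc)."""
        score = math.log(self.doc_counts[1]) - math.log(self.doc_counts[0])  # prior odds
        v = len(self.vocab)
        for tok in tokenize(doc):
            if tok not in self.vocab:
                continue  # tokens never seen in training are ignored
            p1 = (self.token_counts[1][tok] + 1) / (self.token_totals[1] + v)
            p0 = (self.token_counts[0][tok] + 1) / (self.token_totals[0] + v)
            score += math.log(p1) - math.log(p0)
        return score

    def rank(self, docs: list[str]) -> list[str]:
        """Most likely PPI-relevant articles first."""
        return sorted(docs, key=self.score, reverse=True)


if __name__ == "__main__":
    # Invented training snippets: 1 = PPI-relevant, 0 = not relevant.
    train_docs = [
        "protein binds protein interaction complex",
        "yeast two-hybrid interaction assay",
        "patient survival cohort study",
        "clinical trial dosage outcome",
    ]
    ranker = NaiveBayesRanker().fit(train_docs, [1, 1, 0, 0])
    for doc in ranker.rank(["cohort outcome study", "two-hybrid binding interaction"]):
        print(f"{ranker.score(doc):+.2f}  {doc}")
```

Ranking by a continuous score rather than a hard yes/no label lets curators read the most promising articles first and stop wherever their time budget runs out.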
The widespread use of electronic health records in the clinical and biomedical fields makes the removal of protected health information (PHI) essential to maintain patient privacy. However, a significant portion of PHI is recorded in unstructured textual forms, posing a challenge for deidentification. In multilingual countries, medical records could be written in a mixture of more than one language, referred to as code mixing. Most current natural language processing techniques are designed for monolingual text, and there is a need to address deidentification...