Hong Yu

ORCID: 0000-0001-9263-5035
Research Areas
  • Topic Modeling
  • Biomedical Text Mining and Ontologies
  • Natural Language Processing Techniques
  • Machine Learning in Healthcare
  • Advanced Text Analysis Techniques
  • Semantic Web and Ontologies
  • Multimodal Machine Learning Applications
  • Health Literacy and Information Accessibility
  • Electronic Health Records Systems
  • Health Sciences Research and Education
  • Text Readability and Simplification
  • Pharmacovigilance and Adverse Drug Reactions
  • Domain Adaptation and Few-Shot Learning
  • Genetics, Bioinformatics, and Biomedical Research
  • Artificial Intelligence in Healthcare
  • Sentiment Analysis and Opinion Mining
  • Data Quality and Management
  • Intelligent Tutoring Systems and Adaptive Learning
  • Mobile Health and mHealth Applications
  • Chronic Disease Management Strategies
  • Social Media in Health Education
  • Imbalanced Data Classification Techniques
  • Text and Document Classification Technologies
  • Genomics and Phylogenetic Studies
  • Computational Drug Discovery Methods

University of Massachusetts Lowell
2017-2025

Amherst College
2017-2025

University of Massachusetts Amherst
2016-2025

Shanghai Electric (China)
2024-2025

VA New England Healthcare System
2021-2025

University of Massachusetts Chan Medical School
2015-2024

Chongqing University of Posts and Telecommunications
2024

United States Department of Veterans Affairs
2022-2024

UMass Memorial Medical Center
2023-2024

Edith Nourse Rogers Memorial Veterans Hospital
2016-2023

Opinion question answering is a challenging task for natural language processing. In this paper, we discuss a necessary component of an opinion question answering system: separating opinions from fact, at both the document and sentence level. We present a Bayesian classifier for discriminating between documents with a preponderance of opinions, such as editorials, and regular news stories, and describe three unsupervised, statistical techniques for the significantly harder task of detecting opinions at the sentence level. We also present a first model for classifying opinion sentences as positive or negative in...

10.3115/1119355.1119372 article EN Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing 2003-01-01
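The document-level opinion/fact separation above relies on a Bayesian classifier. As a rough illustration only, a minimal multinomial naive Bayes classifier with add-one smoothing might look like the following. The whitespace tokenizer, the class names `opinion` and `fact`, and the four toy training sentences are hypothetical; they are not the paper's features or data.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical lowercase whitespace tokenizer; a real system would use richer features.
    return text.lower().split()


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # document counts per class (priors)
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def log_posterior(self, doc, label):
        # log P(label) + sum of log P(token | label), up to a shared constant.
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_docs)
        denom = self.total_words[label] + len(self.vocab)
        for tok in tokenize(doc):
            score += math.log((self.word_counts[label][tok] + 1) / denom)
        return score

    def predict(self, doc):
        return max(self.class_counts, key=lambda c: self.log_posterior(doc, c))


# Toy, hypothetical training data.
train_docs = [
    "i believe this policy is a terrible mistake",
    "we think the mayor should resign",
    "the council approved the budget on tuesday",
    "the company reported quarterly revenue of 2 billion",
]
train_labels = ["opinion", "opinion", "fact", "fact"]
model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict("i think this is a mistake"))          # opinion
print(model.predict("the budget was approved on monday"))  # fact
```

Working in log space avoids numerical underflow when many small token probabilities are multiplied.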

Sequence labeling for extraction of medical events and their attributes from unstructured text in Electronic Health Record (EHR) notes is a key step towards semantic understanding of EHRs. It has important applications in health informatics, including pharmacovigilance and drug surveillance. The state-of-the-art supervised machine learning models in this domain are based on Conditional Random Fields (CRFs) with features calculated from fixed context windows. In this application, we explored recurrent neural network...

10.18653/v1/n16-1056 article EN Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2016-01-01

Sequence labeling is a widely used method for named entity recognition and information extraction from unstructured natural language data. In the clinical domain, one major application of sequence labeling involves extraction of medical entities such as medication, indication, and side-effects from Electronic Health Record narratives. Sequence labeling in this domain presents its own set of challenges and objectives. In this work, we experimented with various CRF-based structured learning models with Recurrent Neural Networks. We extend the previously studied...

10.18653/v1/d16-1082 preprint EN cc-by Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing 2016-01-01
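The two sequence-labeling entries above tag each token of a clinical note with an entity label. Assuming a BIO tagging scheme (an assumption; the abstracts do not state the tag set) and hypothetical labels such as `B-Medication` and `I-Medication`, a sketch of grouping predicted tags back into entity spans could be:

```python
def bio_to_spans(tokens, tags):
    """Group BIO tags into (entity_type, start, end_exclusive, text) spans.

    Assumes tags are "O", "B-<type>", or "I-<type>". A stray I- tag that does not
    continue the current entity starts a new span (a common lenient convention).
    """
    spans = []
    start, etype = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel "O" flushes the last span
        if tag.startswith("I-") and tag[2:] == etype:
            continue  # extends the current entity
        if etype is not None:
            spans.append((etype, start, i, " ".join(tokens[start:i])))
            etype = None
        if tag != "O":
            start, etype = i, tag[2:]
    return spans


# Hypothetical example sentence and tags.
tokens = ["Patient", "started", "warfarin", "for", "atrial", "fibrillation", ",", "developed", "rash"]
tags = ["O", "O", "B-Medication", "O", "B-Indication", "I-Indication", "O", "O", "B-SideEffect"]
print(bio_to_spans(tokens, tags))
# [('Medication', 2, 3, 'warfarin'), ('Indication', 4, 6, 'atrial fibrillation'), ('SideEffect', 8, 9, 'rash')]
```

Whatever model produces the tags (a CRF, an RNN, or a hybrid), a decoding step like this is what turns per-token predictions into extracted entities.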

Background: The bidirectional encoder representations from transformers (BERT) model has achieved great success in many natural language processing (NLP) tasks, such as named entity recognition and question answering. However, little prior work has explored this model for an important task in the biomedical and clinical domains, namely entity normalization. Objective: We aim to investigate the effectiveness of BERT-based models for biomedical or clinical entity normalization. In addition, our second objective is to investigate whether the domains of training data influence...

10.2196/14830 article EN cc-by JMIR Medical Informatics 2019-09-12

Automated ICD coding, which assigns the International Classification of Disease codes to patient visits, has attracted much research attention since it can save time and labor for billing. The previous state-of-the-art model utilized one convolutional layer to build document representations for predicting ICD codes. However, the lengths and grammar of text fragments, which are closely related to ICD coding, vary a lot in different documents. Therefore, a flat, fixed-length architecture may not be capable of learning good representations....

10.1609/aaai.v34i05.6331 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2020-04-03

Advances in large language models (LLMs) have empowered a variety of applications. However, there is still a significant gap in research when it comes to understanding and enhancing the capabilities of LLMs in the field of mental health. In this work, we present a comprehensive evaluation of multiple LLMs on various mental health prediction tasks via online text data, including Alpaca, Alpaca-LoRA, FLAN-T5, GPT-3.5, and GPT-4. We conduct a broad range of experiments, covering zero-shot prompting, few-shot prompting, and instruction fine-tuning. The...

10.1145/3643540 article EN Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 2024-03-06

Recent advancements in artificial intelligence, such as GPT-3.5 Turbo (OpenAI) and GPT-4, have demonstrated significant potential by achieving good scores on text-only United States Medical Licensing Examination (USMLE) exams and effectively answering questions from physicians. However, the ability of these models to interpret medical images remains underexplored. This study aimed to comprehensively evaluate the performance, interpretability, and limitations of GPT-3.5 Turbo, GPT-4, and its successor, GPT-4 Vision (GPT-4V),...

10.2196/65146 article EN cc-by Journal of Medical Internet Research 2025-02-07

The effects of phosphorothioate (S-oligonucleotide), terminal phosphorothioate-phosphodiester (S-O-oligonucleotide), or methylphosphonate-phosphodiester (MP-O-oligonucleotide) modifications on mouse spleen cell surface binding, uptake, and degradation were studied using fluorescein (FITC)-conjugated oligonucleotides. S-oligonucleotides had the highest binding, followed by S-O-, O-, and MP-O-oligonucleotides. Competition studies indicated that S-oligonucleotides have an increased affinity for membrane...

10.1089/ard.1993.3.53 article EN Antisense Research and Development 1993-01-01

Deep learning transformer-based models using longitudinal electronic health records (EHRs) have shown great success in the prediction of clinical diseases or outcomes. Pretraining on a large dataset can help such models map the input space better and boost their performance on relevant tasks through finetuning with limited data. In this study, we present TransformEHR, a generative encoder-decoder transformer model that is pretrained with a new pretraining objective: predicting all outcomes of a patient at...

10.1038/s41467-023-43715-z article EN cc-by Nature Communications 2023-11-29

Importance: Social determinants of health (SDOHs) are known to be associated with increased risk of suicidal behaviors, but few studies use SDOHs from unstructured electronic health record notes. Objective: To investigate associations between veterans’ death by suicide and recent SDOHs, identified using structured and unstructured data. Design, Setting, and Participants: This nested case-control study included veterans who received care under the US Veterans Health Administration from October 1, 2010, to September 30, 2015. A...

10.1001/jamanetworkopen.2023.3079 article EN cc-by-nc-nd JAMA Network Open 2023-03-15

Advances in large language models (LLMs) have empowered a variety of applications. However, there is still a significant gap in research when it comes to understanding and enhancing the capabilities of LLMs in the field of mental health. In this work, we present the first comprehensive evaluation of multiple LLMs, including Alpaca, Alpaca-LoRA, FLAN-T5, GPT-3.5, and GPT-4, on various mental health prediction tasks via online text data. We conduct a broad range of experiments, covering zero-shot prompting, few-shot prompting, and instruction...

10.48550/arxiv.2307.14385 preprint EN cc-by arXiv (Cornell University) 2023-01-01

Objective: To develop methods that automatically map abbreviations to their full forms in biomedical articles. Methods: The authors developed two methods of mapping defined and undefined abbreviations (defined abbreviations are paired with their full forms in the articles, whereas undefined ones are not). For defined abbreviations, they developed a set of pattern-matching rules to map an abbreviation to its full form and implemented the rules in a software program, AbbRE (for "abbreviation recognition and extraction"). Using the opinions of domain experts as a reference standard, they evaluated recall and precision for ten...

10.1197/jamia.m0913 article EN Journal of the American Medical Informatics Association 2002-05-01
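For defined abbreviations, the pairing step can be illustrated with a simplified version of the Schwartz and Hearst backward-matching heuristic. This is a swapped-in, widely used technique shown for illustration; it is not AbbRE's rule set, and the helper names (`find_long_form`, `extract_defined_abbreviations`) and the regular expression limiting abbreviations to 2 to 10 characters are assumptions.

```python
import re

# Hypothetical pattern: a parenthesized token of 2-10 characters starting with a letter.
PAREN = re.compile(r"\(([A-Za-z][A-Za-z0-9-]{1,9})\)")


def find_long_form(abbr, preceding_text):
    """Match abbr characters right to left against the text before the parenthesis.

    Returns the candidate full form, or None if the characters cannot be matched.
    """
    words = preceding_text.split()
    # Window bound used by Schwartz and Hearst: min(|A| + 5, 2 * |A|) words.
    max_words = min(len(abbr) + 5, 2 * len(abbr))
    candidate = " ".join(words[-max_words:])
    c = len(candidate) - 1
    for a in range(len(abbr) - 1, -1, -1):
        ch = abbr[a].lower()
        if not ch.isalnum():
            continue
        # Scan left for ch; the first abbreviation character must start a word.
        while c >= 0 and (candidate[c].lower() != ch
                          or (a == 0 and c > 0 and candidate[c - 1].isalnum())):
            c -= 1
        if c < 0:
            return None
        c -= 1
    start = candidate.rfind(" ", 0, c + 1) + 1
    return candidate[start:]


def extract_defined_abbreviations(sentence):
    """Return {abbreviation: full form} for patterns like 'full form (ABBR)'."""
    pairs = {}
    for m in PAREN.finditer(sentence):
        abbr = m.group(1)
        long_form = find_long_form(abbr, sentence[:m.start()])
        if long_form:
            pairs[abbr] = long_form
    return pairs


s = "Electronic health records (EHRs) support adverse drug event (ADE) surveillance."
print(extract_defined_abbreviations(s))
# {'EHRs': 'Electronic health records', 'ADE': 'adverse drug event'}
```

Undefined abbreviations, which have no nearby full form, cannot be handled this way and require a separate method, which is why the abstract distinguishes the two cases.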

Biomedical texts can typically be represented by four rhetorical categories: Introduction, Methods, Results and Discussion (IMRAD). Classifying sentences into these categories can benefit many other text-mining tasks. Although many studies have applied different approaches for automatically classifying sentences in MEDLINE abstracts into the IMRAD categories, few have explored the classification of sentences that appear in full-text biomedical articles. We first evaluated whether sentences in full-text biomedical articles could be reliably annotated into the IMRAD format and then...

10.1093/bioinformatics/btp548 article EN Bioinformatics 2009-09-25

Objective: Negation is a linguistic phenomenon that marks the absence of an entity or event. Negated events are frequently reported in both biological literature and clinical notes. Text mining applications benefit from the detection of negation and its scope. However, due to the complexity of language, identifying the scope of negation in a sentence is not a trivial task. Design: Conditional random fields (CRF), a supervised machine-learning algorithm, were used to train models to detect negation cue phrases and their scope in both biological literature and clinical notes. The models were trained on the publicly available...

10.1136/jamia.2010.003228 article EN Journal of the American Medical Informatics Association 2010-10-20

Identification of discourse relations, such as causal and contrastive relations, between situations mentioned in text is an important task for biomedical text-mining. A biomedical text corpus annotated with discourse relations would be very useful for developing and evaluating methods for biomedical text processing. However, little effort has been made to develop such a resource. We have developed the Biomedical Discourse Relation Bank (BioDRB), in which we have annotated explicit and implicit discourse relations in 24 open-access full-text biomedical articles from the GENIA corpus. Guidelines for the annotation were adapted from the Penn...

10.1186/1471-2105-12-188 article EN cc-by BMC Bioinformatics 2011-05-23

Medication and adverse drug event (ADE) information extracted from electronic health record (EHR) notes can be a rich resource for drug safety surveillance. Existing observational studies have mainly relied on structured EHR data to obtain ADE information; however, ADEs are often buried in the EHR narratives and not recorded in structured data. To unlock ADE-related information from EHR narratives, there is a need to extract relevant entities and identify relations among them. In this study, we focus on relation identification. This study aimed...

10.2196/publichealth.9361 article EN cc-by JMIR Public Health and Surveillance 2018-04-25

Pharmacovigilance and drug-safety surveillance are crucial for monitoring adverse drug events (ADEs), but the main ADE-reporting systems, such as the Food and Drug Administration Adverse Event Reporting System, face challenges such as underreporting. Therefore, as complementary surveillance, data on ADEs are extracted from electronic health record (EHR) notes via natural language processing (NLP). As NLP develops, many up-to-date machine-learning techniques are introduced in this field, such as deep learning and multi-task learning (MTL)....

10.2196/12159 article EN cc-by JMIR Medical Informatics 2018-11-09