- Topic Modeling
- Biomedical Text Mining and Ontologies
- Natural Language Processing Techniques
- Machine Learning in Healthcare
- Advanced Text Analysis Techniques
- Semantic Web and Ontologies
- Multimodal Machine Learning Applications
- Health Literacy and Information Accessibility
- Electronic Health Records Systems
- Health Sciences Research and Education
- Text Readability and Simplification
- Pharmacovigilance and Adverse Drug Reactions
- Domain Adaptation and Few-Shot Learning
- Genetics, Bioinformatics, and Biomedical Research
- Artificial Intelligence in Healthcare
- Sentiment Analysis and Opinion Mining
- Data Quality and Management
- Intelligent Tutoring Systems and Adaptive Learning
- Mobile Health and mHealth Applications
- Chronic Disease Management Strategies
- Social Media in Health Education
- Imbalanced Data Classification Techniques
- Text and Document Classification Technologies
- Genomics and Phylogenetic Studies
- Computational Drug Discovery Methods
University of Massachusetts Lowell, 2017-2025
Amherst College, 2017-2025
University of Massachusetts Amherst, 2016-2025
Shanghai Electric (China), 2024-2025
VA New England Healthcare System, 2021-2025
University of Massachusetts Chan Medical School, 2015-2024
Chongqing University of Posts and Telecommunications, 2024
United States Department of Veterans Affairs, 2022-2024
UMass Memorial Medical Center, 2023-2024
Edith Nourse Rogers Memorial Veterans Hospital, 2016-2023
Opinion question answering is a challenging task for natural language processing. In this paper, we discuss a necessary component for an opinion question answering system: separating opinions from fact, at both the document and sentence level. We present a Bayesian classifier for discriminating between documents with a preponderance of opinions, such as editorials, and regular news stories, and describe three unsupervised, statistical techniques for the significantly harder task of detecting opinions at the sentence level. We also present a first model for classifying opinion sentences as positive or negative in...
Sequence labeling for extraction of medical events and their attributes from unstructured text in Electronic Health Record (EHR) notes is a key step towards semantic understanding of EHRs. It has important applications in health informatics, including pharmacovigilance and drug surveillance. The state-of-the-art supervised machine learning models in this domain are based on Conditional Random Fields (CRFs) with features calculated from fixed context windows. In this application, we explored recurrent neural network...
Sequence labeling is a widely used method for named entity recognition and information extraction from unstructured natural language data. In the clinical domain, one major application of sequence labeling involves the extraction of medical entities such as medication, indication, and side-effects from Electronic Health Record narratives. Sequence labeling in this domain presents its own set of challenges and objectives. In this work, we experimented with various CRF-based structured learning models with Recurrent Neural Networks. We extend the previously studied...
Background: The bidirectional encoder representations from transformers (BERT) model has achieved great success in many natural language processing (NLP) tasks, such as named entity recognition and question answering. However, little prior work has explored this model to be used for an important task in the biomedical and clinical domains, namely entity normalization. Objective: We aim to investigate the effectiveness of BERT-based models for biomedical or clinical entity normalization. In addition, our second objective is to investigate whether the domains of training data influence...
Automated ICD coding, which assigns the International Classification of Disease codes to patient visits, has attracted much research attention since it can save time and labor for billing. The previous state-of-the-art model utilized one convolutional layer to build document representations for predicting ICD codes. However, the lengths and grammar of text fragments, which are closely related to ICD coding, vary a lot in different documents. Therefore, a flat and fixed-length convolutional architecture may not be capable of learning good document representations....
Advances in large language models (LLMs) have empowered a variety of applications. However, there is still a significant gap in research when it comes to understanding and enhancing the capabilities of LLMs in the field of mental health. In this work, we present a comprehensive evaluation of multiple LLMs on various mental health prediction tasks via online text data, including Alpaca, Alpaca-LoRA, FLAN-T5, GPT-3.5, and GPT-4. We conduct a broad range of experiments, covering zero-shot prompting, few-shot prompting, and instruction fine-tuning. The...
Recent advancements in artificial intelligence, such as GPT-3.5 Turbo (OpenAI) and GPT-4, have demonstrated significant potential by achieving good scores on text-only United States Medical Licensing Examination (USMLE) exams and effectively answering questions from physicians. However, the ability of these models to interpret medical images remains underexplored. This study aimed to comprehensively evaluate the performance, interpretability, and limitations of GPT-3.5 Turbo, GPT-4, and its successor, GPT-4 Vision (GPT-4V),...
The effects of phosphorothioate (S-oligonucleotide) or terminal phosphorothioate-phosphodiester (S-O-oligonucleotides) or methylphosphonate-phosphodiester (MP-O-oligonucleotides) modifications on mouse spleen cell surface binding, uptake, and degradation were studied using fluorescein (FITC)-conjugated oligonucleotides. S-oligonucleotides had the highest binding, followed by S-O-, O-, and MP-O-oligonucleotides. Competition studies indicated that S-oligonucleotides have an increased affinity for membrane...
Abstract: Deep learning transformer-based models using longitudinal electronic health records (EHRs) have shown great success in the prediction of clinical diseases or outcomes. Pretraining on a large dataset can help such models map the input space better and boost their performance on relevant tasks through finetuning with limited data. In this study, we present TransformEHR, a generative encoder-decoder model with transformer that is pretrained using a new pretraining objective: predicting all diseases and outcomes of a patient at...
Importance: Social determinants of health (SDOHs) are known to be associated with increased risk of suicidal behaviors, but few studies use SDOHs from unstructured electronic health record notes. Objective: To investigate associations between veterans’ death by suicide and recent SDOHs, identified using structured and unstructured data. Design, Setting, and Participants: This nested case-control study included veterans who received care under the US Veterans Health Administration from October 1, 2010, to September 30, 2015. A...
Advances in large language models (LLMs) have empowered a variety of applications. However, there is still a significant gap in research when it comes to understanding and enhancing the capabilities of LLMs in the field of mental health. In this work, we present the first comprehensive evaluation of multiple LLMs, including Alpaca, Alpaca-LoRA, FLAN-T5, GPT-3.5, and GPT-4, on various mental health prediction tasks via online text data. We conduct a broad range of experiments, covering zero-shot prompting, few-shot prompting, and instruction...
Objective: To develop methods that automatically map abbreviations to their full forms in biomedical articles. Methods: The authors developed two methods of mapping defined and undefined abbreviations (defined abbreviations are paired with their full forms in the articles, whereas undefined ones are not). For defined abbreviations, they developed a set of pattern-matching rules to map an abbreviation to its full form and implemented the rules into a software program, AbbRE (for "abbreviation recognition and extraction"). Using the opinions of domain experts as a reference standard, they evaluated the recall and precision of AbbRE for defined abbreviations in ten...
Abstract: Biomedical texts can be typically represented by four rhetorical categories: Introduction, Methods, Results and Discussion (IMRAD). Classifying sentences into these categories can benefit many other text-mining tasks. Although many studies have applied different approaches for automatically classifying sentences in MEDLINE abstracts into the IMRAD categories, few have explored the classification of sentences that appear in full-text biomedical articles. We first evaluated whether sentences in full-text biomedical articles could be reliably annotated into the IMRAD format and then...
Objective: Negation is a linguistic phenomenon that marks the absence of an entity or event. Negated events are frequently reported in both biological literature and clinical notes. Text mining applications benefit from the detection of negation and its scope. However, due to the complexity of language, identifying the scope of negation in a sentence is not a trivial task. Design: Conditional random fields (CRF), a supervised machine-learning algorithm, were used to train models to detect negation cue phrases and their scope in both biological literature and clinical notes. The models were trained on the publicly available...
Identification of discourse relations, such as causal and contrastive relations between situations mentioned in text, is an important task for biomedical text-mining. A biomedical text corpus annotated with discourse relations would be very useful for developing and evaluating methods for biomedical discourse processing. However, little effort has been made to develop such a resource. We have developed the Biomedical Discourse Relation Bank (BioDRB), in which we have annotated explicit and implicit discourse relations in 24 open-access full-text biomedical articles from the GENIA corpus. Guidelines for the annotation were adapted from the Penn...
Medication and adverse drug event (ADE) information extracted from electronic health record (EHR) notes can be a rich resource for drug safety surveillance. Existing observational studies have mainly relied on structured EHR data to obtain ADE information; however, ADEs are often buried in the EHR narratives and not recorded in structured data. To unlock ADE-related information from EHR narratives, there is a need to extract relevant entities and identify relations among them. In this study, we focus on relation identification. This study aimed...
Pharmacovigilance and drug-safety surveillance are crucial for monitoring adverse drug events (ADEs), but the main ADE-reporting systems, such as the Food and Drug Administration Adverse Event Reporting System, face challenges such as underreporting. Therefore, as complementary surveillance, data on ADEs are extracted from electronic health record (EHR) notes via natural language processing (NLP). As NLP develops, many up-to-date machine-learning techniques are introduced in this field, such as deep learning and multi-task learning (MTL)....