- Topic Modeling
- Biomedical Text Mining and Ontologies
- Machine Learning in Healthcare
- Natural Language Processing Techniques
- Semantic Web and Ontologies
- Diatoms and Algae Research
- Sentiment Analysis and Opinion Mining
- Mental Health via Writing
- Constraint Satisfaction and Optimization
- Educational Assessment and Pedagogy
- Advanced Text Analysis Techniques
- Image Processing Techniques and Applications
- Web Data Mining and Analysis
- CRISPR and Genetic Engineering
- Safety Warnings and Signage
- Pharmacovigilance and Adverse Drug Reactions
- Multimodal Machine Learning Applications
- Educational Technology and Assessment
- Text Readability and Simplification
- RNA and protein synthesis mechanisms
- Caching and Content Delivery
- Industrial Vision Systems and Defect Detection
- Literature, Language, and Rhetoric Studies
- Advanced Computing and Algorithms
- Complex Network Analysis Techniques
Cedars-Sinai Medical Center
2023-2025
Harvard University
2022-2023
Boston Children's Hospital
2022-2023
University of Maryland, Baltimore
2023
Loyola University Chicago
2022
University of Arizona
2016-2021
University of Pennsylvania
2014
Princeton University
2014
Detecting depression is a key public health challenge, as almost 12% of all disabilities can be attributed to depression. Computational models for depression detection must prove not only that they can detect depression, but that they can do it early enough for an intervention to be plausible. However, current evaluations are poor at measuring model latency. We identify several issues with the currently popular ERDE metric, and propose a latency-weighted F1 metric that addresses these concerns. We then apply this evaluation to several models from the recent...
The aim of the Social Media Mining for Health Applications (#SMM4H) shared tasks is to take a community-driven approach to address the natural language processing and machine learning challenges inherent to utilizing social media data for health informatics. In this paper, we present the annotated corpora, a technical summary of participants' systems, and the performance results.
Concept normalization, the task of linking textual mentions of concepts to concepts in an ontology, is challenging because ontologies are large. In most cases, annotated datasets cover only a small sample of the concepts, yet concept normalizers are expected to predict all concepts in the ontology. In this paper, we propose an architecture consisting of a candidate generator and a list-wise ranker based on BERT. The ranker considers pairings of concept mentions and candidate concepts, allowing it to make predictions for any concept, not just those seen during training. We further enhance this approach...
Influenza vaccine effectiveness (VE) estimation plays a critical role in public health decision-making by quantifying the real-world impact of vaccination campaigns and guiding policy adjustments. Current approaches to VE estimation are constrained by limited population representation, selection bias, and delayed reporting. To address some of these gaps, we propose leveraging large language models (LLMs) with few-shot chain-of-thought (CoT) prompting to mine social media data for real-time influenza VE estimation. We...
Background: Taxonomic descriptions are traditionally composed in natural language and published in a format that cannot be directly used by computers. The Exploring Taxon Concepts (ETC) project has been developing a set of web-based software tools that convert morphological descriptions published in telegraphic style to character data that can be reused and repurposed. This paper introduces the first semi-automated pipeline, to our knowledge, that converts morphological descriptions into taxon-character matrices to support systematics and evolutionary biology research. We...
This paper presents the first model for time normalization trained on the SCATE corpus. In the SCATE schema, time expressions are annotated as a semantic composition of time entities. This novel schema favors machine learning approaches, as it can be viewed as a semantic parsing task. In this work, we propose a character-level multi-output neural network that outperforms the previous state-of-the-art built on the TimeML schema. To compare predictions of systems that follow both SCATE and TimeML, we present a new scoring metric for time intervals. We also apply this new metric to carry out...
Concept normalization, the task of linking phrases in text to concepts in an ontology, is useful for many downstream tasks including relation extraction, information retrieval, etc. We present a generate-and-rank concept normalization system based on our participation in the 2019 National NLP Clinical Challenges Shared Task Track 3 Concept Normalization.
The aim of the Social Media Mining for Health Applications (#SMM4H) shared tasks is to take a community-driven approach to address the natural language processing and machine learning challenges inherent to utilizing social media data for health informatics. The eighth iteration of the #SMM4H shared tasks was hosted at the AMIA 2023 Annual Symposium and consisted of five tasks that represented various social media platforms (Twitter and Reddit), languages (English and Spanish), methods (binary classification, multi-class classification, extraction, and normalization), and topics (COVID-19,...
This paper presents the outcomes of the Parsing Time Normalization shared task held within SemEval-2018. The aim of the task is to parse time expressions into the compositional semantic graphs of the Semantically Compositional Annotation of Time Expressions (SCATE) schema, which allows the representation of a wider variety of time expressions than previous approaches. Two tracks were included, one to evaluate the parsing of individual components of the produced graphs, in a classic information extraction way, and another to evaluate the quality of the time intervals resulting from the interpretation...
Automatically summarizing patients' main problems from daily progress notes using natural language processing methods helps to battle against information and cognitive overload in hospital settings and potentially assists providers with computerized diagnostic decision support. Problem list summarization requires a model to understand, abstract, and generate clinical documentation. In this work, we propose a new NLP task that aims to generate a list of problems in a patient's daily care plan using input from the provider's progress notes during hospitalization. We...
Prior work has demonstrated that question classification (QC), recognizing the problem domain of a question, can help answer it more accurately. However, developing strong QC algorithms has been hindered by the limited size and complexity of the annotated data available. To address this, we present the largest challenge dataset for QC, containing 7,787 science exam questions paired with detailed classification labels from a fine-grained hierarchical taxonomy of 406 problem domains. We then show that a BERT-based model trained on this dataset achieves...
This study presents the outcomes of the shared task competition BioCreative VII (Task 3) focusing on the extraction of medication names from a Twitter user's publicly available tweets (the user's 'timeline'). In general, detecting health-related tweets is notoriously challenging for natural language processing tools. The main challenge, aside from the informality of the language used, is that people tweet about any and all topics, and most of their tweets are not related to health. Thus, finding those tweets in a user's timeline that mention specific health-related concepts such as...
Recent studies have shown that pre-trained contextual word embeddings, which assign the same word different vectors in different contexts, improve performance in many tasks. But while contextual embeddings can also be trained at the character level, the effectiveness of such embeddings has not been studied. We derive character-level contextual embeddings from Flair (Akbik et al., 2018), and apply them to a time normalization task, yielding major performance improvements over the previous state-of-the-art: 51% error reduction in news and 33% in clinical notes. We analyze the sources of these...
Concept normalization, the task of linking textual mentions of concepts to concepts in an ontology, is critical for mining and analyzing biomedical texts. We propose a vector-space model for concept normalization, where mentions and concepts are encoded via transformer networks that are trained via a triplet objective with online hard triplet mining. The transformer networks refine existing pre-trained models, and the online triplet mining makes training efficient even with hundreds of thousands of concepts by sampling training triples within each mini-batch. We introduce a variety of strategies for searching with the trained vector-space model, including approaches that incorporate...
Scholarly publications of biodiversity literature contain a vast amount of information in human readable format. The detailed morphological descriptions in these publications contain rich information that can be extracted to facilitate analysis and computational biology research. However, the idiosyncrasies of morphological descriptions still pose a number of challenges to machines. In this work, we demonstrate the use of two different approaches to resolve meronym (i.e. part-of) relations between anatomical parts and their anchor organs, including a syntactic rule-based approach...
In this paper, we present our work participating in the BioCreative VII Track 3 - automatic extraction of medication names in tweets, where we implemented a multi-task learning model that is jointly trained on text classification and sequence labelling. Our best system run achieved a strict F1 of 80.4, ranking first and more than 10 points higher than the average score of all participants. Our analyses show that the ensemble technique, multi-task learning, and data augmentation are all beneficial for medication detection in tweets.