- Time Series Analysis and Forecasting
- Topic Modeling
- Machine Learning in Healthcare
- Natural Language Processing Techniques
- Anomaly Detection Techniques and Applications
- Explainable Artificial Intelligence (XAI)
- Text and Document Classification Technologies
- Machine Learning and Data Classification
- Artificial Intelligence in Healthcare and Education
- EEG and Brain-Computer Interfaces
- Imbalanced Data Classification Techniques
- Artificial Intelligence in Healthcare
- Semantic Web and Ontologies
- Stock Market Forecasting Methods
- Biomedical Text Mining and Ontologies
- Text Readability and Simplification
- Neural dynamics and brain function
- COVID-19 diagnosis using AI
- Clostridium difficile and Clostridium perfringens research
- Hate Speech and Cyberbullying Detection
- Neural Networks and Applications
- Intelligent Tutoring Systems and Adaptive Learning
- Digital Radiography and Breast Imaging
- Advanced Memory and Neural Computing
- Machine Learning and ELM
University of Virginia
2023-2025
Massachusetts Institute of Technology
2022-2023
IIT@MIT
2023
Microsoft (United States)
2022
Allen Institute
2022
Carnegie Mellon University
2022
Worcester Polytechnic Institute
2017-2021
Thomas Hartvigsen, Saadia Gabriel, Hamid Palangi, Maarten Sap, Dipankar Ray, Ece Kamar. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2022.
Machine learning models in safety-critical settings like healthcare are often blackboxes: they contain a large number of parameters which are not transparent to users. Post-hoc explainability methods, where a simple, human-interpretable model imitates the behavior of these blackbox models, have been proposed to help users trust model predictions. In this work, we audit the quality of such explanations for different protected subgroups using real data from four settings: finance, healthcare, college admissions, and the US justice system. Across two...
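A minimal sketch of the kind of subgroup audit described above, assuming a LIME-style surrogate whose fidelity to the blackbox is compared across a protected attribute; the function names and the toy data are illustrative, not the paper's code.

```python
# Hypothetical sketch: does a post-hoc surrogate explain the blackbox equally
# well for every protected subgroup?
import numpy as np

def fidelity(blackbox_preds, surrogate_preds):
    """Fraction of inputs where the surrogate reproduces the blackbox label."""
    return np.mean(blackbox_preds == surrogate_preds)

def fidelity_gap(blackbox_preds, surrogate_preds, group_labels):
    """Largest difference in explanation fidelity between any two subgroups."""
    scores = {
        g: fidelity(blackbox_preds[group_labels == g],
                    surrogate_preds[group_labels == g])
        for g in np.unique(group_labels)
    }
    return max(scores.values()) - min(scores.values()), scores

# Toy usage with made-up predictions and a binary protected attribute.
bb = np.array([1, 0, 1, 1, 0, 1])
sg = np.array([1, 0, 0, 1, 0, 0])
grp = np.array([0, 0, 0, 1, 1, 1])
gap, per_group = fidelity_gap(bb, sg, grp)
```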
Motivated by human attention, computational attention mechanisms have been designed to help neural networks adjust their focus on specific parts of the input data. While attention mechanisms are claimed to achieve interpretability, little is known about the actual relationships between machine and human attention. In this work, we conduct the first quantitative assessment of human versus machine attention for the text classification task. To do this, we design a large-scale crowd-sourcing study to collect human attention maps that encode the words humans attend to when conducting text classification. Based...
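One simple way to quantify the agreement the abstract refers to is to compare a model's per-token attention weights with a crowd-sourced human attention map. The sketch below assumes rank correlation and top-k token overlap as the comparison metrics; these are illustrative choices, not the study's exact protocol.

```python
# Illustrative comparison of machine vs. human attention over one sentence.
import numpy as np
from scipy.stats import spearmanr

def compare_attention(machine_attn, human_attn, k=3):
    machine_attn = np.asarray(machine_attn, dtype=float)
    human_attn = np.asarray(human_attn, dtype=float)
    rho, _ = spearmanr(machine_attn, human_attn)    # rank agreement
    top_m = set(np.argsort(machine_attn)[-k:])      # tokens the model stresses
    top_h = set(np.argsort(human_attn)[-k:])        # tokens annotators marked
    overlap = len(top_m & top_h) / k
    return rho, overlap

# Toy example: per-token weights over a 6-token sentence.
rho, overlap = compare_attention([0.05, 0.4, 0.1, 0.3, 0.1, 0.05],
                                 [0.0, 0.5, 0.0, 0.4, 0.1, 0.0])
```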
Background: Large language models (LLMs) are increasingly used to generate medical content, yet their inherent design to follow user instructions may leave them vulnerable to producing misinformation. This risk becomes especially pronounced when LLMs generate incorrect information that could adversely affect human health. A propensity to comply with prompts, even when these prompts lead to illogical or false information, highlights a critical gap in safety for high-stakes fields like healthcare. Methods: We evaluated...
Sparse Autoencoders (SAEs) show potential for uncovering structured, human-interpretable representations in Large Language Models (LLMs), making them a crucial tool for transparent and controllable AI systems. We systematically analyze SAE-based interpretable feature extraction from LLMs for safety-critical classification tasks. Our framework evaluates (1) model-layer selection and scaling properties, (2) architectural configurations, including width and pooling strategies, and (3) the effect of binarizing...
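For readers unfamiliar with the building block, here is a minimal sparse autoencoder over LLM hidden states, plus the binarization step the abstract mentions. This is a generic sketch assuming PyTorch; the layer width, L1 coefficient, and threshold are placeholder values, not the paper's configuration.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model=768, d_features=8192):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, h):
        z = torch.relu(self.encoder(h))   # sparse, non-negative features
        recon = self.decoder(z)
        return recon, z

def sae_loss(h, recon, z, l1_coeff=1e-3):
    # Reconstruction error plus an L1 penalty that keeps features sparse.
    return ((recon - h) ** 2).mean() + l1_coeff * z.abs().mean()

def binarize(z, threshold=0.0):
    # Active/inactive feature indicators for a downstream safety classifier.
    return (z > threshold).float()
```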
Early classification of time series is the prediction of the class label of a time series before it is observed in its entirety. In time-sensitive domains where information is collected over time, it is worth sacrificing some accuracy in favor of earlier predictions, ideally early enough for actions to be taken. However, since accuracy and earliness are contradictory objectives, a solution must address this challenge and discover task-dependent trade-offs. We design an early classification model, called EARLIEST, which tackles this multi-objective optimization problem, jointly...
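The accuracy-versus-earliness trade-off can be written as a single objective that penalizes how much of the series was consumed before halting. The sketch below is a hedged illustration of that idea, not EARLIEST's exact formulation; the halting mechanism and the weight are placeholders.

```python
import torch
import torch.nn.functional as F

def early_classification_loss(logits, labels, halt_step, series_length,
                              earliness_weight=0.1):
    """Classification loss plus a penalty proportional to how late we halted."""
    ce = F.cross_entropy(logits, labels)
    earliness = halt_step.float() / series_length  # 0 = immediate, 1 = full series
    return ce + earliness_weight * earliness.mean()
```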
Large language models (LLMs) are being applied to time series tasks, particularly time series forecasting. However, are LLMs actually useful for time series? After a series of ablation studies on three recent and popular LLM-based forecasting methods, we find that removing the LLM component or replacing it with a basic attention layer does not degrade results -- in most cases results even improved. We also find that, despite their significant computational cost, pretrained LLMs do no better than models trained from scratch, do not represent the sequential...
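The ablation described above amounts to swapping the pretrained LLM backbone for a single self-attention layer while keeping the rest of the forecasting pipeline fixed. The module below is an illustrative stand-in under that assumption; names and dimensions are made up.

```python
import torch
import torch.nn as nn

class SimpleAttnForecaster(nn.Module):
    def __init__(self, input_len, horizon, d_model=64, n_heads=4):
        super().__init__()
        self.embed = nn.Linear(1, d_model)                  # per-step embedding
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.head = nn.Linear(input_len * d_model, horizon)

    def forward(self, x):                                   # x: (batch, input_len)
        h = self.embed(x.unsqueeze(-1))
        h, _ = self.attn(h, h, h)                           # stand-in for the LLM
        return self.head(h.flatten(1))                      # point forecasts
```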
Despite recent concerns about undesirable behaviors generated by large language models (LLMs), including non-factual, biased, and hateful language, we find that LLMs are inherent multi-task language checkers based on their latent representations of natural and social knowledge. We present an interpretable, unified, language checking (UniLC) method for both human and machine-generated language that aims to check whether language input is factual and fair. While fairness and fact-checking tasks have been handled separately with dedicated models, they can...
Artificial intelligence (AI) stands to improve healthcare through innovative new systems ranging from diagnosis aids to patient tools. However, such "Health AI" systems are complicated and challenging to integrate into standing clinical practice. With advancing AI, regulations, clinical practice, and policies must adapt to a wide range of risks while experts learn to interact with complex automated systems. Even in these early stages of Health AI, gaps are being identified, like severe underperformance of models for minority groups...
Early multi-label classification of time series, the assignment of a label set to a time series before it is entirely observed, is critical for time-sensitive domains such as healthcare. In such cases, waiting too long to classify can render predictions useless, regardless of their accuracy, while predicting prematurely can result in potentially costly erroneous results. When multiple labels are possible (for example, types of infections), dependencies between labels can be learned and leveraged to improve overall accuracy. Together, reliably correct...
Explainable classification is essential in high-impact settings where practitioners require evidence to support their decisions. However, state-of-the-art deep learning models lack transparency in how they make their predictions. One increasingly popular solution is attribution-based explainability, which finds the impact of input features on the model's predictions. While this works well for computer vision, little has been done to explain deep time series classifiers. In this work, we study this problem and propose PERT, a novel perturbation-based...
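To make the attribution idea concrete, here is a generic perturbation-based saliency routine for a time series classifier: occlude one window at a time with a baseline value and record how much the predicted class probability drops. This is only a sketch of the general technique under simple assumptions, not PERT itself.

```python
import numpy as np

def perturbation_saliency(model_fn, series, target_class, window=5):
    """model_fn maps a 1-D numpy series to class probabilities."""
    base_prob = model_fn(series)[target_class]
    saliency = np.zeros_like(series, dtype=float)
    baseline = series.mean()
    for start in range(0, len(series), window):
        perturbed = series.copy()
        perturbed[start:start + window] = baseline          # occlude one window
        drop = base_prob - model_fn(perturbed)[target_class]
        saliency[start:start + window] = drop               # big drop = important
    return saliency
```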
Deployed language models decay over time due to shifting inputs, changing user needs, or emergent world-knowledge gaps. When such problems are identified, we want to make targeted edits while avoiding expensive retraining. However, current model editors, which modify such behaviors of pre-trained models, degrade performance quickly across multiple, sequential edits. We propose GRACE, a lifelong model editing method, which implements spot-fixes on streaming errors of a deployed model, ensuring minimal impact...
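A simplified way to picture spot-fixing is a key-value codebook attached to one layer: hidden states that triggered errors are cached as keys with corrected values, and at inference a cached fix is returned only for queries falling within a small radius of a key, leaving all other behavior untouched. The class below is loosely in that spirit; the retrieval rule and radius are illustrative assumptions, not the published method.

```python
import torch

class KeyValueAdaptor:
    def __init__(self, radius=1.0):
        self.keys, self.values, self.radius = [], [], radius

    def add_fix(self, hidden_state, corrected_value):
        """Cache the hidden state that triggered an error and its fix."""
        self.keys.append(hidden_state.detach())
        self.values.append(corrected_value.detach())

    def __call__(self, hidden_state):
        """Return a cached fix if the query falls inside any key's radius."""
        for k, v in zip(self.keys, self.values):
            if torch.dist(hidden_state, k) < self.radius:
                return v              # override the layer's activation
        return hidden_state           # otherwise leave the model untouched
```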
Foundation models, especially LLMs, are profoundly transforming deep learning. Instead of training many task-specific models, we can adapt a single pretrained model to many tasks via few-shot prompting or fine-tuning. However, current foundation models apply to sequence data but not to time series, which present unique challenges due to the inherently diverse and multi-domain nature of time series datasets, diverging task specifications across forecasting, classification, and other types of tasks, and the apparent need for task-specialized models....
For medical imaging AI models to be clinically impactful, they must generalize. However, this goal is hindered by (i) diverse types of distribution shifts, such as temporal, demographic, and label shifts, and (ii) limited diversity in datasets that are siloed within single institutions. While these limitations have spurred interest in federated learning, current evaluation benchmarks fail to evaluate different shifts simultaneously. In real healthcare settings, multiple shifts co-exist, yet their impact on performance...
Vision-language models, like CLIP (Contrastive Language Image Pretraining), are becoming increasingly popular for a wide range of multimodal retrieval tasks. However, prior work has shown that large language and deep vision models can learn historical biases contained in their training sets, leading to perpetuation of stereotypes and potential downstream harm. In this work, we conduct a systematic analysis of the social biases present in CLIP, with a focus on the interaction between image and text modalities. We first...
Understanding the roles of human proteins remains a major challenge, with approximately 20% lacking known functions and more than 40% missing context-specific functional insights. Even well-annotated proteins are often poorly characterized in diverse biological contexts, disease states, and perturbations. We present ProCyon, a foundation model for modeling, generating, and predicting protein phenotypes across five interrelated knowledge domains: molecular functions, therapeutic mechanisms, disease associations,...
Math word problems are critical K-8 educational tools, but writing them is time-consuming and requires domain expertise. We suggest that language models can support math education by automatically generating word problems at scale. To be educational, generated problems must be 1) solvable, 2) accurate, and 3) appropriate. Existing datasets are unlabeled for these criteria, making them ill-suited for training problem generators. We introduce MATHWELL, a Llama-2 (70B) model iteratively finetuned to generate word problems using data from expert...
Medical knowledge is context-dependent and requires consistent reasoning across various natural language expressions of semantically equivalent phrases. This is particularly crucial for drug names, where patients often use brand names like Advil or Tylenol instead of their generic equivalents. To study this, we create a new robustness dataset, RABBITS, to evaluate performance differences on medical benchmarks after swapping brand and generic drug names using physician expert annotations. We assess both open-source and API-based...
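The swap-and-compare evaluation idea can be sketched in a few lines: replace brand names with generics in each benchmark question and measure the accuracy drop. The mapping, the `model_answer_fn` callable, and the scoring are stand-ins for illustration only.

```python
import re

BRAND_TO_GENERIC = {"Advil": "ibuprofen", "Tylenol": "acetaminophen"}  # toy map

def swap_drug_names(text, mapping=BRAND_TO_GENERIC):
    for brand, generic in mapping.items():
        text = re.sub(rf"\b{re.escape(brand)}\b", generic, text)
    return text

def accuracy_drop(model_answer_fn, questions, answers):
    """Compare accuracy on original vs. name-swapped questions."""
    orig = sum(model_answer_fn(q) == a for q, a in zip(questions, answers))
    swapped = sum(model_answer_fn(swap_drug_names(q)) == a
                  for q, a in zip(questions, answers))
    return (orig - swapped) / len(questions)
```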
Positive-Unlabeled (PU) learning methods train a classifier to distinguish between the positive and negative classes given only positive and unlabeled data. While traditional PU methods require the labeled samples to be an unbiased sample of the positive distribution, in practice the labeled sample is often a biased draw from the true distribution. Prior work shows that if we know the likelihood that each positive instance will be selected for labeling, referred to as the propensity score, then it can be used for learning. Unfortunately, no prior work has proposed an inference strategy for which...
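To illustrate how known propensity scores enter the picture, the standard reweighting treats each labeled positive as 1/e(x) positive examples and (1 - 1/e(x)) negative examples, with unlabeled examples counted as negatives. The snippet below is a generic sketch of that risk estimator, not this paper's inference strategy; the vectorized `loss` callable is an assumption.

```python
import numpy as np

def propensity_weighted_risk(scores_labeled, propensities,
                             scores_unlabeled, loss):
    """Estimate classification risk from labeled positives and unlabeled data.

    loss(scores, label) is assumed to return a per-example loss array.
    """
    w = 1.0 / propensities
    labeled_term = (w * loss(scores_labeled, 1)
                    + (1.0 - w) * loss(scores_labeled, 0))
    unlabeled_term = loss(scores_unlabeled, 0)     # unlabeled treated as negative
    n = len(scores_labeled) + len(scores_unlabeled)
    return (labeled_term.sum() + unlabeled_term.sum()) / n
```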
Toxic language detection systems often falsely flag text that contains minority group mentions as toxic, as those groups are often the targets of online hate. Such over-reliance on spurious correlations also causes systems to struggle with detecting implicitly toxic language. To help mitigate these issues, we create ToxiGen, a new large-scale and machine-generated dataset of 274k toxic and benign statements about 13 minority groups. We develop a demonstration-based prompting framework and an adversarial classifier-in-the-loop decoding...