Thomas Hartvigsen

ORCID: 0000-0002-5288-2792
Research Areas
  • Time Series Analysis and Forecasting
  • Topic Modeling
  • Machine Learning in Healthcare
  • Natural Language Processing Techniques
  • Anomaly Detection Techniques and Applications
  • Explainable Artificial Intelligence (XAI)
  • Text and Document Classification Technologies
  • Machine Learning and Data Classification
  • Artificial Intelligence in Healthcare and Education
  • EEG and Brain-Computer Interfaces
  • Imbalanced Data Classification Techniques
  • Artificial Intelligence in Healthcare
  • Semantic Web and Ontologies
  • Stock Market Forecasting Methods
  • Biomedical Text Mining and Ontologies
  • Text Readability and Simplification
  • Neural dynamics and brain function
  • COVID-19 diagnosis using AI
  • Clostridium difficile and Clostridium perfringens research
  • Hate Speech and Cyberbullying Detection
  • Neural Networks and Applications
  • Intelligent Tutoring Systems and Adaptive Learning
  • Digital Radiography and Breast Imaging
  • Advanced Memory and Neural Computing
  • Machine Learning and ELM

Affiliations

University of Virginia
2023-2025

Massachusetts Institute of Technology
2022-2023

IIT@MIT
2023

Microsoft (United States)
2022

Allen Institute
2022

Carnegie Mellon University
2022

Worcester Polytechnic Institute
2017-2021

Publications

Thomas Hartvigsen, Saadia Gabriel, Hamid Palangi, Maarten Sap, Dipankar Ray, Ece Kamar. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2022.

10.18653/v1/2022.acl-long.234 article EN cc-by Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2022-01-01

Machine learning models in safety-critical settings like healthcare are often blackboxes: they contain a large number of parameters which are not transparent to users. Post-hoc explainability methods, where a simple, human-interpretable model imitates the behavior of these blackbox models, have been proposed to help users trust predictions. In this work, we audit the quality of such explanations for different protected subgroups using real data from four settings: finance, healthcare, college admissions, and the US justice system. Across two...

10.1145/3531146.3533179 article EN 2022 ACM Conference on Fairness, Accountability, and Transparency 2022-06-20
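The entry above audits how faithfully a simple surrogate explanation reproduces a blackbox model's predictions for different protected subgroups. A minimal sketch of such a per-group fidelity check, using random placeholder predictions and a hypothetical binary group attribute rather than the paper's models or datasets:

# Hypothetical sketch: auditing surrogate-explanation fidelity per protected subgroup.
# The blackbox outputs, surrogate outputs, and group labels are all placeholders.
import numpy as np

def fidelity_by_group(blackbox_preds, surrogate_preds, groups):
    """Fraction of instances where the surrogate agrees with the blackbox, per subgroup."""
    results = {}
    for g in np.unique(groups):
        mask = groups == g
        results[g] = np.mean(blackbox_preds[mask] == surrogate_preds[mask])
    return results

# Toy example with random stand-ins for model outputs and a binary protected attribute.
rng = np.random.default_rng(0)
blackbox_preds = rng.integers(0, 2, size=1000)
surrogate_preds = np.where(rng.random(1000) < 0.9, blackbox_preds, 1 - blackbox_preds)
groups = rng.integers(0, 2, size=1000)
print(fidelity_by_group(blackbox_preds, surrogate_preds, groups))

Here fidelity is simply the agreement rate between surrogate and blackbox predictions within each subgroup; a gap between groups is the kind of disparity such an audit looks for.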

Motivated by human attention, computational attention mechanisms have been designed to help neural networks adjust their focus on specific parts of the input data. While attention mechanisms are claimed to achieve interpretability, little is known about the actual relationships between machine and human attention. In this work, we conduct the first quantitative assessment of human versus machine attention for the text classification task. To do this, we design a large-scale crowd-sourcing study to collect attention maps that encode where humans focus when conducting text classification. Based...

10.18653/v1/2020.acl-main.419 article EN cc-by 2020-01-01
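The entry above compares machine attention to crowd-sourced human attention over the same text. A minimal sketch of one way such maps could be compared, using made-up attention weights and a rank correlation (the paper's own metrics and data are not reproduced here):

# Hypothetical sketch: compare a machine attention map to a human attention map
# over the same tokens via rank correlation. Both maps below are invented examples.
import numpy as np
from scipy.stats import spearmanr

tokens = ["the", "service", "was", "painfully", "slow", "and", "rude"]
machine_attention = np.array([0.02, 0.15, 0.03, 0.40, 0.25, 0.05, 0.10])  # e.g. softmax weights
human_attention   = np.array([0.00, 0.10, 0.00, 0.35, 0.30, 0.05, 0.20])  # e.g. fraction of annotators

rho, _ = spearmanr(machine_attention, human_attention)
print(f"rank correlation between human and machine attention: {rho:.2f}")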

Background: Large language models (LLMs) are increasingly used to generate medical content, yet their inherent design to follow user instructions may leave them vulnerable to producing misinformation. This risk becomes especially pronounced when LLMs are given incorrect information that could adversely affect human health. A propensity to comply with prompts, even when these lead to illogical or false information, highlights a critical safety gap in high-stakes fields like healthcare. Methods: We evaluated...

10.1158/1557-3265.targetedtherap-ia04 article EN Clinical Cancer Research 2025-01-26

Sparse Autoencoders (SAEs) show potential for uncovering structured, human-interpretable representations in Large Language Models (LLMs), making them a crucial tool for transparent and controllable AI systems. We systematically analyze SAEs for interpretable feature extraction from LLMs in safety-critical classification tasks. Our framework evaluates (1) model-layer selection and scaling properties, (2) architectural configurations, including width and pooling strategies, and (3) the effect of binarizing...

10.48550/arxiv.2502.11367 preprint EN arXiv (Cornell University) 2025-02-16
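As a rough illustration of the setup described above, the sketch below shows the core objects: a sparse autoencoder that maps cached LLM hidden states to a wide, non-negative feature space and reconstructs them under an L1 sparsity penalty. The dimensions, layer choice, and penalty weight are placeholders, not the paper's configuration:

# Hypothetical sketch: a sparse autoencoder over cached LLM hidden states.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model=768, d_hidden=4096):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, h):
        z = torch.relu(self.encoder(h))      # sparse, non-negative features
        return self.decoder(z), z

sae = SparseAutoencoder()
hidden_states = torch.randn(32, 768)         # stand-in for one LLM layer's activations
recon, z = sae(hidden_states)
loss = nn.functional.mse_loss(recon, hidden_states) + 1e-3 * z.abs().mean()  # reconstruction + L1 sparsity
loss.backward()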

Early classification of time series is the prediction of a class label for a time series before it has been observed in its entirety. In time-sensitive domains where information is collected over time, it can be worth sacrificing some accuracy in favor of earlier predictions, ideally early enough for actions to be taken. However, since accuracy and earliness are contradictory objectives, a solution must address this challenge to discover task-dependent trade-offs. We design an early classification model, called EARLIEST, which tackles this multi-objective optimization problem, jointly...

10.1145/3292500.3330974 article EN 2019-07-25
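To make the accuracy-versus-earliness tension above concrete, here is a simplified sketch: a GRU reads a series step by step, a halting head scores when to stop, and the loss adds an earliness penalty to the classification loss. This illustrates the trade-off only; it is not the EARLIEST architecture or its training procedure:

# Hypothetical sketch of the accuracy-vs-earliness trade-off in early classification.
import torch
import torch.nn as nn

class EarlyClassifier(nn.Module):
    def __init__(self, n_features=1, hidden=32, n_classes=2):
        super().__init__()
        self.rnn = nn.GRU(n_features, hidden, batch_first=True)
        self.halt = nn.Linear(hidden, 1)       # score for stopping at each step
        self.clf = nn.Linear(hidden, n_classes)

    def forward(self, x):                      # x: (batch, time, features)
        states, _ = self.rnn(x)
        halt_probs = torch.sigmoid(self.halt(states)).squeeze(-1)   # (batch, time)
        logits = self.clf(states)                                    # (batch, time, classes)
        return logits, halt_probs

model = EarlyClassifier()
x = torch.randn(8, 50, 1)                      # toy batch of univariate series
y = torch.randint(0, 2, (8,))
logits, halt_probs = model(x)
stop_step = halt_probs.argmax(dim=1)           # crude stand-in for a learned halting policy
ce = nn.functional.cross_entropy(logits[torch.arange(8), stop_step], y)
earliness_penalty = (stop_step.float() / 50).mean()
loss = ce + 0.5 * earliness_penalty            # the weight trades accuracy against earliness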

Large language models (LLMs) are being applied to time series tasks, particularly forecasting. However, are they actually useful for time series? After a series of ablation studies on three recent and popular LLM-based forecasting methods, we find that removing the LLM component or replacing it with a basic attention layer does not degrade forecasting results -- in most cases, the results even improved. We also find that, despite their significant computational cost, pretrained LLMs do no better than models trained from scratch and do not represent the sequential...

10.48550/arxiv.2406.16964 preprint EN arXiv (Cornell University) 2024-06-21
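The ablation idea above can be illustrated with a toy forecaster in which the pretrained LLM backbone is swapped for a single basic self-attention layer while the rest of the pipeline stays fixed. All names and dimensions below are hypothetical:

# Hypothetical sketch of the ablation: replace an LLM block with one attention layer.
import torch
import torch.nn as nn

class SimpleForecaster(nn.Module):
    def __init__(self, lookback=96, horizon=24, d_model=64, use_attention_only=True):
        super().__init__()
        self.embed = nn.Linear(1, d_model)
        if use_attention_only:
            # "LLM-free" variant: one self-attention layer in place of a pretrained LLM stack
            self.core = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        else:
            self.core = None   # a pretrained LLM backbone would be plugged in here instead
        self.head = nn.Linear(lookback * d_model, horizon)

    def forward(self, x):                      # x: (batch, lookback, 1)
        h = self.embed(x)
        if self.core is not None:
            h, _ = self.core(h, h, h)
        return self.head(h.flatten(1))

model = SimpleForecaster()
forecast = model(torch.randn(4, 96, 1))        # shape (4, 24)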

Despite recent concerns about undesirable behaviors generated by large language models (LLMs), including non-factual, biased, and hateful language, we find LLMs are inherent multi-task language checkers based on their latent representations of natural and social knowledge. We present an interpretable, unified, language checking (UniLC) method for both human and machine-generated language that aims to check whether an input is factual and fair. While fairness and fact-checking tasks have been handled separately with dedicated models, they can...

10.48550/arxiv.2304.03728 preprint EN cc-by arXiv (Cornell University) 2023-01-01

Artificial intelligence (AI) stands to improve healthcare through innovative new systems ranging from diagnosis aids to patient tools. However, such "Health AI" systems are complicated and challenging to integrate into standing clinical practice. As AI advances, regulations, clinical practice, and policies must adapt to a wide range of risks while experts learn to interact with complex automated systems. Even in the early stages of Health AI, gaps are being identified, like severe underperformance of models for minority groups...

10.1145/3617694.3623224 article EN cc-by 2023-10-29

Early multi-label classification of time series, the assignment of a label set to a time series before it is entirely observed, is critical for time-sensitive domains such as healthcare. In such cases, waiting too long to classify can render predictions useless, regardless of their accuracy, while predicting prematurely can result in potentially costly erroneous results. When multiple labels are possible (for example, types of infections), dependencies between them can be learned and leveraged to improve overall accuracy. Together, reliably correct...

10.1145/3394486.3403191 article EN public-domain 2020-08-20

Explainable classification is essential to high-impact settings where practitioners require evidence to support their decisions. However, state-of-the-art deep learning models lack transparency in how they make their predictions. One increasingly popular solution is attribution-based explainability, which finds the impact of input features on the model's predictions. While this works well for computer vision, little has been done to explain time series classifiers. In this work, we study the problem and propose PERT, a novel perturbation-based...

10.1145/3459637.3482446 article EN 2021-10-26
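As context for the entry above, a generic perturbation-based attribution for a time series classifier can be sketched as follows: occlude each timestep (here by replacing it with the series mean) and record how much the predicted class probability drops. PERT itself learns the replacement series; this is only a simplified occlusion baseline with a toy model:

# Hypothetical sketch: occlusion-style saliency for a univariate time series classifier.
import numpy as np

def saliency(predict_proba, x, target_class):
    """x: (time,) series; predict_proba: callable returning class probabilities."""
    base = predict_proba(x)[target_class]
    scores = np.zeros(len(x))
    for t in range(len(x)):
        perturbed = x.copy()
        perturbed[t] = x.mean()
        scores[t] = base - predict_proba(perturbed)[target_class]  # large drop => important step
    return scores

# Toy classifier: predicts "class 1" when the late part of the series is elevated.
def toy_model(x):
    p1 = 1 / (1 + np.exp(-x[-10:].mean()))
    return np.array([1 - p1, p1])

x = np.concatenate([np.zeros(40), np.ones(10) * 2.0])
print(saliency(toy_model, x, target_class=1).round(3))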

Deployed language models decay over time due to shifting inputs, changing user needs, or emergent world-knowledge gaps. When such problems are identified, we want to make targeted edits while avoiding expensive retraining. However, current model editors, which modify the behaviors of pre-trained models, degrade model performance quickly across multiple, sequential edits. We propose GRACE, a lifelong model editing method, which implements spot-fixes on the streaming errors of a deployed model, ensuring minimal impact...

10.48550/arxiv.2211.11031 preprint EN cc-by arXiv (Cornell University) 2022-01-01
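A minimal sketch of the codebook-style spot-fixing idea behind the entry above: wrap one layer with a key-value cache, and when an incoming hidden state falls within a radius of a stored key, return the stored corrective value instead of the layer's output. The class, radius, and dimensions are illustrative simplifications, not GRACE's implementation:

# Hypothetical sketch: key-value spot-fixes wrapped around a single layer.
import torch
import torch.nn as nn

class EditedLayer(nn.Module):
    def __init__(self, layer, radius=1.0):
        super().__init__()
        self.layer, self.radius = layer, radius
        self.keys, self.values = [], []        # grows as errors are patched over time

    def add_edit(self, key, value):
        self.keys.append(key)
        self.values.append(value)

    def forward(self, h):                      # h: (d,) single hidden state for simplicity
        out = self.layer(h)
        for k, v in zip(self.keys, self.values):
            if torch.norm(h - k) < self.radius:
                return v                       # spot-fix: override only near the cached key
        return out

layer = EditedLayer(nn.Linear(16, 16))
h = torch.randn(16)
layer.add_edit(h.clone(), torch.zeros(16))                 # patch behavior at (and near) this input
print(torch.allclose(layer(h), torch.zeros(16)))           # True: edit applies
print(torch.allclose(layer(h + 10.0), torch.zeros(16)))    # False: far inputs pass through untouched

The second check shows that inputs far from the cached key still use the original layer, which is the "minimal impact" property the entry describes.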

10.5220/0006599601560167 article EN cc-by-nc-nd Proceedings of the 15th International Joint Conference on Biomedical Engineering Systems and Technologies 2018-01-01

Foundation models, especially LLMs, are profoundly transforming deep learning. Instead of training many task-specific models, we can adapt a single pretrained model to many tasks via few-shot prompting or fine-tuning. However, current foundation models apply to sequence data but not to time series, which present unique challenges due to the inherently diverse and multi-domain time series datasets, diverging task specifications across forecasting, classification, and other types of tasks, and the apparent need for task-specialized models....

10.48550/arxiv.2403.00131 preprint EN arXiv (Cornell University) 2024-02-29

For medical imaging AI models to be clinically impactful, they must generalize. However, this goal is hindered by (i) diverse types of distribution shifts, such as temporal, demographic, and label shifts, and (ii) limited diversity in datasets that are siloed within single institutions. While these limitations have spurred interest in federated learning, current evaluation benchmarks fail to evaluate different shifts simultaneously. In real healthcare settings, multiple shifts co-exist, yet their impact on performance...

10.48550/arxiv.2407.08822 preprint EN arXiv (Cornell University) 2024-07-11

Vision-language models, like CLIP (Contrastive Language Image Pretraining), are becoming increasingly popular for a wide range of multimodal retrieval tasks. However, prior work has shown that large language and deep vision models can learn historical biases contained in their training sets, leading to the perpetuation of stereotypes and potential downstream harm. In this work, we conduct a systematic analysis of the social biases present in CLIP, with a focus on the interaction between the image and text modalities. We first...

10.1609/aies.v7i1.31657 article EN 2024-10-16

Understanding the roles of human proteins remains a major challenge, with approximately 20% lacking known functions and more than 40% missing context-specific functional insights. Even well-annotated proteins are often poorly characterized in diverse biological contexts, disease states, and perturbations. We present ProCyon, a foundation model for modeling, generating, and predicting protein phenotypes across five interrelated knowledge domains: molecular functions, therapeutic mechanisms, disease associations,...

10.1101/2024.12.10.627665 preprint EN cc-by bioRxiv (Cold Spring Harbor Laboratory) 2024-12-15

Math word problems are critical K-8 educational tools, but writing them is time-consuming and requires domain expertise. We suggest that language models can support math education by automatically generating problems at scale. To be educational, generated problems must be 1) solvable, 2) accurate, and 3) appropriate. Existing datasets are unlabeled for these criteria, making them ill-suited for training problem generators. We introduce MATHWELL, a Llama-2 (70B) model iteratively finetuned to generate such problems using data from expert...

10.48550/arxiv.2402.15861 preprint EN arXiv (Cornell University) 2024-02-24

Medical knowledge is context-dependent and requires consistent reasoning across various natural language expressions of semantically equivalent phrases. This is particularly crucial for drug names, where patients often use brand names like Advil or Tylenol instead of their generic equivalents. To study this, we create a new robustness dataset, RABBITS, to evaluate performance differences on medical benchmarks after swapping brand and generic drug names, using physician expert annotations. We assess both open-source and API-based...

10.48550/arxiv.2406.12066 preprint EN arXiv (Cornell University) 2024-06-17
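The brand/generic swap described above can be illustrated with a tiny mapping applied to a toy benchmark question; the mapping and question below are examples, not the RABBITS annotations:

# Hypothetical sketch: rewrite a benchmark question by swapping brand names for generics,
# then compare model accuracy before and after the swap.
import re

brand_to_generic = {"Advil": "ibuprofen", "Tylenol": "acetaminophen"}

def swap_names(text, mapping):
    for brand, generic in mapping.items():
        text = re.sub(rf"\b{brand}\b", generic, text)
    return text

question = "A patient taking Advil reports stomach pain. Which drug class is most likely responsible?"
print(swap_names(question, brand_to_generic))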

Positive-Unlabeled (PU) learning methods train a classifier to distinguish between the positive and negative classes given only positive and unlabeled data. While traditional PU methods require the labeled samples to be an unbiased sample of the positive distribution, in practice the labeled set is often a biased draw from the true distribution. Prior work shows that if we know the likelihood that each instance will be selected for labeling, referred to as the propensity score, then it can be used for PU learning. Unfortunately, no prior work has proposed an inference strategy by which...

10.1609/aaai.v36i6.20624 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2022-06-28
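For context on the propensity scores mentioned above, a common propensity-weighted log-loss from the PU-learning literature can be sketched as follows; the scores and propensities are random stand-ins, and this is not the paper's inference procedure:

# Hypothetical sketch: propensity-weighted PU log-loss. Each labeled positive is counted
# with weight 1/e(x) as a positive and weight 1 - 1/e(x) as a negative; unlabeled
# examples count as negatives with weight 1.
import numpy as np

def propensity_weighted_logloss(p_hat, labeled, propensity):
    """p_hat: predicted P(y=1|x); labeled: 1 if the example is a labeled positive."""
    eps = 1e-12
    pos_term = labeled * (1.0 / propensity) * -np.log(p_hat + eps)
    neg_weight = np.where(labeled == 1, 1.0 - 1.0 / propensity, 1.0)  # can be negative
    neg_term = neg_weight * -np.log(1.0 - p_hat + eps)
    return np.mean(pos_term + neg_term)

rng = np.random.default_rng(1)
p_hat = rng.uniform(0.05, 0.95, size=500)         # stand-in classifier scores
labeled = rng.integers(0, 2, size=500)            # 1 = labeled positive, 0 = unlabeled
propensity = rng.uniform(0.3, 0.9, size=500)      # assumed-known labeling probabilities
print(propensity_weighted_logloss(p_hat, labeled, propensity))

Note that labeled positives contribute a negatively weighted "negative" term (1 - 1/e), which is what keeps this estimator unbiased when the propensity e(x) is known.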

Toxic language detection systems often falsely flag text that contains minority group mentions as toxic, as those groups are often the targets of online hate. Such over-reliance on spurious correlations also causes systems to struggle with detecting implicitly toxic language. To help mitigate these issues, we create ToxiGen, a new large-scale and machine-generated dataset of 274k toxic and benign statements about 13 minority groups. We develop a demonstration-based prompting framework and an adversarial classifier-in-the-loop decoding...

10.48550/arxiv.2203.09509 preprint EN cc-by arXiv (Cornell University) 2022-01-01