Chirag Nagpal

ORCID: 0000-0003-2212-5392
Research Areas
  • Statistical Methods and Inference
  • Topic Modeling
  • Machine Learning in Healthcare
  • Explainable Artificial Intelligence (XAI)
  • Spam and Phishing Detection
  • Advanced Causal Inference Techniques
  • Artificial Intelligence in Healthcare and Education
  • Gaussian Processes and Bayesian Inference
  • Natural Language Processing Techniques
  • Web Data Mining and Analysis
  • Bayesian Methods and Mixture Models
  • Global Security and Public Health
  • Network Security and Intrusion Detection
  • Speech and Audio Processing
  • Speech Recognition and Synthesis
  • Viral Infections and Outbreaks Research
  • Frailty in Older Adults
  • Advanced Text Analysis Techniques
  • Insurance, Mortality, Demography, Risk Management
  • Data Quality and Management
  • Plant and Soil Sciences
  • Data Analysis with R
  • Anomaly Detection Techniques and Applications
  • Auditing, Earnings Management, Governance
  • Digital Radiography and Breast Imaging

Google (United States)
2024-2025

Duke University
2024

DeepMind (United Kingdom)
2024

Carnegie Mellon University
2015-2024

National Heart Lung and Blood Institute
2022

University of Pittsburgh
2017

We describe a new approach to estimating relative risks in time-to-event prediction problems with censored data in a fully parametric manner. Our approach does not require making strong assumptions of constant proportional hazards of the underlying survival distribution, as required by the Cox-proportional hazard model. By jointly learning deep nonlinear representations of the input covariates, we demonstrate the benefits of our approach when used to estimate survival risks through extensive experimentation on multiple real world datasets with different...

10.1109/jbhi.2021.3052441 article EN IEEE Journal of Biomedical and Health Informatics 2021-01-25

Estimation of treatment efficacy of real-world clinical interventions involves working with continuous outcomes such as time-to-death, re-hospitalization, or a composite event that may be subject to censoring. Counterfactual reasoning in such scenarios requires decoupling the effects of confounding physiological characteristics that affect baseline survival rates from the effects of the interventions being assessed. In this paper, we present a latent variable approach to model heterogeneous treatment effects by proposing that an individual can belong to one of latent clusters...

10.1145/3534678.3539110 article EN Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining 2022-08-12

Concept erasure techniques have recently gained significant attention for their potential to remove unwanted concepts from text-to-image models. While these methods often demonstrate success in controlled scenarios, their robustness in real-world applications and readiness for deployment remain uncertain. In this work, we identify a critical gap in evaluating sanitized models, particularly in terms of their performance across various concept dimensions. We systematically investigate the failure modes of current...

10.48550/arxiv.2501.09833 preprint EN arXiv (Cornell University) 2025-01-16

10.1109/icassp49660.2025.10888006 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

With growing application of machine learning (ML) technologies in healthcare, there have been calls for developing techniques to understand and mitigate biases these systems may exhibit. Fairness considerations in the development of ML-based solutions for health have particular implications for Africa, which already faces inequitable power imbalances between the Global North and South. This paper seeks to explore fairness for global health, with Africa as a case study. We conduct a scoping review to propose axes of disparities...

10.1145/3689904.3694708 article EN 2024-10-23

Survival analysis is a challenging variation of regression modeling because of the presence of censoring, where the outcome measurement is only partially known, due to, for example, loss to follow up. Such problems come up frequently in medical applications, making survival analysis a key endeavor in biostatistics and machine learning for healthcare, with Cox regression models being amongst the most commonly employed models. We describe a new approach for survival analysis regression models, based on learning mixtures of Cox regressions to model individual survival distributions. We propose an...

10.48550/arxiv.2101.06536 preprint EN other-oa arXiv (Cornell University) 2021-01-01

Survival regression models can achieve longer warning times at similar receiver operating characteristic performance than previously investigated models. Survival regression models are also shown to predict the time until a disruption will occur with lower error than other predictors. Time-to-event predictions from time-series data can be obtained with a survival analysis statistical framework, and there have been many tools developed for this task which we aim to apply to disruption prediction. Using open-source...

10.21203/rs.3.rs-3918792/v1 preprint EN cc-by Research Square (Research Square) 2024-02-08

Human trafficking is a challenging law enforcement problem, and traces of victims of such activity manifest as 'escort advertisements' on various online forums. Given the large, heterogeneous and noisy structure of this data, building models to predict instances of trafficking is a convoluted task. In this paper we propose an entity resolution pipeline using a notion of proxy labels, in order to extract clusters from this data with prior history of human trafficking activity. We apply this pipeline to 5M records from backpage.com and report on the performance of this approach, challenges in terms...

10.18653/v1/w17-4411 article EN cc-by 2017-01-01

The dearth of prescribing guidelines for physicians is one key driver of the current opioid epidemic in the United States. In this work, we analyze medical and pharmaceutical claims data to draw insights on characteristics of patients who are more prone to adverse outcomes after an initial synthetic opioid prescription. Toward this end, we propose a generative model that allows discovery from observational data of subgroups that demonstrate an enhanced or diminished causal effect due to treatment. Our approach models these...

10.1145/3368555.3384456 preprint EN 2020-03-20

We describe a new approach to estimating relative risks in time-to-event prediction problems with censored data in a fully parametric manner. Our approach does not require making strong assumptions of constant proportional hazard of the underlying survival distribution, as required by the Cox-proportional hazard model. By jointly learning deep nonlinear representations of the input covariates, we demonstrate the benefits of our approach when used to estimate survival risks through extensive experimentation on multiple real world datasets with different levels...

10.48550/arxiv.2003.01176 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Applications of machine learning in healthcare often require working with time-to-event prediction tasks including prognostication of an adverse event, re-hospitalization or death. Such outcomes are typically subject to censoring due to loss of follow up. Standard machine learning methods cannot be applied in a straightforward manner to datasets with censored outcomes. In this paper, we present auton-survival, an open-source repository of tools to streamline working with survival data. auton-survival includes tools for survival regression, adjustment in the presence...

10.48550/arxiv.2204.07276 preprint EN other-oa arXiv (Cornell University) 2022-01-01

A simple and effective method for the alignment of generative models is the best-of-$n$ policy, where $n$ samples are drawn from a base policy, ranked based on a reward function, and the highest ranking one is selected. A commonly used analytical expression in the literature claims that the KL divergence between the best-of-$n$ policy and the base policy is equal to $\log (n) - (n-1)/n.$ We disprove the validity of this claim, and show that it is an upper bound on the actual KL divergence. We also explore the tightness of this upper bound in different regimes. Finally, we propose a new estimator for the KL divergence and empirically show that it provides a tight...

10.48550/arxiv.2401.01879 preprint EN cc-by arXiv (Cornell University) 2024-01-01
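
The upper-bound claim in the abstract above can be checked numerically on a toy case. The following is a minimal sketch, not the paper's estimator: it assumes a small discrete base policy whose outcomes are listed in ascending order of reward with no ties between distinct outcomes, and all function names are hypothetical. Under that assumption the best-of-$n$ policy picks outcome $i$ with probability $F_i^n - F_{i-1}^n$, where $F_i$ is the cumulative base probability.

```python
import math


def best_of_n_distribution(base_probs, n):
    """Best-of-n policy over outcomes sorted by ascending reward.

    Assumption: distinct outcomes have distinct rewards, so the best of
    n draws is outcome i with probability F_i**n - F_{i-1}**n.
    """
    dist = []
    cdf_prev = 0.0
    for p in base_probs:
        cdf = cdf_prev + p
        dist.append(cdf ** n - cdf_prev ** n)
        cdf_prev = cdf
    return dist


def kl_divergence(q, p):
    """KL(q || p) in nats for discrete distributions on the same support."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


def analytical_expression(n):
    """The expression log(n) - (n - 1) / n quoted in the abstract."""
    return math.log(n) - (n - 1) / n


# Toy base policy: two equally likely outcomes, best-of-2 selection.
base = [0.5, 0.5]
n = 2
policy = best_of_n_distribution(base, n)
exact_kl = kl_divergence(policy, base)
bound = analytical_expression(n)

print("best-of-n policy:", policy)
print("exact KL:", exact_kl)
print("log(n) - (n-1)/n:", bound)
```

In this toy case the exact divergence comes out strictly below the analytical expression, which is consistent with the abstract's statement that the expression is an upper bound rather than an equality.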

Language model (LM) post-training (or alignment) involves maximizing a reward function that is derived from preference annotations. Direct Preference Optimization (DPO) is a popular offline alignment method that trains a policy directly on preference data without the need to train a reward model or apply reinforcement learning. However, typical preference datasets have only a single, or at most a few, annotation per preference pair, which causes DPO to overconfidently assign rewards that trend towards infinite magnitude. This frequently leads to degenerate policies,...

10.48550/arxiv.2405.19316 preprint EN arXiv (Cornell University) 2024-05-29

Semi-parametric survival analysis methods like the Cox Proportional Hazards (CPH) regression (Cox, 1972) are a popular approach for survival analysis. These methods involve fitting of the log-proportional hazard as a function of the covariates and are convenient as they do not require estimation of the baseline hazard rate. Recent approaches have involved learning non-linear representations of the input covariates and demonstrate improved performance. In this paper we argue against such deep parameterizations for survival analysis and experimentally demonstrate that more interpretable semi-parametric...

10.48550/arxiv.1905.05865 preprint EN other-oa arXiv (Cornell University) 2019-01-01
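
The abstract above notes that Cox regression fits the log-proportional hazard without estimating the baseline hazard. A minimal sketch of why, assuming a linear log-hazard and no special handling of tied event times: the standard Cox partial log-likelihood compares each observed event only against the subjects still at risk, so the baseline hazard cancels. The function name and the toy data are hypothetical and are not taken from the paper.

```python
import numpy as np


def cox_partial_log_likelihood(beta, x, time, event):
    """Cox partial log-likelihood with a linear log-hazard x @ beta.

    Assumption: no tie correction. Censored rows (event == 0) contribute
    only through the risk sets of earlier events.
    """
    log_hazard = x @ beta
    total = 0.0
    for i in np.flatnonzero(event):
        # Risk set: subjects whose observed time is at least time[i].
        at_risk = time >= time[i]
        total += log_hazard[i] - np.log(np.sum(np.exp(log_hazard[at_risk])))
    return total


# Toy data: three subjects, one covariate, the second subject is censored.
x = np.array([[1.0], [2.0], [3.0]])
time = np.array([1.0, 2.0, 3.0])
event = np.array([1, 0, 1])

# With beta = 0 every subject has the same hazard, so each event
# contributes -log(size of its risk set): -log(3) - log(1).
value = cox_partial_log_likelihood(np.zeros(1), x, time, event)
print(value)
```

Only the ordering of the observed times enters the computation, which is why no baseline hazard rate needs to be estimated to fit `beta`.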

Language model alignment has become a critical step in training modern generative language models. The goal of alignment is to finetune a reference model such that the win rate of a sample from the aligned model over a sample from the reference model is high, subject to a KL divergence constraint. Today, we are increasingly using inference-time algorithms (e.g., Best-of-N, controlled decoding, tree search) to decode from language models rather than standard sampling. However, the alignment objective does not capture such inference-time decoding procedures. We show that the existing alignment framework is sub-optimal in view of such inference-time methods. We then...

10.48550/arxiv.2412.19792 preprint EN arXiv (Cornell University) 2024-12-27

A common approach for aligning language models to human preferences is to first learn a reward model from preference data, and then use this reward model to update the language model. We study two closely related problems that arise in this approach. First, any monotone transformation of the reward model preserves preference ranking; is there a choice that is ``better'' than others? Second, we often wish to align language models to multiple properties: how should we combine multiple reward models? Using a probabilistic interpretation of the alignment procedure, we identify a natural choice for transformation for (the common case of) rewards learned...

10.48550/arxiv.2402.00742 preprint EN arXiv (Cornell University) 2024-02-01

Bias benchmarks are a popular method for studying the negative impacts of bias in LLMs, yet there has been little empirical investigation of whether these benchmarks are actually indicative of how real world harm may manifest in the real world. In this work, we study the correspondence between such decontextualized "trick tests" and evaluations that are more grounded in Realistic Use and Tangible Effects (i.e. RUTEd evaluations). We explore this correlation in the context of gender-occupation bias--a popular genre of bias evaluation. We compare three de-contextualized...

10.48550/arxiv.2402.12649 preprint EN arXiv (Cornell University) 2024-02-19

BACKGROUND: Intraoperative physiologic parameters could offer predictive utility in evaluating risk of adverse postoperative events yet are not included in current standard risk models. This study examines if the inclusion of continuous intraoperative data improves machine learning model predictions for multiple outcomes following coronary artery bypass grafting, including: 30-day mortality, renal failure, reoperation, prolonged ventilation, and combined morbidity and mortality (MM). METHODS: Society...

10.1016/j.atssr.2024.02.005 article EN cc-by-nc-nd Annals of Thoracic Surgery Short Reports 2024-03-07

With growing application of machine learning (ML) technologies in healthcare, there have been calls for developing techniques to understand and mitigate biases these systems may exhibit. Fairness considerations in the development of ML-based solutions for health have particular implications for Africa, which already faces inequitable power imbalances between the Global North and South. This paper seeks to explore fairness for global health, with Africa as a case study. We conduct a scoping review to propose axes of disparities...

10.48550/arxiv.2403.03357 preprint EN arXiv (Cornell University) 2024-03-05

Large language models (LLMs) hold immense promise to serve complex health information needs but also have the potential to introduce harm and exacerbate health disparities. Reliably evaluating equity-related model failures is a critical step toward developing systems that promote health equity. In this work, we present resources and methodologies for surfacing biases with potential to precipitate equity-related harms in long-form, LLM-generated answers to medical questions and then conduct an empirical case study with Med-PaLM 2, resulting in the largest...

10.48550/arxiv.2403.12025 preprint EN arXiv (Cornell University) 2024-03-18

Deep learning has recently demonstrated the ability to predict long-term patient risk and its stratification when trained on imaging data such as chest radiographs. However, existing methods formulate estimating patient risk as a binary classification, typically ignoring or limiting the use of temporal information, and not accounting for the loss of patient follow-up, which reduces the fidelity of estimation and limits the prediction to a certain time horizon. In this paper, we demonstrate that deep survival and time-to-event prediction models can outperform...

10.3390/forecast6020022 article EN cc-by Forecasting 2024-05-26