NFDI4DS | UHH-SEMS - Publication Details

Debajyoti Datta

ORCID: 0000-0003-0581-6116

About

Contact & Profiles

A5103139732

Research Areas

Topic Modeling
Natural Language Processing Techniques
Intelligent Tutoring Systems and Adaptive Learning
Biomedical Text Mining and Ontologies
AI in Service Interactions
Cancer Genomics and Diagnostics
Machine Learning and Data Classification
Machine Learning in Healthcare
Domain Adaptation and Few-Shot Learning
Mobile Crowdsensing and Crowdsourcing
Health Literacy and Information Accessibility
Human Mobility and Location-Based Analysis
COVID-19 epidemiological studies
Artificial Intelligence in Healthcare and Education
Multimodal Machine Learning Applications
Mental Health Research Topics
Educational Games and Gamification
Speech and dialogue systems
Data Stream Mining Techniques
Text Readability and Simplification
Cultural Competency in Health Care
Electronic Health Records Systems
Emotion and Mood Recognition
Digital Mental Health Interventions
Humor Studies and Applications

University of California, San Francisco
2019-2024

University of Virginia
2016-2023

Engineering Systems (United States)
2021

City College of San Francisco
2017

Centre National de la Recherche Scientifique
2014

Télécom Paris
2014

Laboratoire Traitement et Communication de l’Information
2014

Multitask Prompted Training Enables Zero-Shot Task Generalization

OPENALEX - Publications

Victor Sanh Albert Webson Colin Raffel Stephen H. Bach Lintang Sutawika and 37 more

Large language models have recently been shown to attain reasonable zero-shot generalization on a diverse set of tasks (Brown et al., 2020). It has been hypothesized that this is a consequence of implicit multitask learning in language models' pretraining (Radford et al., 2019). Can zero-shot generalization instead be directly induced by explicit multitask learning? To test this question at scale, we develop a system for easily mapping any natural language tasks into a human-readable prompted form. We convert a large set of supervised datasets, each with multiple prompts with diverse wording....

10.48550/arxiv.2110.08207 preprint EN cc-by arXiv (Cornell University) 2021-01-01

Polaris: A Safety-focused LLM Constellation Architecture for Healthcare

Subhabrata Mukherjee Paul Gamble Markel Sanz Ausin Neel Kant Kriti Aggarwal and 21 more

We develop Polaris, the first safety-focused LLM constellation for real-time patient-AI healthcare conversations. Unlike prior LLM works in healthcare focusing on tasks like question answering, our work specifically focuses on long multi-turn voice conversations. Our one-trillion parameter constellation system is composed of several multibillion parameter LLMs as co-operative agents: a stateful primary agent that focuses on driving an engaging conversation and several specialist support agents focused on healthcare tasks performed by nurses to increase safety and reduce hallucinations....

10.48550/arxiv.2403.13313 preprint EN arXiv (Cornell University) 2024-03-20

PatientExploreR: an extensible application for dynamic visualization of patient clinical history from electronic health records in the OMOP common data model

Benjamin S. Glicksberg Boris Oskotsky Phyllis Thangaraj Nicholas Giangreco Marcus A. Badgeley and 17 more

Electronic health records (EHRs) are quickly becoming omnipresent in healthcare, but interoperability issues and technical demands limit their use for biomedical and clinical research. Interactive and flexible software that interfaces directly with EHR data structured around a common data model (CDM) could accelerate more EHR-based research by making the data accessible to researchers who lack computational expertise and/or domain knowledge.

10.1093/bioinformatics/btz409 article EN cc-by Bioinformatics 2019-06-14

From Theory to Practice – Assessing Translation of Physical Fitness Research in the Emergency Department through Machine Learning and Natural Language Processing

Kristin Morrow Debajyoti Datta Lindsey Spiegelman Roy Almog Kai Zheng and 2 more

10.1017/cts.2025.10051 article EN cc-by-nc-nd Journal of Clinical and Translational Science 2025-05-21

Design of a Culturally-Informed Virtual Human for Educating Hispanic Women about Cervical Cancer

Sanjana Mendu Mehdi Boukhechba Janna R. Gordon Debajyoti Datta Edwin Molina and 4 more

Significant health disparities exist between Hispanics and the general US population, complicated in part by communication, literacy, and linguistic factors. There are few available Spanish-language interactive, technology-driven health education programs that engage patients who have a range of health literacy levels. We describe the development of an interactive virtual patient educator for educating and counseling Hispanic women about cervical cancer and human papillomavirus. Specifically, we describe the iterative design methodology...

10.1145/3240925.3240968 article EN 2018-05-21

Influenza-like symptom recognition using mobile sensing and graph neural networks

Guimin Dong Lihua Cai Debajyoti Datta Shashwat Kumar Laura E. Barnes and 1 more

Early detection of influenza-like symptoms can prevent widespread flu viruses and enable timely treatments, particularly in the post-pandemic era. Mobile sensing leverages an increasingly diverse set of embedded sensors to capture fine-grained information on human behaviors and ambient contexts, and can serve as a promising solution for influenza-like symptom recognition. Traditionally, handcrafted and high level features of mobile sensing data are extracted by manual feature engineering and convolutional/recurrent neural network...

10.1145/3450439.3451880 article EN 2021-03-23

ROMOP: a light-weight R package for interfacing with OMOP-formatted electronic health record data

Benjamin S. Glicksberg Boris Oskotsky Nicholas Giangreco Phyllis Thangaraj Vivek A. Rudrapatna and 6 more

Electronic health record (EHR) data are increasingly used for biomedical discoveries. The nature of the data, however, requires expertise in both data science and EHR structure. The Observational Medical Outcomes Partnership (OMOP) common data model (CDM) standardizes the language and structure of EHR data to promote interoperability for research. While the OMOP CDM is valuable and more attuned to research purposes, it still requires extensive domain knowledge to utilize effectively, potentially limiting more widespread adoption for research and quality improvement.

10.1093/jamiaopen/ooy059 article EN cc-by JAMIA Open 2019-01-04

BigBIO: A Framework for Data-Centric Biomedical Natural Language Processing

Jason Fries Leon Weber Natasha Seelam Gabriel Altay Debajyoti Datta and 38 more

Training and evaluating language models increasingly requires the construction of meta-datasets -- diverse collections of curated data with clear provenance. Natural language prompting has recently led to improved zero-shot generalization by transforming existing, supervised datasets into a diversity of novel pretraining tasks, highlighting the benefits of meta-dataset curation. While successful in general-domain text, translating these data-centric approaches to biomedical language modeling remains challenging, as labeled...

10.48550/arxiv.2206.15076 preprint EN cc-by-sa arXiv (Cornell University) 2022-01-01

Leveraging Mobile Sensing and Bayesian Change Point Analysis to Monitor Community-scale Behavioral Interventions: A Case Study on COVID-19

Shashwat Kumar Debajyoti Datta Guimin Dong Lihua Cai Mehdi Boukhechba and 1 more

During pandemics, effective interventions require monitoring the problem at different scales and understanding the various tradeoffs between efficacy, privacy, and economic burden. To address these challenges, we propose a framework where we perform Bayesian change-point analysis on aggregate behavior markers extracted from mobile sensing data collected during the COVID-19 pandemic. Results generated by 598 participants for up to four months reveal rich insights: We observe an increase in smartphone usage...

10.1145/3524886 article EN ACM Transactions on Computing for Healthcare 2022-07-20

Improving Classification through Weak Supervision in Context-specific Conversational Agent Development for Teacher Education

Debajyoti Datta Maria Phillips Jennifer L. Chiu G. S. Watson James P. Bywater and 2 more

Machine learning techniques applied to the Natural Language Processing (NLP) component of conversational agent development show promising results for improved accuracy and quality of feedback that a conversational agent can provide. The effort required to develop an educational scenario specific conversational agent is time consuming as it requires domain experts to label and annotate noisy data sources such as classroom videos. Previous approaches to modeling annotations have relied on labeling thousands of examples and calculating inter-annotator agreement...

10.48550/arxiv.2010.12710 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Wearable Sensor-based Multimodal Physiological Responses of Socially Anxious Individuals across Social Contexts

Emma R. Toner Mark Rucker Zhiyuan Wang Maria A. Larrazabal Lihua Cai and 6 more

Correctly identifying an individual's social context from passively worn sensors holds promise for delivering just-in-time adaptive interventions (JITAIs) to treat social anxiety disorder. In this study, we present results using passively collected data from a within-subject experiment that assessed physiological response across different social contexts (i.e., alone vs. with others), social phases (i.e., pre- and post-interaction vs. during interaction), social interaction sizes (i.e., dyadic vs. group interactions), and levels of social threat (i.e., implicit...

10.48550/arxiv.2304.01293 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Dataset Debt in Biomedical Language Modeling

Jason Fries Natasha Seelam Gabriel Altay Leon Weber Myungsun Kang and 7 more

Jason Fries, Natasha Seelam, Gabriel Altay, Leon Weber, Myungsun Kang, Debajyoti Datta, Ruisi Su, Samuele Garda, Bo Wang, Simon Ott, Matthias Samwald, Wojciech Kusa. Proceedings of BigScience Episode #5 -- Workshop on Challenges & Perspectives in Creating Large Language Models. 2022.

10.18653/v1/2022.bigscience-1.10 article EN cc-by 2022-01-01

Use of a Mobile Health Intervention by Older Versus Younger People with HIV: Analysis of Usage, Social Support, and Network Interactions

Tabor Flickinger Breanna R. Campbell Allyson R Timm Sonia Baee Debajyoti Datta and 3 more

People with HIV in the United States are aging, with risk for negative health outcomes from social isolation. PositiveLinks is a mobile health (mHealth) intervention that includes an anonymous Community Message Board (CMB) for peer-to-peer conversations. We investigated differences in CMB usage and social support between younger (<50 years) and older (≥50) members. We assessed the relationship between age groups and app use using chi-square tests. CMB posts were analyzed qualitatively to categorize forms of social support. To have visual...

10.1089/tmr.2022.0035 article EN cc-by Telemedicine Reports 2022-11-01

AI Predicts Early Relapse Post-Axicabtagene Ciloleucel in Diffuse Large B-Cell Lymphoma Patients in a Multi-Center Real-World Study

Michelle Wang Krishna V. Komanduri Debajyoti Datta Ayan R. Patel Barbee I. Whitaker and 7 more

10.1182/blood-2024-210745 article EN Blood 2024-11-05

Abstract LB-006: Oncology model fidelity scores

Debajyoti Datta Theodore C. Goldstein Zhiping Gu Atul J. Butte

Animal models remain a cornerstone of research efforts in Oncology to model the complexity of cancer progression and discover new therapeutic approaches to disease management. With the advent of genomic manipulation techniques such as CRISPR, and advances in mouse modeling with genetically engineered mice (GEM) and patient-derived xenografts (PDX), we can expect the development of novel and powerful animal models in the near future. These are expanding our capability for pre-clinical testing of agents or n-of-1 patient-specific...

10.1158/1538-7445.am2017-lb-006 article EN Cancer Research 2017-07-01

Geometry matters: Exploring language examples at the decision boundary

Debajyoti Datta Shashwat Kumar Laura E. Barnes Tom Fletcher

A growing body of recent evidence has highlighted the limitations of natural language processing (NLP) datasets and classifiers. These include the presence of annotation artifacts in datasets, classifiers relying on shallow features like a single word (e.g., if a movie review has the word "romantic", the review tends to be positive), or unnecessary words (e.g., learning a proper noun to classify a movie as positive or negative). The presence of such artifacts has subsequently led to the development of challenging datasets to force the model to generalize better. While a variety of heuristic strategies,...

10.48550/arxiv.2010.07212 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Detection and Analysis of Interrupted Behaviors by Public Policy Interventions during COVID-19

Guimin Dong Lihua Cai Shashwat Kumar Debajyoti Datta Laura E. Barnes and 1 more

In most countries around the world, various public policies and guidelines, such as social distancing and stay-at-home orders, have been put in place to slow down the spreading of COVID-19. Relying on traditional surveys to assess policy impacts on community level behavior changes may lead to biased results, and limit fine-grained understanding of human behavior dynamics over time. We propose to leverage mobile sensing to capture people's behavior footprints amid the COVID-19 pandemic, and understand their collective behaviors with respect to existing...

10.1109/chase52844.2021.00013 article EN 2021-12-01