- Topic Modeling
- Speech and Dialogue Systems
- AI in Service Interactions
- Fault Detection and Control Systems
- Authorship Attribution and Profiling
- ICT in Developing Communities
- Mobile Learning in Education
- Speech and Audio Processing
- Online Learning and Analytics
- Speech Recognition and Synthesis
- Natural Language Processing Techniques
- Multimodal Machine Learning Applications
- Retinal Imaging and Analysis
- Personal Information Management and User Behavior
- Video Analysis and Summarization
- Music and Audio Processing
- Media Influence and Health
- Advanced Statistical Methods and Models
- Control Systems and Identification
- Sports Performance and Training
- Sports Dynamics and Biomechanics
- Scientific Computing and Data Management
- Human Pose and Action Recognition
- Sentiment Analysis and Opinion Mining
- Statistical Methods and Inference
Johns Hopkins University Applied Physics Laboratory
2024
Columbia University
2023
Cornell University
2020-2021
Carnegie Mellon University
2021
Sandia National Laboratories
2015-2018
Sandia National Laboratories California
2017-2018
Collecting high quality conversational data can be very expensive for most applications and infeasible for others due to privacy, ethical, or similar concerns. A promising direction to tackle this problem is to generate synthetic dialogues by prompting large language models. In this work, we use a small set of expert-written conversations as in-context examples to synthesize a social conversation dataset using prompting. We perform several thorough evaluations of our synthetic conversations compared to human-collected conversations. This...
School closures due to teacher strikes or political unrest in low-resource contexts can adversely affect children’s educational outcomes and career opportunities. Phone-based technologies could help bridge these gaps in formal schooling, but it is unclear whether or how children and their families will use such systems during periods of disruption. We investigate two mobile learning technologies deployed in sub-Saharan Africa: a text-message-based application with lessons and quizzes adhering to the national curriculum...
Mobile learning is expanding rapidly due to its accessibility and affordability, especially in resource-poor parts of the world. Yet how students engage and learn with mobile learning has not been systematically analyzed at scale. This study examines how 93,819 Kenyan students in grades 6, 9, and 12 use a text message-based mobile learning platform that has millions of users across Sub-Saharan Africa. We investigate longitudinal variation in engagement over a one-year period for students in different age groups and check for evidence of learning gains using learning curve analysis. Student...
Mixed-initiative dialogue tasks involve repeated exchanges of information and conversational control. Conversational agents gain control by generating responses that follow particular dialogue intents or strategies, prescribed by a policy planner. The standard approach has been fine-tuning pre-trained language models to perform generation conditioned on these intents. However, these supervised generation models are limited by the cost and quality of data annotation. We instead prompt large language models as a drop-in replacement for fine-tuning on conditional generation....
We collected marathon performance data from a systematic sample of elite and sub-elite athletes over the period 2015 to 2019, then searched the internet for publicly-available photographs of these performances, identifying whether the Nike Vaporfly shoes were worn or not in each performance. Controlling for athlete ability and race difficulty, we estimated the effect on marathon times of wearing the Vaporfly shoes. Assuming that the effect of the Vaporfly shoes is additive, we estimate that the Vaporfly shoes improve men's times between 2.0 and 3.9 minutes, while they improve women's times between 0.8 and 3.5 minutes....
Dialogue understanding tasks often necessitate abundant annotated data to achieve good performance, and that presents challenges in low-resource settings. To alleviate this barrier, we explore few-shot data augmentation for dialogue understanding by prompting large pre-trained language models and present a novel approach that iterates on augmentation quality by applying weakly-supervised filters. We evaluate our methods on the emotion and act classification tasks in DailyDialog and the intent classification task in Facebook Multilingual Task-Oriented Dialogue. Models...
Complex conversation settings such as persuasion involve communicating changes in attitude or behavior, so users' perspectives need to be addressed, even when not directly related to the topic. In this work, we contribute a novel modular dialogue system framework that seamlessly integrates factual information and social content into persuasive dialogue. Our framework is generalizable to any dialogue tasks that have mixed social and task contents. We conducted a study that compared user evaluations of our framework versus a baseline end-to-end...
Conversational assistants are increasingly popular across diverse real-world applications, highlighting the need for advanced multimodal speech modeling. Speech, as a natural mode of communication, encodes rich user-specific characteristics such as speaking rate and pitch, making it critical for effective interaction. Our work introduces a data-centric customization approach for efficiently enhancing multimodal understanding in conversational speech modeling. Central to our contributions is a novel multi-task learning paradigm that...
Planning for goal-oriented dialogue often requires simulating future dialogue interactions and estimating task progress. Many approaches thus consider training neural networks to perform look-ahead search algorithms such as A* search and Monte Carlo Tree Search (MCTS). However, this training often requires abundant annotated data, which creates challenges when faced with noisy annotations or low-resource settings. We introduce GDP-Zero, an approach using Open-Loop MCTS to perform goal-oriented dialogue policy planning without any model training. GDP-Zero prompts a...
Syntax is a fundamental component of language, yet few metrics have been employed to capture syntactic similarity or coherence at the utterance- and document-level. The existing standard document-level syntactic similarity metric is computationally expensive and performs inconsistently when faced with syntactically dissimilar documents. To address these challenges, we present FastKASSIM, a metric for utterance- and document-level syntactic similarity which pairs and averages the most similar constituency parse trees between a pair of documents based on tree kernels. FastKASSIM is more robust...
With the current growing availability of datasets coming from multiple sources and domains, systems onboard our military assets have an immediate need of being functional in handling large amounts of data and implementing fast and appropriate analyses for these datasets. However, these systems often have very limited computational resources upon which to process these tasks. Generalized additive models (GAMs), which are statistical models that are better able to account for non-linear relationships between independent and dependent variables, have been...
Speech models have long been known to overfit individual speakers for many classification tasks. This leads to poor generalization in settings where the speakers are out-of-domain or out-of-distribution, as is common in production environments. We view speaker adaptation as a few-shot learning problem and propose investigating transfer learning approaches inspired by recent success with pre-trained models in natural language tasks. We propose pre-finetuning speech models on difficult tasks to distill knowledge into few-shot downstream classification objectives. We pre-finetune...