- Natural Language Processing Techniques
- Speech and dialogue systems
- Topic Modeling
- Speech Recognition and Synthesis
- Scientometrics and bibliometrics research
- Language and cultural evolution
- Meta-analysis and systematic reviews
- Language Development and Disorders
- Scientific Computing and Data Management
- Opinion Dynamics and Social Influence
- Intelligent Tutoring Systems and Adaptive Learning
- Digital Communication and Language
- Philosophy and History of Science
- Advanced Text Analysis Techniques
- Reading and Literacy Development
- Neurobiology of Language and Bilingualism
- Music and Audio Processing
- Language, Discourse, Communication Strategies
Stanford University
2018-2021
Creating the Babel Fish, a tool that helps individuals translate speech between any two languages, requires advanced technological innovation and linguistic expertise. Although conventional speech-to-speech translation systems composed of multiple subsystems performing translation in a cascaded fashion exist [1–3], scalable and high-performing unified systems [4,5] remain underexplored. To address this gap, here we introduce SEAMLESSM4T (Massively Multilingual and Multimodal Machine Translation), a single model that supports...
For any scientific report, repeating the original analyses upon the original data should yield the original outcomes. We evaluated analytic reproducibility in 25 Psychological Science articles awarded open data badges between 2014 and 2015. Initially, 16 (64%, 95% confidence interval [43,81]) articles contained at least one 'major numerical discrepancy' (>10% difference), prompting us to request input from the original authors. Ultimately, target values were reproducible without author involvement for 9 (36% [20,59]) articles; with 6 (24%...
Large-scale automatic speech translation systems today lack key features that help machine-mediated communication feel seamless when compared to human-to-human dialogue. In this work, we introduce a family of models that enable end-to-end expressive and multilingual translations in a streaming fashion. First, we contribute an improved version of the massively multilingual and multimodal SeamlessM4T model, SeamlessM4T v2. This newer model, incorporating an updated UnitY2 framework, was trained on more low-resource language...
Expressive speech-to-speech translation (S2ST) aims to transfer prosodic attributes of source speech to target speech while maintaining translation accuracy. Existing research in expressive S2ST is limited, typically focusing on a single expressivity aspect at a time. Likewise, this research area lacks standard evaluation protocols and well-curated benchmark datasets. In this work, we propose a holistic cascade system for expressive S2ST, combining multiple prosody transfer techniques previously considered only in isolation. We curate a test set in the TV...
What does it take to create the Babel Fish, a tool that can help individuals translate speech between any two languages? While recent breakthroughs in text-based models have pushed machine translation coverage beyond 200 languages, unified speech-to-speech translation models have yet to achieve similar strides. More specifically, conventional speech-to-speech translation systems rely on cascaded systems that perform translation progressively, putting high-performing unified systems out of reach. To address these gaps, we introduce SeamlessM4T, a single model that supports speech-to-speech translation,...
In conversation, individual utterances are almost always ambiguous, with this ambiguity resolved by context and discourse history (common ground). One important cue for disambiguation is the topic under discussion with a particular partner (e.g., "want to pick?" means something different in a conversation with a bluegrass musician vs. with a book club partner). Here, we investigated 2- to 5-year-old American English-speaking children's (N = 131) reliance on conversational topics with specific partners to interpret...
Despite their diversity, languages around the world share a consistent set of properties and distributional regularities. For example, the distribution of word frequencies, the distribution of syntactic dependency lengths, and the presence of ambiguity are all remarkably consistent across languages. We discuss a framework for studying how these system-level properties emerge from local, in-the-moment interactions of rational, pragmatic speakers and listeners. To do so, we derive a novel objective function for measuring the communicative efficiency of linguistic...
In this paper, we propose a textless acoustic model with a self-supervised distillation strategy for noise-robust expressive speech-to-speech translation (S2ST). Recently proposed expressive S2ST systems have achieved impressive expressivity preservation performance by cascading a unit-to-speech (U2S) generator to the speech-to-unit translation model. However, these systems are vulnerable to the presence of noise in input speech, which must be assumed in real-world scenarios. To address this limitation, we propose a U2S generator that incorporates distillation with no label...