NFDI4DS | UHH-SEMS - Publication Details

Joint speech and text machine translation for up to 100 languages

OPENALEX - Publications

Loïc Barrault Yu-An Chung Mariano Coria Meglioli David Dale Ning Dong and 62 more

Creating the Babel Fish, a tool that helps individuals translate speech between any two languages, requires advanced technological innovation and linguistic expertise. Although conventional speech-to-speech translation systems composed of multiple subsystems performing translation in a cascaded fashion exist [1–3], scalable and high-performing unified systems [4,5] remain underexplored. To address this gap, here we introduce SEAMLESSM4T (Massively Multilingual and Multimodal Machine Translation), a single model that supports...

10.1038/s41586-024-08359-z article EN cc-by-nc-nd Nature 2025-01-15

Analytic reproducibility in articles receiving open data badges at the journal Psychological Science : an observational study

Tom E Hardwicke Manuel Bohn Kyle MacDonald Emily Hembacher Michèle B. Nuijten and 5 more

For any scientific report, repeating the original analyses upon the original data should yield the original outcomes. We evaluated analytic reproducibility in 25 Psychological Science articles awarded open data badges between 2014 and 2015. Initially, 16 (64%, 95% confidence interval [43,81]) articles contained at least one 'major numerical discrepancy' (>10% difference), prompting us to request input from the original authors. Ultimately, target values were reproducible without author involvement for 9 (36% [20,59]) articles; reproducible with author involvement for 6 (24%...

10.1098/rsos.201494 article EN cc-by Royal Society Open Science 2021-01-01

Seamless: Multilingual Expressive and Streaming Speech Translation

Seamless Communication Loïc Barrault Yu-An Chung Mariano Coria Meglioli David C. Dale and 60 more

Large-scale automatic speech translation systems today lack key features that help machine-mediated communication feel seamless when compared to human-to-human dialogue. In this work, we introduce a family of models that enable end-to-end expressive and multilingual translations in a streaming fashion. First, we contribute an improved version of the massively multilingual and multimodal SeamlessM4T model, SeamlessM4T v2. This newer model, incorporating an updated UnitY2 framework, was trained on more low-resource language...

10.48550/arxiv.2312.05187 preprint EN cc-by-sa arXiv (Cornell University) 2023-01-01

A Holistic Cascade System, Benchmark, and Human Evaluation Protocol for Expressive Speech-to-Speech Translation

Wen-Chin Huang Benjamin Peloquin Justine Kao Changhan Wang Hongyu Gong and 4 more

Expressive speech-to-speech translation (S2ST) aims to transfer prosodic attributes of source speech to target speech while maintaining translation accuracy. Existing research in expressive S2ST is limited, typically focusing on a single expressivity aspect at a time. Likewise, this research area lacks standard evaluation protocols and well-curated benchmark datasets. In this work, we propose a holistic cascade system for expressive S2ST, combining multiple prosody transfer techniques previously considered only in isolation. We curate a test set in the TV...

10.1109/icassp49357.2023.10096183 article EN ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023-05-05

SeamlessM4T: Massively Multilingual & Multimodal Machine Translation

Seamless Communication Loïc Barrault Yu-An Chung Mariano Coria Meglioli David C. Dale and 63 more

What does it take to create the Babel Fish, a tool that can help individuals translate speech between any two languages? While recent breakthroughs in text-based models have pushed machine translation coverage beyond 200 languages, unified speech-to-speech translation models have yet to achieve similar strides. More specifically, conventional speech-to-speech translation systems rely on cascaded systems that perform translation progressively, putting high-performing unified systems out of reach. To address these gaps, we introduce SeamlessM4T, a single model that supports translation,...

10.48550/arxiv.2308.11596 preprint EN cc-by-sa arXiv (Cornell University) 2023-01-01

Children's interpretation of ambiguous pronouns based on prior discourse

Manuel Bohn Khuyen Nha Le Benjamin Peloquin Bahar Köymen Michael C. Frank

In conversation, individual utterances are almost always ambiguous, with this ambiguity resolved by context and discourse history (common ground). One important cue for disambiguation is the topic under discussion with a particular partner (e.g., "want to pick?" means something different in a conversation with a bluegrass musician vs. a book club partner). Here, we investigated 2- to 5-year-old American English-speaking children's (N = 131) reliance on conversational topics with specific partners to interpret...

10.1111/desc.13049 article EN cc-by-nc Developmental Science 2020-10-16

The Interactions of Rational, Pragmatic Agents Lead to Efficient Language Structure and Use

Benjamin Peloquin Noah D. Goodman Michael C. Frank

Despite their diversity, languages around the world share a consistent set of properties and distributional regularities. For example, the distribution of word frequencies, the distribution of syntactic dependency lengths, and the presence of ambiguity are all remarkably consistent across languages. We discuss a framework for studying how these system-level properties emerge from local, in-the-moment interactions of rational, pragmatic speakers and listeners. To do so, we derive a novel objective function for measuring the communicative efficiency of linguistic...

10.1111/tops.12489 article EN publisher-specific-oa Topics in Cognitive Science 2020-01-01

Children’s interpretation of ambiguous pronouns based on prior discourse

Manuel Bohn Khuyen Le Benjamin Peloquin Bahar Köymen Michael C. Frank

In conversation, individual utterances are almost always ambiguous, with this ambiguity resolved by context and discourse history (common ground). One important cue for disambiguation is the topic under discussion with a particular partner (e.g., “want to pick?” means something different in a conversation with a bluegrass musician vs. a book club partner). Here, we investigated 2- to 5-year-old American English-speaking children’s (N = 131) reliance on conversational topics with specific partners to interpret...

10.31234/osf.io/gkhez preprint EN 2020-02-21

Textless Acoustic Model with Self-Supervised Distillation for Noise-Robust Expressive Speech-to-Speech Translation

Min-Jae Hwang Ilia Kulikov Benjamin Peloquin Hongyu Gong Peng-Jen Chen and 1 more

In this paper, we propose a textless acoustic model with a self-supervised distillation strategy for noise-robust expressive speech-to-speech translation (S2ST). Recently proposed expressive S2ST systems have achieved impressive expressivity preservation performance by cascading a unit-to-speech (U2S) generator to the speech-to-unit translation model. However, these systems are vulnerable to the presence of noise in input speech, which is an assumption in real-world translation scenarios. To address this limitation, we propose a U2S generator that incorporates a distillation with no label...

10.48550/arxiv.2406.02733 preprint EN arXiv (Cornell University) 2024-06-04

Textless Acoustic Model with Self-Supervised Distillation for Noise-Robust Expressive Speech-to-Speech Translation

Min-Jae Hwang Ilia Kulikov Benjamin Peloquin Hongyu Gong Peng-Jen Chen and 1 more

10.18653/v1/2024.findings-acl.917 article EN Findings of the Association for Computational Linguistics: ACL 2024 2024-01-01

The interactions of rational, pragmatic agents lead to efficient language structure and use

Benjamin Peloquin

Despite their diversity, languages around the world share a consistent set of properties and distributional regularities. For example, the distribution of word frequencies, the distribution of syntactic dependency lengths, and the presence of ambiguity are all remarkably consistent across languages. We discuss a framework for studying how these system-level properties emerge from local, in-the-moment interactions of rational, pragmatic speakers and listeners. To do so, we derive a novel objective function for measuring communicative efficiency...

10.31234/osf.io/8f9gv article EN 2019-02-03

A Holistic Cascade System, Benchmark, and Human Evaluation Protocol for Expressive Speech-to-Speech Translation

Wen-Chin Huang Benjamin Peloquin Justine Kao Changhan Wang Hongyu Gong and 4 more

Expressive speech-to-speech translation (S2ST) aims to transfer prosodic attributes of source speech to target speech while maintaining translation accuracy. Existing research in expressive S2ST is limited, typically focusing on a single expressivity aspect at a time. Likewise, this research area lacks standard evaluation protocols and well-curated benchmark datasets. In this work, we propose a holistic cascade system for expressive S2ST, combining multiple prosody transfer techniques previously considered only in isolation. We curate a test set in the TV...

10.48550/arxiv.2301.10606 preprint EN other-oa arXiv (Cornell University) 2023-01-01