- Natural Language Processing Techniques
- Topic Modeling
- Speech Recognition and Synthesis
- Speech and Dialogue Systems
- Multimodal Machine Learning Applications
- Language and Cultural Evolution
- Text Readability and Simplification
- Music and Audio Processing
- Phonetics and Phonology Research
- Semantic Web and Ontologies
- Biomedical Text Mining and Ontologies
- Machine Learning and Data Classification
- Data Quality and Management
- Authorship Attribution and Profiling
- Web Data Mining and Analysis
- Speech and Audio Processing
- Seismology and Earthquake Studies
- Digital Humanities and Scholarship
- Hand Gesture Recognition Systems
- GNSS Positioning and Interference
- Subtitles and Audiovisual Media
- Linguistic Variation and Morphology
- Hearing Impairment and Communication
- Advanced Data Processing Techniques
- Domain Adaptation and Few-Shot Learning
Johns Hopkins University
2020-2023
Microsoft (United States)
2023
University of Copenhagen
2020-2023
Charles University
2023
The University of Melbourne
2020-2022
Carnegie Mellon University
2013-2022
Carleton University
2022
Georgetown University
2022
Yale University
2022
University of Florida
2022
Antonios Anastasopoulos, Loïc Barrault, Luisa Bentivogli, Marcely Zanon Boito, Ondřej Bojar, Roldano Cattoni, Anna Currey, Georgiana Dinu, Kevin Duh, Maha Elbayad, Clara Emmanuel, Yannick Estève, Marcello Federico, Christian Federmann, Souhir Gahbiche, Hongyu Gong, Roman Grundkiewicz, Barry Haddow, Benjamin Hsu, Dávid Javorský, Vĕra Kloudová, Surafel Lakew, Xutai Ma, Prashant Mathur, Paul McNamee, Kenton Murray, Maria Nǎdejde, Satoshi Nakamura, Matteo Negri, Jan Niehues, Xing Niu, John...
Milind Agarwal, Sweta Agrawal, Antonios Anastasopoulos, Luisa Bentivogli, Ondřej Bojar, Claudia Borg, Marine Carpuat, Roldano Cattoni, Mauro Cettolo, Mingda Chen, William Chen, Khalid Choukri, Alexandra Chronopoulou, Anna Currey, Thierry Declerck, Qianqian Dong, Kevin Duh, Yannick Estève, Marcello Federico, Souhir Gahbiche, Barry Haddow, Benjamin Hsu, Phu Mon Htut, Hirofumi Inaguma, Dávid Javorský, John Judge, Yasumasa Kano, Tom Ko, Rishu Kumar, Pengwei Li, Xutai Ma, Prashant Mathur, Evgeny...
Ebrahim Ansari, Amittai Axelrod, Nguyen Bach, Ondřej Bojar, Roldano Cattoni, Fahim Dalvi, Nadir Durrani, Marcello Federico, Christian Federmann, Jiatao Gu, Fei Huang, Kevin Knight, Xutai Ma, Ajay Nagesh, Matteo Negri, Jan Niehues, Juan Pino, Elizabeth Salesky, Xing Shi, Sebastian Stüker, Marco Turchi, Alexander Waibel, Changhan Wang. Proceedings of the 17th International Conference on Spoken Language Translation. 2020.
We present the Multilingual TEDx corpus, built to support speech recognition (ASR) and speech translation (ST) research across many non-English source languages. The corpus is a collection of audio recordings from TEDx talks in 8 source languages. We segment transcripts into sentences and align them with the source-language audio and target-language translations. The corpus is released along with open-sourced code enabling extension to new languages as they become available. Our corpus creation methodology can be applied to more languages than previous work, and creates...
Ekaterina Vylomova, Jennifer White, Elizabeth Salesky, Sabrina J. Mielke, Shijie Wu, Edoardo Maria Ponti, Rowan Hall Maudslay, Ran Zmigrod, Josef Valvoda, Svetlana Toldova, Francis Tyers, Elena Klyachko, Ilya Yegorov, Natalia Krizhanovsky, Paula Czarnowska, Irene Nikkarinen, Andrew Krizhanovsky, Tiago Pimentel, Lucas Torroba Hennigen, Christo Kirov, Garrett Nicolai, Adina Williams, Antonios Anastasopoulos, Hilaria Cruz, Eleanor Chodroff, Ryan Cotterell, Miikka Silfverberg, Mans Hulden. Proceedings of the...
Antonios Anastasopoulos, Ondřej Bojar, Jacob Bremerman, Roldano Cattoni, Maha Elbayad, Marcello Federico, Xutai Ma, Satoshi Nakamura, Matteo Negri, Jan Niehues, Juan Pino, Elizabeth Salesky, Sebastian Stüker, Katsuhito Sudoh, Marco Turchi, Alexander Waibel, Changhan Wang, Matthew Wiesner. Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021). 2021.
What are the units of text that we want to model? From bytes to multi-word expressions, text can be analyzed and generated at many granularities. Until recently, most natural language processing (NLP) models operated over words, treating those as discrete and atomic tokens, but starting with byte-pair encoding (BPE), subword-based approaches have become dominant in many areas, enabling small vocabularies while still allowing for fast inference. Is the end of the road character-level or byte-level processing? In...
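The BPE procedure this abstract refers to can be sketched in a few lines: repeatedly merge the most frequent adjacent symbol pair in the training corpus. A minimal, illustrative implementation (the toy word list and merge count are assumptions, not from the paper):

```python
# Minimal byte-pair encoding (BPE) merge learning on a toy corpus.
from collections import Counter

def learn_bpe(words, num_merges):
    # Represent each word as a tuple of symbols, starting from characters.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count all adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the merge everywhere it occurs.
        merged = {}
        for word, freq in vocab.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            merged[tuple(out)] = freq
        vocab = merged
    return merges

merges = learn_bpe(["low", "lower", "lowest", "low"], num_merges=2)
print(merges)  # → [('l', 'o'), ('lo', 'w')]
```

In practice the merge count (often tens of thousands) controls the vocabulary size, which is the lever behind the "small vocabularies" trade-off mentioned above.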
The Universal Morphology (UniMorph) project is a collaborative effort providing broad-coverage instantiated normalized morphological inflection tables for hundreds of diverse world languages. The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation, and a type-level resource of annotated data in diverse languages realizing that schema. This paper presents the expansions and improvements made on several fronts over the last couple of years (since McCarthy et al. (2020)). Collaborative efforts...
Previous work on end-to-end translation from speech has primarily used frame-level features as speech representations, which creates longer, sparser sequences than text. We show that a naive method to create compressed phoneme-like speech representations is far more effective and efficient for translation than traditional frame-level features. Specifically, we generate phoneme labels for the frames and average consecutive frames with the same label to create shorter, higher-level source sequences for translation. We see improvements of up to 5 BLEU in both our high and low resource...
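The frame-averaging step described above is simple to state concretely: collapse runs of consecutive frames sharing a phoneme label into one averaged vector. A minimal sketch with toy feature vectors and labels (not data from the paper):

```python
# Collapse consecutive frames with the same phoneme label into their mean.
from itertools import groupby

def average_by_phoneme(frames, labels):
    """frames: list of feature vectors; labels: one phoneme label per frame."""
    reduced, reduced_labels = [], []
    idx = 0
    for label, group in groupby(labels):
        n = len(list(group))
        segment = frames[idx:idx + n]
        # Element-wise mean over the segment's frames.
        mean = [sum(col) / n for col in zip(*segment)]
        reduced.append(mean)
        reduced_labels.append(label)
        idx += n
    return reduced, reduced_labels

frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
labels = ["AH", "AH", "T", "T"]
print(average_by_phoneme(frames, labels))
# → ([[2.0, 3.0], [6.0, 7.0]], ['AH', 'T'])
```

The output sequence is as long as the number of phoneme segments rather than the number of frames, which is the source of the shorter, higher-level representations the abstract describes.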
Prior work has explored directly regularizing the output distributions of probabilistic models to alleviate peaky (i.e. over-confident) predictions, a common sign of overfitting. This class of techniques, of which label smoothing is one, has a connection to entropy regularization. Despite consistent success across architectures and data sets in language generation tasks, two problems remain open: (1) there is little understanding of the underlying effects these regularizers have on models, and (2) the full space of entropy regularization...
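Label smoothing, the representative regularizer named above, replaces the one-hot training target with a mixture of the one-hot label and the uniform distribution, penalizing peaked outputs. A minimal sketch; the vocabulary, probabilities, and epsilon=0.1 are illustrative assumptions:

```python
# Cross-entropy against a label-smoothed target distribution
# q = (1 - eps) * one_hot(target) + eps * uniform.
import math

def label_smoothed_nll(log_probs, target, epsilon=0.1):
    V = len(log_probs)
    uniform = epsilon / V
    loss = 0.0
    for i, lp in enumerate(log_probs):
        q = (1.0 - epsilon) + uniform if i == target else uniform
        loss -= q * lp
    return loss

# A peaked model prediction over a toy 4-word vocabulary.
probs = [0.85, 0.05, 0.05, 0.05]
log_probs = [math.log(p) for p in probs]
loss = label_smoothed_nll(log_probs, target=0)
print(round(loss, 3))
```

Because some probability mass is assigned to every class, the loss stays bounded away from zero even for a confident correct prediction, which is what discourages over-confident (low-entropy) output distributions.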
This paper reports on the shared tasks organized by the 21st IWSLT Conference. The shared tasks address 7 scientific challenges in spoken language translation: simultaneous and offline translation, automatic subtitling and dubbing, speech-to-speech translation, dialect and low-resource speech translation, and Indic languages. The tasks attracted 18 teams whose submissions are documented in 26 system papers. The growing interest towards spoken language translation is also witnessed by the constantly increasing number of task organizers and contributors to the overview paper, almost evenly...
Machine translation models have discrete vocabularies and commonly use subword segmentation techniques to achieve an ‘open vocabulary.’ This approach relies on consistent and correct underlying unicode sequences, and makes models susceptible to degradation from common types of noise and variation. Motivated by the robustness of human language processing, we propose the use of visual text representations, which dispense with a finite set of text embeddings in favor of continuous vocabularies created by processing visually rendered text with sliding windows. We...
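The sliding-window idea above replaces discrete token embeddings with fixed-width slices of a rendered image of the text. A minimal sketch of the windowing step only, using a toy bitmap in place of real font rendering (the window width and stride here are illustrative assumptions):

```python
# Slice a "rendered text" image into overlapping fixed-width windows;
# each window plays the role of one continuous input token.
def sliding_windows(image, width, stride):
    """image: list of pixel rows; returns width-column slices of the image."""
    n_cols = len(image[0])
    windows = []
    for start in range(0, n_cols - width + 1, stride):
        windows.append([row[start:start + width] for row in image])
    return windows

# Toy 2x12 checkerboard standing in for rasterized text.
image = [[(r + c) % 2 for c in range(12)] for r in range(2)]
wins = sliding_windows(image, width=4, stride=2)
print(len(wins), len(wins[0][0]))  # → 5 4
```

Since every unicode string can be rendered, there is no out-of-vocabulary symbol: noisy or unseen characters still produce pixels, which is the robustness argument the abstract makes.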
As large language models (LLMs) become more and more capable in languages other than English, it is important to collect benchmark datasets in order to evaluate their multilingual performance, including on tasks like machine translation (MT). In this work, we extend the WMT24 dataset to cover 55 languages by collecting new human-written references and post-edits for 46 new languages and dialects, in addition to 8 out of the 9 languages in the original dataset. The dataset covers four domains: literary, news, social, and speech. We benchmark a variety of MT providers and LLMs on the collected dataset using...
End-to-end models for speech translation (ST) more tightly couple speech recognition (ASR) and machine translation (MT) than a traditional cascade of separate ASR and MT models, with simpler model architectures and the potential for reduced error propagation. Their performance is often assumed to be superior, though in many conditions this is not yet the case. We compare cascaded and end-to-end models across high, medium, and low-resource conditions, and show that cascades remain stronger baselines. Further, we introduce two methods to incorporate...
Transformer models are powerful sequence-to-sequence architectures that are capable of directly mapping speech inputs to transcriptions or translations. However, the mechanism for modeling positions in this model was tailored for text modeling, and thus is less ideal for acoustic inputs. In this work, we adapt the relative position encoding scheme to the Speech Transformer, where the key addition is relative distance between input states in the self-attention network. As a result, the network can better adapt to the variable distributions present in speech data. Our...
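The core of relative position encoding is that each attention score between positions i and j looks up an embedding indexed by the clipped offset j - i, rather than by absolute position. A minimal sketch of that index computation (sequence length and clipping distance are illustrative):

```python
# Build the matrix of clipped, shifted relative offsets used to index a
# relative-position embedding table of size 2 * max_distance + 1.
def relative_position_index(seq_len, max_distance):
    matrix = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            # Clip the offset so distant positions share an embedding.
            offset = max(-max_distance, min(max_distance, j - i))
            # Shift to be non-negative for table indexing.
            row.append(offset + max_distance)
        matrix.append(row)
    return matrix

for row in relative_position_index(seq_len=4, max_distance=2):
    print(row)
```

Because the lookup depends only on the distance j - i, the same pattern applies at any absolute position, which is what lets the model generalize across the widely varying sequence lengths of acoustic input.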
Language models are defined over a finite set of inputs, which creates a vocabulary bottleneck when we attempt to scale the number of supported languages. Tackling this bottleneck results in a trade-off between what can be represented in the embedding matrix and computational issues in the output layer. This paper introduces PIXEL, the Pixel-based Encoder of Language, which suffers from neither of these issues. PIXEL is a pretrained language model that renders text as images, making it possible to transfer representations across languages...
Elizabeth Salesky, Matthias Sperber, Alexander Waibel. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019.
When translating from speech, special consideration for conversational speech phenomena such as disfluencies is necessary. Most machine translation training data consists of well-formed written texts, causing issues when translating spontaneous speech. Previous work has introduced an intermediate step between speech recognition (ASR) and machine translation (MT) to remove disfluencies, making the data better-matched to typical written text and significantly improving performance. However, with the rise of end-to-end systems, this step must be incorporated into...
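To make the intermediate disfluency-removal step concrete, here is a toy rule-based filter over an ASR transcript. The filler list and the repeated-word rule are illustrative assumptions; the work described above addresses this with learned models, not hard-coded rules:

```python
# Toy disfluency filter: drop filled pauses and immediate word repetitions.
FILLERS = {"uh", "um", "uhm", "er"}

def remove_disfluencies(tokens):
    cleaned = []
    for tok in tokens:
        if tok.lower() in FILLERS:
            continue  # drop filled pauses ("uh", "um", ...)
        if cleaned and tok.lower() == cleaned[-1].lower():
            continue  # collapse immediate repetitions ("I I want")
        cleaned.append(tok)
    return cleaned

print(remove_disfluencies("I I uh want um to to go".split()))
# → ['I', 'want', 'to', 'go']
```

Even this crude filter shows why the step matters: the cleaned output looks much more like the well-formed written text MT systems are trained on than the raw transcript does.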
While there exist scores of natural languages, each with its unique features and idiosyncrasies, they all share a unifying theme: enabling human communication. We may thus reasonably predict that human cognition shapes how these languages evolve and are used. Assuming that the capacity to process information is roughly constant across human populations, we expect a surprisal–duration trade-off to arise both across and within languages. We analyse this trade-off using a corpus of 600 languages and, after controlling for several potential confounds, find...
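Surprisal, one side of the trade-off above, is the information content -log2 p(unit | context). A minimal sketch using toy unigram estimates (the token list is illustrative, not data from the 600-language corpus):

```python
# Estimate per-type surprisal (in bits) from unigram relative frequencies.
import math
from collections import Counter

def surprisals(tokens):
    counts = Counter(tokens)
    total = len(tokens)
    return {t: -math.log2(c / total) for t, c in counts.items()}

tokens = ["a", "a", "a", "a", "b", "b", "c", "d"]
print(surprisals(tokens))
# → {'a': 1.0, 'b': 2.0, 'c': 3.0, 'd': 3.0}
```

Under the trade-off hypothesis, higher-surprisal units should tend to be produced with longer durations, so that the information transmitted per unit time stays roughly constant.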
Johannes Bjerva, Elizabeth Salesky, Sabrina J. Mielke, Aditi Chaudhary, Giuseppe G. A. Celano, Edoardo Maria Ponti, Ekaterina Vylomova, Ryan Cotterell, Isabelle Augenstein. Proceedings of the Second Workshop on Computational Research in Linguistic Typology. 2020.
Elizabeth Salesky, Eleanor Chodroff, Tiago Pimentel, Matthew Wiesner, Ryan Cotterell, Alan W Black, Jason Eisner. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020.