- Natural Language Processing Techniques
- Topic Modeling
- Machine Learning in Bioinformatics
- Text and Document Classification Technologies
- Speech and Dialogue Systems
- Multimodal Machine Learning Applications
University of Edinburgh
2022-2023
Milind Agarwal, Sweta Agrawal, Antonios Anastasopoulos, Luisa Bentivogli, Ondřej Bojar, Claudia Borg, Marine Carpuat, Roldano Cattoni, Mauro Cettolo, Mingda Chen, William Khalid Choukri, Alexandra Chronopoulou, Anna Currey, Thierry Declerck, Qianqian Dong, Kevin Duh, Yannick Estève, Marcello Federico, Souhir Gahbiche, Barry Haddow, Benjamin Hsu, Phu Mon Htut, Hirofumi Inaguma, Dávid Javorský, John Judge, Yasumasa Kano, Tom Ko, Rishu Kumar, Pengwei Li, Xutai Ma, Prashant Mathur, Evgeny...
This paper describes a method to quantify the amount of information H(t|s) added by the target sentence t that is not present in the source s in a neural machine translation system. We do this by providing the model with the target sentence in a highly compressed form (a “cheat code”), and exploring the effect of the size of the cheat code. We find that the model is able to capture extra information from just a single float representation of the target and nearly reproduces the target with two 32-bit floats per target token.
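As a rough illustration of the scale involved, the raw storage width of a cheat code can be compared with the number of bits needed to index one token from a vocabulary. The sketch below is not the paper's method: the function names, the 20-token sentence, and the 32,000-entry vocabulary are assumptions chosen for illustration, and raw storage width is only an upper bound on what a model could actually exploit.

```python
import math

BITS_PER_FLOAT32 = 32  # width of one single-precision float


def cheat_code_bits(floats_per_token: int, num_tokens: int) -> int:
    """Raw storage width, in bits, of a cheat code with a fixed number of floats per token."""
    return floats_per_token * BITS_PER_FLOAT32 * num_tokens


def token_index_bits(vocab_size: int, num_tokens: int) -> float:
    """Bits needed to pick each token uniformly from a vocabulary (no language model prior)."""
    return math.log2(vocab_size) * num_tokens


if __name__ == "__main__":
    num_tokens = 20      # hypothetical target sentence length
    vocab_size = 32_000  # hypothetical subword vocabulary size
    print("cheat code, 2 floats/token:", cheat_code_bits(2, num_tokens), "bits")
    print("uniform token indices:", round(token_index_bits(vocab_size, num_tokens), 1), "bits")
```

Under these assumptions, two floats per token offer more raw capacity than a uniform index over the vocabulary would need, which is consistent with near-reproduction of the target being possible in principle.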
To translate speech for automatic dubbing, machine translation needs to be isochronous, i.e. the translated speech needs to be aligned with the source in terms of speech durations. We introduce target factors in a transformer model to predict durations jointly with target language phoneme sequences. We also introduce auxiliary counters to help the decoder keep track of the timing information while generating target phonemes. We show that our model improves translation quality and isochrony compared to previous work where the translation model is instead trained to predict interleaved sequences of phonemes and durations.
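The difference between the two target representations mentioned in this abstract can be sketched on toy data. This is a minimal illustration and not the authors' implementation: the phoneme symbols, the `<n>` duration token format, the frame budget, and the definition of the counter (frames remaining before each step) are all assumptions.

```python
from typing import List, Sequence, Tuple


def interleave(phonemes: Sequence[str], durations: Sequence[int]) -> List[str]:
    """Single output stream: each phoneme is followed by a duration token."""
    if len(phonemes) != len(durations):
        raise ValueError("phonemes and durations must have the same length")
    stream: List[str] = []
    for phoneme, duration in zip(phonemes, durations):
        stream.append(phoneme)
        stream.append(f"<{duration}>")  # hypothetical duration token format
    return stream


def factored(phonemes: Sequence[str], durations: Sequence[int]) -> Tuple[List[str], List[int]]:
    """Two aligned streams: one phoneme and one duration factor per decoder step."""
    if len(phonemes) != len(durations):
        raise ValueError("phonemes and durations must have the same length")
    return list(phonemes), list(durations)


def remaining_frames(durations: Sequence[int], total_frames: int) -> List[int]:
    """Hypothetical auxiliary counter: frames left in the budget before each step."""
    counters: List[int] = []
    left = total_frames
    for duration in durations:
        counters.append(left)
        left -= duration
    return counters


if __name__ == "__main__":
    phonemes = ["HH", "AH", "L", "OW"]  # toy phoneme sequence
    durations = [3, 5, 2, 6]            # toy durations in frames
    print(interleave(phonemes, durations))
    print(factored(phonemes, durations))
    print(remaining_frames(durations, total_frames=16))
```

In the interleaved form the output sequence is twice as long as the phoneme sequence, whereas the factored form keeps one decoder step per phoneme and carries the duration alongside it.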
We identify hard problems for neural machine translation models by analyzing progressively higher-scoring translations generated by letting models cheat to various degrees. If a system cheats and still gets something wrong, that suggests it is a hard problem. We experiment with two forms of cheating: providing the model a compressed representation of the target as an additional input, and fine-tuning on the test set. Contrary to popular belief, we find that the most frequent tokens are not necessarily the most accurately translated due to these often...