Proyag Pal

ORCID: 0000-0003-2003-3689
Research Areas
  • Natural Language Processing Techniques
  • Topic Modeling
  • Machine Learning in Bioinformatics
  • Text and Document Classification Technologies
  • Speech and Dialogue Systems
  • Multimodal Machine Learning Applications

University of Edinburgh
2022-2023

Milind Agarwal, Sweta Agrawal, Antonios Anastasopoulos, Luisa Bentivogli, Ondřej Bojar, Claudia Borg, Marine Carpuat, Roldano Cattoni, Mauro Cettolo, Mingda Chen, William Chen, Khalid Choukri, Alexandra Chronopoulou, Anna Currey, Thierry Declerck, Qianqian Dong, Kevin Duh, Yannick Estève, Marcello Federico, Souhir Gahbiche, Barry Haddow, Benjamin Hsu, Phu Mon Htut, Hirofumi Inaguma, Dávid Javorský, John Judge, Yasumasa Kano, Tom Ko, Rishu Kumar, Pengwei Li, Xutai Ma, Prashant Mathur, Evgeny...

10.18653/v1/2023.iwslt-1.1 article EN cc-by 2023-01-01

This paper describes a method to quantify the amount of information H(t|s) added by the target sentence t that is not present in the source s in a neural machine translation system. We do this by providing the model the target sentence in a highly compressed form (a "cheat code"), and exploring the effect of the size of the cheat code. We find that the model is able to capture extra information from just a single float representation of the target and nearly reproduces the target with two 32-bit floats per token.

10.18653/v1/2022.naacl-main.177 article EN cc-by Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2022-01-01
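
A minimal sketch of the "cheat code" idea described above, assuming a standard encoder-decoder setup in PyTorch: the target sentence is pooled and squeezed through a tiny float bottleneck, then re-expanded and appended to the source encoder states as one extra position the decoder can attend to. The module, dimensions, and the way the code is attached are illustrative guesses, not the paper's actual implementation.

```python
# Hypothetical sketch of a "cheat code" bottleneck; names and wiring are
# illustrative assumptions, not the paper's released code.
import torch
import torch.nn as nn

class CheatCode(nn.Module):
    def __init__(self, d_model: int = 512, code_size: int = 1):
        super().__init__()
        # Compress pooled target states down to `code_size` floats
        # (one float for the whole sentence here; the paper also explores
        # richer codes, e.g. two floats per target token).
        self.compress = nn.Linear(d_model, code_size)
        # Expand the code back to model width so the decoder can attend to it.
        self.expand = nn.Linear(code_size, d_model)

    def forward(self, target_states: torch.Tensor) -> torch.Tensor:
        # target_states: (batch, tgt_len, d_model) from a target-side encoder.
        pooled = target_states.mean(dim=1)       # (batch, d_model)
        code = self.compress(pooled)             # (batch, code_size)
        return self.expand(code).unsqueeze(1)    # (batch, 1, d_model)

# Usage: append the expanded code to the source encoder output, giving the
# decoder a compressed "peek" at the reference alongside the source states.
enc_out = torch.randn(8, 20, 512)    # fake source encodings
tgt_states = torch.randn(8, 25, 512) # fake target encodings
cheat = CheatCode()
augmented = torch.cat([enc_out, cheat(tgt_states)], dim=1)  # (8, 21, 512)
print(augmented.shape)
```

Varying `code_size` corresponds to controlling how many floats of target-side information the model is allowed to see, which is what lets the amount of captured information be measured.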

To translate speech for automatic dubbing, machine translation needs to be isochronous, i.e. the translated speech needs to be aligned with the source in terms of durations. We introduce target factors in a transformer model to predict durations jointly with target language phoneme sequences. We also use auxiliary counters to help the decoder keep track of timing information while generating phonemes. We show that our model improves translation quality and isochrony compared to previous work, where the model is instead trained on interleaved sequences of durations and phonemes.

10.48550/arxiv.2305.13204 preprint EN other-oa arXiv (Cornell University) 2023-01-01
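
A rough sketch of how target factors and auxiliary counters could look, under the assumption of a transformer decoder in PyTorch with durations binned into discrete classes; the head names, sizes, and the particular counter feature are hypothetical stand-ins for the paper's design.

```python
# Hypothetical sketch: factored output heads plus a timing counter feature.
# All names, sizes, and the binned-duration choice are illustrative.
import torch
import torch.nn as nn

class FactoredOutput(nn.Module):
    def __init__(self, d_model=512, n_phonemes=100, n_duration_bins=32):
        super().__init__()
        self.phoneme_head = nn.Linear(d_model, n_phonemes)        # factor 1
        self.duration_head = nn.Linear(d_model, n_duration_bins)  # factor 2

    def forward(self, dec_state):
        # dec_state: (batch, d_model) hidden state for one decoder step;
        # phoneme and duration are predicted jointly from the same state.
        return self.phoneme_head(dec_state), self.duration_head(dec_state)

# Auxiliary counter: a scalar timing feature (e.g. source duration still
# unspent) projected and added to the next decoder input embedding, so the
# model can track how much of its time budget it has already used.
counter_proj = nn.Linear(1, 512)
dec_state = torch.randn(4, 512)
time_left = torch.tensor([[3.2], [1.0], [0.4], [2.7]])  # seconds remaining
phoneme_logits, duration_logits = FactoredOutput()(dec_state)
next_input = torch.randn(4, 512) + counter_proj(time_left)
print(phoneme_logits.shape, duration_logits.shape, next_input.shape)
```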

We identify hard problems for neural machine translation models by analyzing progressively higher-scoring translations generated by letting models cheat to various degrees. If a system cheats and still gets something wrong, that suggests it is a hard problem. We experiment with two forms of cheating: providing the model a compressed representation of the target as an additional input, and fine-tuning on the test set. Contrary to popular belief, we find that the most frequent tokens are not necessarily accurately translated due to these often...

10.18653/v1/2023.findings-eacl.120 article EN cc-by 2023-01-01
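
A toy sketch of the analysis loop described above, assuming a hypothetical `translate(src, cheat=level)` interface where higher levels mean more target-side information leaked to the model; the exact-match check is a crude stand-in for the paper's translation-quality scoring.

```python
# Hypothetical sketch: flag examples that stay wrong even when the model
# cheats maximally. `translate` and the cheat levels are stand-ins.
def find_hard_problems(test_set, translate, cheat_levels=(0, 1, 2, 64)):
    hard = []
    for src, ref in test_set:
        # An example counts as "hard" if every degree of cheating still
        # fails to produce the reference (exact match used only as a toy
        # proxy for a real quality metric).
        wrong_at_all_levels = all(
            translate(src, cheat=level) != ref for level in cheat_levels
        )
        if wrong_at_all_levels:
            hard.append((src, ref))
    return hard

# Toy usage with a fake "model" that just uppercases the source.
fake_model = lambda src, cheat: src.upper()
data = [("ab", "AB"), ("xyz", "unreachable reference")]
print(find_hard_problems(data, fake_model))  # -> [('xyz', 'unreachable reference')]
```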