NFDI4DS | UHH-SEMS - Publication Details

Proyag Pal

ORCID: 0000-0003-2003-3689

About

Contact & Profiles

OpenAlex: A5053147986

Research Areas

Natural Language Processing Techniques
Topic Modeling
Machine Learning in Bioinformatics
Text and Document Classification Technologies
Speech and dialogue systems
Multimodal Machine Learning Applications

University of Edinburgh
2022-2023

Findings of the IWSLT 2023 Evaluation Campaign

OPENALEX - Publications

Milind Agarwal, Sweta Agrawal, Antonios Anastasopoulos, Luisa Bentivogli, Ondřej Bojar, Claudia Borg, Marine Carpuat, Roldano Cattoni, Mauro Cettolo, Mingda Chen, William Khalid Choukri, Alexandra Chronopoulou, Anna Currey, Thierry Declerck, Qianqian Dong, Kevin Duh, Yannick Estève, Marcello Federico, Souhir Gahbiche, Barry Haddow, Benjamin Hsu, Phu Mon Htut, Hirofumi Inaguma, Dávid Javorský, John Judge, Yasumasa Kano, Tom Ko, Rishu Kumar, Pengwei Li, Xutai Ma, Prashant Mathur, Evgeny...

DOI: 10.18653/v1/2023.iwslt-1.1 | article | EN | license: cc-by | 2023-01-01

Document-Level Machine Translation with Large-Scale Public Parallel Corpora

Proyag Pal Alexandra Birch Kenneth Heafield

DOI: 10.18653/v1/2024.acl-long.712 | article | EN | 2024-01-01

Improving Isochronous Machine Translation with Target Factors and Auxiliary Counters

Proyag Pal Brian J. Thompson Yogesh Virkar Prashant Mathur Alexandra Chronopoulou and 1 more

DOI: 10.21437/interspeech.2023-1063 | article | EN | Interspeech 2023 | 2023-08-14

Cheat Codes to Quantify Missing Source Information in Neural Machine Translation

Proyag Pal Kenneth Heafield

This paper describes a method to quantify the amount of information H(t|s) added by the target sentence t that is not present in the source s in a neural machine translation system. We do this by providing the model the target sentence in a highly compressed form (a “cheat code”), and exploring the effect of the size of the cheat code. We find that the model is able to capture extra information from just a single float representation of the target and nearly reproduces the target with two 32-bit floats per target token.

DOI: 10.18653/v1/2022.naacl-main.177 | article | EN | license: cc-by | Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies | 2022-01-01
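
As a rough illustration of the cheat-code sizes mentioned in this abstract, the sketch below computes the most information a cheat code could carry from its storage size alone. This is only a storage upper bound (a value held in b bits has at most 2^b distinct states), not the paper's method for estimating H(t|s); the function names and the 20-token example target are hypothetical.

```python
def cheat_code_capacity_bits(num_floats: int, bits_per_float: int = 32) -> int:
    """Upper bound, in bits, on what a cheat code of `num_floats` floats can carry.

    A value stored in b bits can take at most 2**b distinct states, so it can
    convey at most b bits of information about the target.
    """
    if num_floats < 0 or bits_per_float <= 0:
        raise ValueError("need num_floats >= 0 and bits_per_float > 0")
    return num_floats * bits_per_float


def per_token_code_capacity_bits(target_len: int, floats_per_token: int) -> int:
    """Upper bound for a per-token cheat code over a whole target sentence."""
    if target_len < 0:
        raise ValueError("target_len must be >= 0")
    return target_len * cheat_code_capacity_bits(floats_per_token)


if __name__ == "__main__":
    # One float for the whole sentence: at most 32 bits of extra information.
    print(cheat_code_capacity_bits(1))            # 32
    # Two 32-bit floats per token for a hypothetical 20-token target.
    print(per_token_code_capacity_bits(20, 2))    # 1280
```

Whatever part of H(t|s) a model recovers from such a code has to fit inside this bound; how much of the budget a trained model actually exploits is the empirical question the abstract describes.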

Improving Isochronous Machine Translation with Target Factors and Auxiliary Counters

Proyag Pal Brian J. Thompson Yogesh Virkar Prashant Mathur Alexandra Chronopoulou and 1 more

To translate speech for automatic dubbing, machine translation needs to be isochronous, i.e. translated speech needs to be aligned with the source in terms of speech durations. We introduce target factors in a transformer model to predict durations jointly with target language phoneme sequences. We also introduce auxiliary counters to help the decoder to keep track of the timing information while generating target phonemes. We show that our model improves translation quality and isochrony compared to previous work where the translation model is instead trained to predict interleaved sequences of phonemes and durations.

DOI: 10.48550/arxiv.2305.13204 | preprint | EN | license: other-oa | arXiv (Cornell University) | 2023-01-01
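
To make the target-factor setup in this abstract more concrete, here is a minimal sketch of what a decoder might see at each step: a phoneme paired with its duration as a second factor, plus running counters. The abstract does not define the counters, so the two used here (duration elapsed so far, and duration remaining against a budget equal to the source speech length) are assumptions, as are the frame units, the phoneme symbols and all names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DecoderStep:
    phoneme: str    # first target factor: the phoneme symbol
    duration: int   # second target factor: its duration (frames, assumed unit)
    elapsed: int    # hypothetical counter: duration generated before this step
    remaining: int  # hypothetical counter: budget left before this step


def annotate_with_counters(
    phonemes: List[str], durations: List[int], budget: int
) -> List[DecoderStep]:
    """Pair each phoneme with its duration factor and two running counters.

    `budget` stands in for the source speech duration the output should match.
    `remaining` goes negative once the generated speech overruns the budget.
    """
    if len(phonemes) != len(durations):
        raise ValueError("phonemes and durations must have the same length")
    steps: List[DecoderStep] = []
    elapsed = 0
    for phoneme, duration in zip(phonemes, durations):
        steps.append(DecoderStep(phoneme, duration, elapsed, budget - elapsed))
        elapsed += duration
    return steps


if __name__ == "__main__":
    # Toy phonemes and durations; not data from the paper.
    for step in annotate_with_counters(["HH", "AH", "L", "OW"], [3, 5, 4, 6], budget=20):
        print(step)
```

In a real model such values might be embedded and fed to the transformer decoder alongside the previous tokens; here they are only computed, to show the bookkeeping a timing counter provides.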

Cheating to Identify Hard Problems for Neural Machine Translation

Proyag Pal Kenneth Heafield

We identify hard problems for neural machine translation models by analyzing progressively higher-scoring translations generated by letting models cheat to various degrees. If a system cheats and still gets something wrong, that suggests it is a hard problem. We experiment with two forms of cheating: providing the model a compressed representation of the target as an additional input, and fine-tuning on the test set. Contrary to popular belief, we find that the most frequent tokens are not necessarily the most accurately translated due to these often...

DOI: 10.18653/v1/2023.findings-eacl.120 | article | EN | license: cc-by | 2023-01-01
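
The core idea of this abstract, that errors which survive cheating point to hard problems, can be sketched as a simple aggregation. Assume each reference token has already been judged correct or incorrect in the output of the most strongly cheating system (how that judgement is made, for example via word alignment, is left open here). The sketch ranks token types by their accuracy under that setting; the function names and toy data are hypothetical and do not reproduce the paper's analysis.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def per_type_accuracy(judged: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Fraction of occurrences of each reference token type judged correct."""
    total: Counter = Counter()
    correct: Counter = Counter()
    for token, ok in judged:
        total[token] += 1
        if ok:
            correct[token] += 1
    return {token: correct[token] / total[token] for token in total}


def hardest_types(judged: Iterable[Tuple[str, bool]], k: int) -> List[str]:
    """Token types with the lowest accuracy under the strongest cheating setting.

    Ties are broken alphabetically so the result is deterministic.
    """
    accuracy = per_type_accuracy(judged)
    return sorted(accuracy, key=lambda token: (accuracy[token], token))[:k]


if __name__ == "__main__":
    # Hypothetical (reference token, translated correctly?) judgements collected
    # from a system that was allowed to cheat; toy values, not results.
    judged = [("the", True), ("the", False), ("cat", True),
              (",", False), (",", False), ("sat", True)]
    print(per_type_accuracy(judged))
    print(hardest_types(judged, 2))  # [',', 'the']
```

In this toy data the most frequent types, "the" and the comma, are also the least accurate, mirroring in miniature the abstract's point that frequency does not guarantee accurate translation; with real data the ranking would be computed over a full test set.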
