Ivana Kvapilíková

ORCID: 0000-0003-1479-3294
Research Areas
  • Topic Modeling
  • Natural Language Processing Techniques
  • Text Readability and Simplification
  • Multimodal Machine Learning Applications
  • Banking stability, regulation, efficiency
  • Speech Recognition and Synthesis
  • Market Dynamics and Volatility
  • Financial Risk and Volatility Modeling
  • Translation Studies and Practices

Charles University
2019-2020

Prague University of Economics and Business
2017

10.1016/j.najef.2017.08.007 article EN The North American Journal of Economics and Finance 2017-08-24

Existing models of multilingual sentence embeddings require large parallel data resources which are not available for low-resource languages. We propose a novel unsupervised method to derive multilingual sentence embeddings relying only on monolingual data. We first produce a synthetic parallel corpus using unsupervised machine translation, and use it to fine-tune a pretrained cross-lingual masked language model (XLM) to derive the multilingual sentence representations. The quality of the representations is evaluated on two parallel corpus mining tasks with improvements of up to 22 F1 points over vanilla XLM. In...

10.18653/v1/2020.acl-srw.34 preprint EN cc-by 2020-01-01
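The parallel corpus mining step evaluated above scores candidate sentence pairs by embedding similarity. A minimal sketch of margin-based scoring over toy embedding vectors (the function names, the ratio-margin formula with k nearest neighbours, and the threshold are illustrative assumptions, not the paper's exact setup):

```python
import math

def cosine(u, v):
    # cosine similarity between two dense vectors
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def margin_score(x, y, xs, ys, k=2):
    # ratio margin: cos(x, y) normalized by the average similarity of
    # each sentence to its k nearest neighbours on the other side
    nn_x = sorted((cosine(x, y2) for y2 in ys), reverse=True)[:k]
    nn_y = sorted((cosine(y, x2) for x2 in xs), reverse=True)[:k]
    denom = sum(nn_x) / (2 * k) + sum(nn_y) / (2 * k)
    return cosine(x, y) / denom

def mine_pairs(src_emb, tgt_emb, threshold=1.0):
    # keep candidate pairs whose margin score clears the threshold
    pairs = []
    for i, x in enumerate(src_emb):
        for j, y in enumerate(tgt_emb):
            if margin_score(x, y, src_emb, tgt_emb) > threshold:
                pairs.append((i, j))
    return pairs
```

The margin normalization penalizes "hub" sentences that are similar to everything, which plain cosine thresholding cannot do.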

In this paper we describe the CUNI translation system used for the unsupervised news shared task of the ACL 2019 Fourth Conference on Machine Translation (WMT19). We follow the strategy of Artetxe et al. (2018b), creating a seed phrase-based system where the phrase table is initialized from cross-lingual embedding mappings trained on monolingual data, followed by a neural machine translation system trained on synthetic parallel data. The synthetic corpus was produced by the tuned PBMT model refined through iterative back-translation. We further focus on the handling of named...

10.18653/v1/w19-5323 article EN cc-by 2019-01-01
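The iterative back-translation refinement described above alternates between translating monolingual data with the current model and retraining on the resulting synthetic pairs. A toy sketch of that loop, with word-level dictionary "models" standing in for the real PBMT/NMT systems (`train` and `translate` here are hypothetical stubs, not the systems used in the paper):

```python
def translate(model, sentences):
    # word-by-word translation using the current model's lexicon;
    # unknown words are copied through unchanged
    return [" ".join(model.get(w, w) for w in s.split()) for s in sentences]

def train(parallel_pairs):
    # "learn" a word-level lexicon from position-aligned sentence pairs
    lexicon = {}
    for src, tgt in parallel_pairs:
        for sw, tw in zip(src.split(), tgt.split()):
            lexicon[sw] = tw
    return lexicon

def iterative_back_translation(seed_model, mono_src, mono_tgt, rounds=3):
    fwd, bwd = seed_model, {v: k for k, v in seed_model.items()}
    for _ in range(rounds):
        # back-translate target monolingual data into synthetic source,
        # then retrain the forward model on (synthetic source, real target)
        synthetic_src = translate(bwd, mono_tgt)
        fwd = train(list(zip(synthetic_src, mono_tgt)))
        # and symmetrically for the backward direction
        synthetic_tgt = translate(fwd, mono_src)
        bwd = train(list(zip(synthetic_tgt, mono_src)))
    return fwd, bwd
```

The key property the toy loop preserves is that the target side of every training pair is genuine monolingual text, so only the source side is synthetic.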

We present our submission to the WMT23 shared task in translation between English and Assamese, Khasi, Mizo and Manipuri. All our systems were pretrained on the tasks of multilingual masked language modelling and denoising auto-encoding. Our primary systems for translation into English were further pretrained for multilingual MT in all four directions and fine-tuned on the limited parallel data available for each language pair separately. We used online back-translation for data augmentation. The same systems were submitted as contrastive systems for translation out of English, as the multilingual MT pretraining step seemed to harm performance; our primary systems for that direction were trained without this step. Other additional...

10.18653/v1/2023.wmt-1.90 article EN cc-by 2023-01-01
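Denoising auto-encoding, one of the pretraining tasks mentioned above, trains a model to reconstruct a clean sentence from a corrupted input. A minimal sketch of the noising step (the drop probability and local-shuffle window are illustrative choices, not the submission's actual hyperparameters):

```python
import random

def add_noise(tokens, drop_prob=0.1, shuffle_window=3, rng=None):
    # corrupt the input by dropping some tokens and locally shuffling
    # the rest; the training target remains the original sentence
    rng = rng or random.Random(0)
    kept = [t for t in tokens if rng.random() > drop_prob]
    noisy = []
    i = 0
    while i < len(kept):
        chunk = kept[i:i + shuffle_window]
        rng.shuffle(chunk)
        noisy.extend(chunk)
        i += shuffle_window
    return noisy

def denoising_examples(sentences, rng=None):
    # (noisy input, clean target) pairs for auto-encoder pretraining
    return [(add_noise(s.split(), rng=rng), s.split()) for s in sentences]
```

Because the corruption is only local, the model can learn word order and fluency from monolingual text alone, which matters when parallel data is as scarce as for these language pairs.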

This paper presents a description of CUNI systems submitted to the WMT20 task on unsupervised and very low-resource supervised machine translation between German and Upper Sorbian. We experimented with training on synthetic data and pre-training on a related language pair. In the fully unsupervised scenario, we achieved 25.5 and 23.7 BLEU translating from and into Upper Sorbian, respectively. Our low-resource systems relied on transfer learning from German-Czech parallel data and achieved 57.4 and 56.1 BLEU, which is an improvement of 10 BLEU points over the baseline trained only on the available small...

10.48550/arxiv.2010.11747 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Even with the latest developments in deep learning and large-scale language modeling, the task of machine translation (MT) for low-resource languages remains a challenge. Neural MT systems can be trained in an unsupervised way without any translation resources, but their quality lags behind, especially in truly low-resource conditions. We propose a training strategy that relies on pseudo-parallel sentence pairs mined from monolingual corpora in addition to synthetic back-translated corpora. We experiment with different training schedules and reach an improvement of up...

10.48550/arxiv.2310.14262 preprint EN other-oa arXiv (Cornell University) 2023-01-01
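The training-schedule idea above mixes mined pseudo-parallel pairs with back-translated synthetic pairs in varying proportions over training. A toy sketch of one possible schedule (the warm-up length and mixing ratios are invented for illustration, not taken from the paper):

```python
def mixing_schedule(epoch, warmup=2):
    # fraction of each batch drawn from mined pseudo-parallel pairs;
    # start synthetic-only, then phase the mined data in, capped at half
    if epoch < warmup:
        return 0.0
    return min(0.5, 0.1 * (epoch - warmup + 1))

def build_batch(mined, synthetic, epoch, batch_size=8):
    # fill the mined share first, then top up with synthetic pairs
    k = int(mixing_schedule(epoch) * batch_size)
    return mined[:k] + synthetic[:batch_size - k]
```

Gating the noisier mined pairs behind a warm-up lets the model stabilize on back-translated data before the pseudo-parallel signal is introduced.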

In this paper we describe the CUNI translation system used for the unsupervised news shared task of the ACL 2019 Fourth Conference on Machine Translation (WMT19). We follow the strategy of Artetxe et al. (2018b), creating a seed phrase-based system where the phrase table is initialized from cross-lingual embedding mappings trained on monolingual data, followed by a neural machine translation system trained on synthetic parallel data. The synthetic corpus was produced by the tuned PBMT model refined through iterative back-translation. We further focus on the handling of named...

10.48550/arxiv.1907.12664 preprint EN other-oa arXiv (Cornell University) 2019-01-01