- Natural Language Processing Techniques
- Topic Modeling
- Text Readability and Simplification
- Semantic Web and Ontologies
- Software Testing and Debugging Techniques
- Speech and dialogue systems
- Mathematics, Computing, and Information Processing
- Biomedical Text Mining and Ontologies
- Software Engineering Research
- Artificial Intelligence in Healthcare and Education
- Speech Recognition and Synthesis
- Handwritten Text Recognition Techniques
- Explainable Artificial Intelligence (XAI)
- Multimodal Machine Learning Applications
- Translation Studies and Practices
German Research Centre for Artificial Intelligence
2016-2023
Uppsala University
2017
Daniel Zeman, Martin Popel, Milan Straka, Jan Hajič, Joakim Nivre, Filip Ginter, Juhani Luotolahti, Sampo Pyysalo, Slav Petrov, Martin Potthast, Francis Tyers, Elena Badmaeva, Memduh Gokirmak, Anna Nedoluzhko, Silvie Cinková, Hajič jr., Jaroslava Hlaváčová, Václava Kettnerová, Zdeňka Urešová, Jenna Kanerva, Stina Ojala, Missilä, Christopher D. Manning, Sebastian Schuster, Siva Reddy, Dima Taji, Nizar Habash, Herman Leung, Marie-Catherine de Marneffe, Manuela Sanguinetti, Maria Simi, Hiroshi...
Abstract In this paper, we report an analysis of the strengths and weaknesses of several Machine Translation (MT) engines implementing the three most widely used paradigms. The analysis is based on a manually built test suite that comprises a large range of linguistic phenomena. Two main observations are, on the one hand, the striking improvement of a commercial online system when turning from a phrase-based to a neural engine, and, on the other hand, that successful translations of MT systems sometimes bear resemblance with those of a rule-based system.
Abstract In this article we present a novel linguistically driven evaluation method and apply it to the main approaches of Machine Translation (Rule-based, Phrase-based, Neural) to gain insights into their strengths and weaknesses in much more detail than is provided by current evaluation schemes. Translating between two languages requires substantial modelling of knowledge about the two languages, about translation, and about the world. Using English-German IT-domain translation as a case study, we also enhance a Phrase-based system by exploiting...
This paper offers a fine-grained analysis of the machine translation outputs in the context of the Shared Task at the 8th Conference on Machine Translation (WMT23). Building on the foundation of previous test suite efforts, our analysis includes Large Language Models and an updated test set featuring new linguistic phenomena. To our knowledge, this is the first such analysis for GPT-4 outputs. Our evaluation spans the German-English, English-German, and English-Russian language directions. Some of the phenomena with the lowest accuracies for German-English are idioms...
We present an analysis of 16 state-of-the-art MT systems on German-English based on a linguistically motivated test suite. The test suite has been devised manually by a team of language professionals in order to cover a broad variety of linguistic phenomena that MT often fails to translate properly. It contains 5,000 sentences covering 106 phenomena in 14 categories, with an increased focus on verb tenses, aspects and moods. The outputs are evaluated in a semi-automatic way through regular expressions that match only the part of the sentence that is relevant to each...
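The semi-automatic evaluation described above can be illustrated with a minimal sketch: each test-suite item pairs a source sentence with a regular expression that matches only the part of the MT output relevant to the phenomenon under test. All sentences, patterns, and outputs below are invented for illustration and are not taken from the actual test suite.

```python
import re

# Hypothetical test-suite items (illustrative only): each pairs a source
# sentence with a regex that targets just the phenomenon-relevant span.
suite = [
    {
        "phenomenon": "verb tense (present perfect)",
        "source": "Er hat das Buch gelesen.",
        "pass_pattern": r"\bhas read\b",      # accepted English rendering
    },
    {
        "phenomenon": "modal verb",
        "source": "Sie muss gehen.",
        "pass_pattern": r"\b(must|has to)\b",
    },
]

def evaluate(outputs):
    """Return the fraction of test items whose MT output matches its pattern."""
    correct = 0
    for item, hyp in zip(suite, outputs):
        if re.search(item["pass_pattern"], hyp, flags=re.IGNORECASE):
            correct += 1
    return correct / len(suite)

# The second output mistranslates the modal, so only 1 of 2 items passes.
print(f"accuracy: {evaluate(['He has read the book.', 'She should go.']):.0%}")
```

Because the regex inspects only the relevant span, the rest of the sentence can vary freely without affecting the verdict; in practice, borderline matches are flagged for manual checking, which is what makes the approach semi-automatic.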
We present the results of the application of a grammatical test suite for German-to-English MT on the systems submitted at WMT19, with a detailed analysis of 107 phenomena organized in 14 categories. The systems still translate wrongly one out of four test items on average. Low performance is indicated for idioms, modals, pseudo-clefts, multi-word expressions and verb valency. When compared to last year, there has been an improvement on function words, non-verbal agreement and punctuation. More conclusions about particular phenomena are also presented.
We present an alternative method of evaluating Quality Estimation systems, which is based on a linguistically motivated Test Suite. We create a test set consisting of 14 linguistic error categories and we gather for each of them a set of samples with both correct and erroneous translations. Then, we measure the performance of 5 systems by checking their ability to distinguish between the two. The detailed results are much more informative about the abilities of each system. The fact that different systems perform differently at various phenomena confirms the usefulness of the Test Suite.
We are presenting a hybrid MT approach for the WMT2016 Shared Translation Task for the IT-Domain. Our work consists of several translation components based on rule-based and statistical approaches that feed into an informed selection mechanism. Additions to last year's submission include a WSD component, a syntactically-enhanced component and improvements relevant to the particular domain. We also present a detailed human evaluation of the output of all components, focusing on systematic errors.
Robert Schwarzenberg, David Harbecke, Vivien Macketanz, Eleftherios Avramidis, Sebastian Möller. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations). 2019.
This paper describes a test suite submission providing detailed statistics of linguistic performance for the state-of-the-art German-English systems at the Fifth Conference on Machine Translation (WMT20). The analysis covers 107 phenomena organized in 14 categories, based on about 5,500 test items, including a manual annotation effort of 45 person hours. Two systems (Tohoku and Huoshan) appear to have significantly better accuracy than the others, although the best system of WMT20 is not significantly better than the one from WMT19 in a macro-average. Additionally,...
We employ a linguistically motivated challenge set in order to evaluate the state-of-the-art machine translation metrics submitted to the Metrics Shared Task of the 8th Conference for Machine Translation. The challenge set includes about 21,000 items extracted from 155 systems in three language directions, covering more than 100 linguistically motivated phenomena organized in 14 categories. The metrics that have the best performance with regard to our analysis are Cometoid22-wmt23 (a trained metric based on distillation) for German-English and...
The quality of machine-generated text is a complex construct consisting of various aspects and dimensions. We present a study that aims to uncover relevant perceptual dimensions for one type of machine-generated text, that is, Machine Translation. We conducted a crowdsourcing survey in the style of a Semantic Differential to collect attribute ratings of German MT outputs. An Exploratory Factor Analysis revealed the underlying dimensions. As a result, we extracted four factors that operate as Quality of Experience dimensions for MT outputs: precision, complexity, grammaticality,...
Evaluating translation models is a trade-off between effort and detail. On the one end of the spectrum there are automatic count-based methods such as BLEU, on the other end linguistic evaluations by humans, which are arguably more informative but also require disproportionately high effort. To narrow the spectrum, we propose a general approach for automatically exposing systematic differences between human and machine translations to human experts. Inspired by adversarial settings, we train a neural text classifier to distinguish human from...
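The classifier-based idea sketched in this abstract can be illustrated with a toy stand-in: a pure-Python perceptron over bag-of-words features that learns to separate "human" from "machine" texts, whose learned weights then point experts at systematically over- or under-used words. The paper trains a neural classifier; this simplified sketch and all example sentences are assumptions for illustration, not the authors' method or data.

```python
from collections import Counter

# Tiny illustrative corpus (invented): fluent human phrasings vs. stilted
# machine-like paraphrases of the same content.
human = ["the report was finished ahead of schedule",
         "she shrugged and walked away"]
machine = ["the report was being finished before the plan",
           "she made a shrug and went away"]

def features(text):
    return Counter(text.split())   # bag-of-words counts

weights = Counter()
for _ in range(20):                # a few perceptron epochs
    for text, label in [(t, 1) for t in human] + [(t, -1) for t in machine]:
        score = sum(weights[w] * c for w, c in features(text).items())
        if score * label <= 0:     # misclassified: nudge weights toward label
            for w, c in features(text).items():
                weights[w] += label * c

def predict(text):
    score = sum(weights[w] * c for w, c in features(text).items())
    return "human" if score > 0 else "machine"

# High-magnitude weights (e.g. "shrugged" vs. "made a shrug") are the
# systematic differences that would be surfaced to human experts.
print(predict("she shrugged"))   # classified as human-like phrasing
```

The interesting output here is not the prediction itself but the weight vector: features the classifier relies on correspond to phrasings that systematically distinguish the two text sources.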
Patrick Stadler, Vivien Macketanz, Eleftherios Avramidis. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: Student Research Workshop. 2021.