Annette Rios

ORCID: 0000-0002-8943-3472
Research Areas
  • Natural Language Processing Techniques
  • Topic Modeling
  • Text Readability and Simplification
  • Multimodal Machine Learning Applications
  • Hand Gesture Recognition Systems
  • Hearing Impairment and Communication
  • Human Pose and Action Recognition
  • Speech Recognition and Synthesis
  • Interpreting and Communication in Healthcare
  • Historical Linguistics and Language Studies
  • Spanish Linguistics and Language Studies
  • Authorship Attribution and Profiling
  • Multilingual Education and Policy
  • Semantic Web and Ontologies
  • Speech and Audio Processing
  • Emotion and Mood Recognition
  • Advanced Text Analysis Techniques
  • Music and Audio Processing
  • Language and cultural evolution
  • Digital Humanities and Scholarship
  • Speech and dialogue systems
  • Linguistic Variation and Morphology
  • Text and Document Classification Technologies
  • Linguistic research and analysis
  • Gait Recognition and Analysis

University of Zurich
2011-2023

Carnegie Mellon University
2021-2022

Dartmouth College
2021-2022

Carleton University
2022

Yale University
2022

George Mason University
2022

University of Alberta
2022

University of Colorado Boulder
2022

American Jewish Committee
2022

Universität Hamburg
2022

Recently, non-recurrent architectures (convolutional, self-attentional) have outperformed RNNs in neural machine translation. CNNs and self-attentional networks can connect distant words via shorter network paths than RNNs, and it has been speculated that this improves their ability to model long-range dependencies. However, this theoretical argument has not been tested empirically, nor have alternative explanations for their strong performance been explored in-depth. We hypothesize that the strong performance of CNNs and self-attentional networks could also be due to their ability to extract semantic...

10.18653/v1/d18-1458 article EN cc-by Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing 2018-01-01

With the success of large-scale pre-training and multilingual modeling in Natural Language Processing (NLP), recent years have seen a proliferation of large, Web-mined text datasets covering hundreds of languages. We manually audit the quality of 205 language-specific corpora released with five major public datasets (CCAligned, ParaCrawl, WikiMatrix, OSCAR, mC4). Lower-resource corpora have systematic issues: at least 15 corpora contain no usable text, and a significant fraction contains less than 50% sentences of acceptable quality. In...

10.1162/tacl_a_00447 article EN cc-by Transactions of the Association for Computational Linguistics 2022-01-01

The translation of pronouns presents a special challenge to machine translation to this day, since it often requires context outside the current sentence. Recent work on models that have access to information across sentence boundaries has seen only moderate improvements in terms of automatic evaluation metrics such as BLEU. However, metrics that quantify overall translation quality are ill-equipped to measure gains from additional context. We argue that a different kind of evaluation is needed to assess how well models translate inter-sentential phenomena such as pronouns....

10.18653/v1/w18-6307 article EN cc-by 2018-01-01
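The contrastive-evaluation idea behind such targeted test sets can be sketched in a few lines: the model scores a correct translation against a minimally different incorrect one, and accuracy is the fraction of pairs where the correct variant wins. The scoring function and sentence pairs below are toy assumptions for illustration, not data or code from the actual test set:

```python
def contrastive_accuracy(pairs, score):
    """pairs: (correct_translation, contrastive_translation) tuples.
    score: the model's scoring function, e.g. a log-probability
    (assumption: higher means the model prefers that translation).
    A pair counts as solved if the correct variant scores higher."""
    wins = sum(score(good) > score(bad) for good, bad in pairs)
    return wins / len(pairs)

# Toy 'model' that only checks for the pronoun 'she':
score = lambda s: 1.0 if " she " in f" {s} " else 0.0
pairs = [("then she left", "then he left"),
         ("he saw her", "he saw him")]
print(contrastive_accuracy(pairs, score))  # → 0.5
```

A real evaluation would replace the toy scorer with the MT model's conditional log-probability of each target sentence given the source and its context.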

Abteen Ebrahimi, Manuel Mager, Arturo Oncevay, Vishrav Chaudhary, Luis Chiruzzo, Angela Fan, John Ortega, Ricardo Ramos, Annette Rios, Ivan Vladimir Meza Ruiz, Gustavo Giménez-Lugo, Elisabeth Mager, Graham Neubig, Alexis Palmer, Rolando Coto-Solano, Thang Vu, Katharina Kann. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2022.

10.18653/v1/2022.acl-long.435 article EN cc-by Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2022-01-01

Translating text that diverges from the training domain is a key challenge for machine translation. Domain robustness, the generalization of models to unseen test domains, is low for both statistical (SMT) and neural machine translation (NMT). In this paper, we study the performance of SMT and NMT models on out-of-domain test sets. We find that in unknown domains, SMT and NMT suffer from very different problems: SMT systems are mostly adequate but not fluent, while NMT systems are mostly fluent, but not adequate. For NMT, we identify such hallucinations (translations that are fluent but unrelated to the source) as...

10.48550/arxiv.1911.03109 preprint EN other-oa arXiv (Cornell University) 2019-01-01

Manuel Mager, Arturo Oncevay, Abteen Ebrahimi, John Ortega, Annette Rios, Angela Fan, Ximena Gutierrez-Vasques, Luis Chiruzzo, Gustavo Giménez-Lugo, Ricardo Ramos, Ivan Vladimir Meza Ruiz, Rolando Coto-Solano, Alexis Palmer, Elisabeth Mager-Hois, Vishrav Chaudhary, Graham Neubig, Ngoc Thang Vu, Katharina Kann. Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas. 2021.

10.18653/v1/2021.americasnlp-1.23 article EN cc-by 2021-01-01

Sign languages are visual languages produced by the movement of the hands, face, and body. In this paper, we evaluate representations based on skeleton poses, as these are explainable, person-independent, privacy-preserving, low-dimensional representations. Basically, skeletal representations generalize over an individual's appearance and background, allowing us to focus on the recognition of motion. But how much information is lost by the skeletal representation? We perform two independent studies using two state-of-the-art pose estimation systems. We analyze...

10.1109/cvprw53098.2021.00382 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2021-06-01
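A concrete reason skeleton poses are person-independent is that keypoints can be normalized to remove absolute position and body size. The sketch below illustrates this under assumed conventions (the keypoint indices and the shoulder-width scale are illustrative choices, not the paper's actual pipeline):

```python
import math

def normalize_pose(keypoints, left_shoulder=0, right_shoulder=1):
    """Center 2D keypoints on the shoulder midpoint and scale by
    shoulder width, discarding position and body-size information."""
    lx, ly = keypoints[left_shoulder]
    rx, ry = keypoints[right_shoulder]
    cx, cy = (lx + rx) / 2, (ly + ry) / 2
    scale = math.hypot(lx - rx, ly - ry) or 1.0  # avoid division by zero
    return [((x - cx) / scale, (y - cy) / scale) for x, y in keypoints]

# Toy pose: left shoulder, right shoulder, one wrist
pose = [(100.0, 200.0), (140.0, 200.0), (120.0, 260.0)]
print(normalize_pose(pose))  # → [(-0.5, 0.0), (0.5, 0.0), (0.0, 1.5)]
```

After this normalization, the same gesture performed by people of different sizes at different positions in the frame maps to (nearly) the same coordinates.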

Automatic sign language processing is gaining popularity in Natural Language Processing (NLP) research (Yin et al., 2021). In machine translation (MT) in particular, sign language translation based on glosses is a prominent approach. In this paper, we review recent works on neural gloss translation. We find that the limitations of glosses in general and of specific datasets are not discussed in a transparent manner and that there is no common standard for evaluation. To address these issues, we put forward concrete recommendations for future research. Our suggestions advocate...

10.18653/v1/2023.acl-short.60 article EN cc-by 2023-01-01

Recently, non-recurrent architectures (convolutional, self-attentional) have outperformed RNNs in neural machine translation. CNNs and self-attentional networks can connect distant words via shorter network paths than RNNs, and it has been speculated that this improves their ability to model long-range dependencies. However, this theoretical argument has not been tested empirically, nor have alternative explanations for their strong performance been explored in-depth. We hypothesize that the strong performance of CNNs and self-attentional networks could also be due to their ability to extract semantic...

10.48550/arxiv.1808.08946 preprint EN other-oa arXiv (Cornell University) 2018-01-01

Translating text that diverges from the training domain is a key challenge for machine translation. Domain robustness, the generalization of models to unseen test domains, is low for both statistical (SMT) and neural machine translation (NMT). In this paper, we study the performance of SMT and NMT models on out-of-domain test sets. We find that in unknown domains, SMT and NMT suffer from very different problems: SMT systems are mostly adequate but not fluent, while NMT systems are mostly fluent, but not adequate. For NMT, we identify such hallucinations (translations that are fluent but unrelated to the source) as...

10.5167/uzh-191047 article EN arXiv (Cornell University) 2020-10-06

We present a task to measure an MT system’s capability to translate ambiguous words with their correct sense according to the given context. The task is based on the German–English Word Sense Disambiguation (WSD) test set ContraWSD (Rios Gonzales et al., 2017), but it has been filtered to reduce noise, and the evaluation has been adapted to assess the MT output directly rather than scoring existing translations. We evaluate all submissions to the WMT’18 shared translation task, plus a number of submissions from previous years, and find that performance...

10.18653/v1/w18-6437 article EN cc-by 2018-01-01
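Assessing MT output directly, rather than scoring contrastive pairs, can be sketched as a simple sense lookup. The sense lists and matching below are hypothetical simplifications; the real test suite uses curated German–English sense lexicons:

```python
def score_wsd(output, correct_senses, incorrect_senses):
    """Classify one MT output of a sentence containing an ambiguous
    source word: 'correct' if a correct sense translation appears,
    'wrong' if only an incorrect one appears, 'unknown' otherwise."""
    tokens = set(output.lower().split())
    if tokens & correct_senses:
        return "correct"
    if tokens & incorrect_senses:
        return "wrong"
    return "unknown"

# Toy example: German 'Gericht' can mean 'court' or 'dish'.
hyp = "the court dismissed the appeal"
print(score_wsd(hyp, {"court"}, {"dish"}))  # → correct
```

Accuracy over a whole test set is then the fraction of outputs classified as correct.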

Mathias Müller, Malihe Alikhani, Eleftherios Avramidis, Richard Bowden, Annelies Braffort, Necati Cihan Camgöz, Sarah Ebling, Cristina España-Bonet, Anne Göhring, Roman Grundkiewicz, Mert Inan, Zifan Jiang, Oscar Koller, Amit Moryossef, Annette Rios, Dimitar Shterionov, Sandra Sidler-Miserez, Katja Tissi, Davy Van Landuyt. Proceedings of the Eighth Conference on Machine Translation. 2023.

10.18653/v1/2023.wmt-1.4 article EN cc-by 2023-01-01

The article at hand aggregates the work of our group in the automatic processing of simplified German. We present four parallel (standard/simplified German) corpora compiled and curated by our group. We report on the creation of a gold standard of sentence alignments from these sources for evaluating sentence alignment methods against this standard. We show that one method performs best on the majority of the data sources. We used two of the corpora as the basis of a first sentence-based neural machine translation (NMT) approach toward the simplification of German. In follow-up work, we extended the model...

10.3389/fcomm.2022.706718 article EN cc-by Frontiers in Communication 2022-02-23
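As a rough illustration of what such sentence alignment methods do, here is a greedy lexical-overlap baseline (a generic sketch; the alignment methods evaluated in the article are more sophisticated than plain Jaccard overlap):

```python
def jaccard(a, b):
    """Lexical overlap between two whitespace-tokenized sentences."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def align(standard_sents, simplified_sents, threshold=0.3):
    """Greedy 1:1 alignment: pair each simplified sentence with its
    most similar standard sentence, if similarity clears a threshold."""
    pairs = []
    for i, simp in enumerate(simplified_sents):
        j, score = max(((j, jaccard(simp, std))
                        for j, std in enumerate(standard_sents)),
                       key=lambda t: t[1])
        if score >= threshold:
            pairs.append((j, i))  # (standard index, simplified index)
    return pairs

standard = ["Der Hund schläft im Garten .", "Es regnet heute stark ."]
simple = ["Der Hund schläft .", "Heute regnet es ."]
print(align(standard, simple))  # → [(0, 0), (1, 1)]
```

Gold-standard alignments like those described above are what lets such a method be scored by precision and recall against human judgments.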

Most treebank work in the past has focused on European and Asian languages. The Wikipedia Treebank page lists treebanks (or treebank projects) for about 20 modern European languages (ranging from Basque to Swedish), five Asian languages (Chinese, Japanese, Hindi, Korean, Thai), two ancient languages (Greek and Latin), plus Arabic and Hebrew. Almost no treebanking has been done for African or American indigenous languages.1 In previous work we have explored parallel treebanking for English, German and Swedish [7]. Now we would like to explore to what extent our tools and guidelines will work when...

10.5167/uzh-20593 article EN 2009-01-01

The task of document-level text simplification is very similar to summarization with the additional difficulty of reducing complexity. We introduce a newly collected data set of German texts from the Swiss news magazine 20 Minuten (‘20 Minutes’) that consists of full articles paired with simplified summaries. Furthermore, we present experiments on automatic text simplification with the pretrained multilingual model mBART and a modified version thereof that is more memory-friendly, using both our new data set and existing simplification corpora. Our modifications let us train at...

10.18653/v1/2021.newsum-1.16 article EN cc-by 2021-01-01

Thesis written by Annette Rios at the University of Zurich under the supervision of Prof. Dr. Martin Volk. The thesis was defended on 21 September 2015 before a committee composed of Martin Volk (University of Zurich, Department of Computational Linguistics), Balthasar Bickel (University of Zurich, Department of Comparative Linguistics) and Paul Heggarty (Max Planck Institute for Evolutionary Anthropology). It was awarded the grade ‘Summa cum Laude’.

10.5167/uzh-123227 article ES 2016-02-26

We report on experiments in automatic text simplification (ATS) for German with multiple simplification levels along the Common European Framework of Reference for Languages (CEFR), simplifying standard German into levels A1, A2 and B1. For that purpose, we investigate the use of source labels and pretraining on standard German, allowing us to simplify language to a specific CEFR level. We show that these approaches are especially effective in low-resource scenarios, where we are able to outperform a transformer baseline. Moreover, we introduce copy labels, which can help...

10.26615/978-954-452-072-4_150 article EN 2021-01-01

Little attention has been paid to the development of human language technology for truly low-resource languages, i.e., languages with limited amounts of digitally available text data, such as Indigenous languages. However, it has been shown that pretrained multilingual models are able to perform crosslingual transfer in a zero-shot setting even for languages which are unseen during pretraining. Yet, prior work evaluating performance on unseen languages has largely been limited to shallow token-level tasks. It remains unclear if zero-shot learning of deeper semantic tasks is...

10.3389/frai.2022.995667 article EN cc-by Frontiers in Artificial Intelligence 2022-12-02

Annette Rios, Chantal Amrhein, Noëmi Aepli, Rico Sennrich. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2021.

10.18653/v1/2021.naacl-main.354 article EN cc-by Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2021-01-01

Zero-shot neural machine translation is an attractive goal because of the high cost of obtaining data and building translation systems for new directions. However, previous papers have reported mixed success in zero-shot translation. It is hard to predict in which settings it will be effective, and what limits performance compared to a fully supervised system. In this paper, we investigate a multilingual EN↔{FR,CS,DE,FI} system trained on WMT data. We find that zero-shot performance is highly unstable and can vary by more than 6 BLEU between...

10.5167/uzh-193182 article EN Empirical Methods in Natural Language Processing 2020-11-20

Spelling correction methods developed for languages like English usually rely on complete lists of full word forms, a requirement that cannot be met for morphologically complex languages. In this article we describe the implementation of a spell checker using finite-state methods for the agglutinative language Quechua (ISO 639-3:que).

10.5167/uzh-52921 article EN Language and Technology Conference 2011-11-27
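The core idea of finite-state morphological analysis for spell checking is that a word counts as correct if it can be parsed as a valid root plus a valid suffix sequence, with no full-form word list required. The roots and suffixes below are a tiny illustrative subset, not the actual Quechua grammar (which the spell checker implements as finite-state transducers):

```python
ROOTS = {"wasi", "runa"}        # toy roots: 'house', 'person'
SUFFIXES = {"kuna", "pi", "n"}  # toy suffixes: plural, locative, 3sg

def accepts(word):
    """Accept a word if it is a known root followed by zero or more
    valid suffixes (a crude stand-in for a finite-state analyzer)."""
    return any(word.startswith(r) and _suffixes_ok(word[len(r):])
               for r in ROOTS)

def _suffixes_ok(rest):
    if rest == "":
        return True
    return any(rest.startswith(s) and _suffixes_ok(rest[len(s):])
               for s in SUFFIXES)

print(accepts("wasikunapi"))  # → True: wasi-kuna-pi, 'in the houses'
print(accepts("wasix"))       # → False: 'x' is not a valid suffix
```

A real analyzer also enforces suffix ordering and phonological alternations, which is exactly what finite-state transducers model compactly.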

Zero-shot neural machine translation is an attractive goal because of the high cost of obtaining data and building translation systems for new directions. However, previous papers have reported mixed success in zero-shot translation. It is hard to predict in which settings it will be effective, and what limits performance compared to a fully supervised system. In this paper, we investigate a multilingual EN↔{FR,CS,DE,FI} system trained on WMT data. We find that zero-shot performance is highly unstable and can vary by more than 6 BLEU...

10.48550/arxiv.2011.01703 preprint EN cc-by arXiv (Cornell University) 2020-01-01