Rodolfo Zevallos

ORCID: 0000-0003-0192-7740
Research Areas
  • Natural Language Processing Techniques
  • Speech Recognition and Synthesis
  • Topic Modeling
  • Speech and dialogue systems
  • Speech and Audio Processing
  • Biomedical Text Mining and Ontologies
  • Second Language Acquisition and Learning
  • Seismology and Earthquake Studies
  • Advanced Data Processing Techniques
  • Sociology, Governance, and Technology
  • Digital Communication and Language
  • Text Readability and Simplification
  • Music and Audio Processing
  • Authorship Attribution and Profiling
  • GNSS positioning and interference
  • Educational Technology in Learning
  • ICT in Developing Communities
  • Mental Health via Writing
  • Language and cultural evolution
  • E-Learning and Knowledge Management

Universitat Pompeu Fabra
2021-2023

National Agrarian University
2022

Universidad Nacional del Callao
2020

Pontifical Catholic University of Peru
2020

Milind Agarwal, Sweta Agrawal, Antonios Anastasopoulos, Luisa Bentivogli, Ondřej Bojar, Claudia Borg, Marine Carpuat, Roldano Cattoni, Mauro Cettolo, Mingda Chen, William Chen, Khalid Choukri, Alexandra Chronopoulou, Anna Currey, Thierry Declerck, Qianqian Dong, Kevin Duh, Yannick Estève, Marcello Federico, Souhir Gahbiche, Barry Haddow, Benjamin Hsu, Phu Mon Htut, Hirofumi Inaguma, Dávid Javorský, John Judge, Yasumasa Kano, Tom Ko, Rishu Kumar, Pengwei Li, Xutai Ma, Prashant Mathur, Evgeny...

10.18653/v1/2023.iwslt-1.1 article EN cc-by 2023-01-01

This paper reports on the shared tasks organized by the 21st IWSLT Conference. The shared tasks address 7 scientific challenges in spoken language translation: simultaneous and offline translation, automatic subtitling and dubbing, speech-to-speech translation, dialect and low-resource speech translation, and Indic languages. The shared tasks attracted 18 teams whose submissions are documented in 26 system papers. The growing interest towards spoken language translation is also witnessed by the constantly increasing number of shared task organizers and contributors to the overview paper, almost evenly...

10.48550/arxiv.2411.05088 preprint EN arXiv (Cornell University) 2024-11-07

Rodolfo Zevallos, John Ortega, William Chen, Richard Castro, Núria Bel, Cesar Toshio, Renzo Venturas, Hilario Aradiel and Nelsi Melgarejo. Proceedings of the Third Workshop on Deep Learning for Low-Resource Natural Language Processing. 2022.

10.18653/v1/2022.deeplo-1.1 article EN cc-by 2022-01-01

Web development began in the 1990s. The versatility and flexibility of this technology have made it possible for its use and application to enhance technological development in different fields around the world. However, in Latin American countries such as Peru, there is a lack of culturally relevant web applications applied to the socio-political field. On the other hand, since 2018 to date, many social and political movements have been making efforts to obtain the perspectives and demands of citizens to promote a constituent process that to date has not...

10.1109/icecce52056.2021.9514251 article EN 2021 International Conference on Electrical, Communication, and Computer Engineering (ICECCE) 2021-06-12

This article describes the QUESPA team speech translation (ST) submissions for the Quechua to Spanish (QUE–SPA) track featured in the Evaluation Campaign of IWSLT 2023: low-resource and dialect speech translation. Two main submission types were supported in the campaign: constrained and unconstrained. We submitted six total systems, of which our best (primary) constrained system consisted of an ST model based on the Fairseq S2T framework, where the audio representations were created using log mel-scale filter banks as features and the translations were performed...

10.18653/v1/2023.iwslt-1.23 article EN cc-by 2023-01-01

Language Models (LM) are becoming more and more useful for providing representations upon which to train Natural Language Processing applications. However, there is now clear evidence that attention-based transformers require a critical amount of language data to produce good enough LMs. The question we have addressed in this paper is to what extent the critical amount of data varies for languages of different morphological typology, in particular those that have a rich inflectional morphology, and whether the tokenization method to preprocess the data can make a difference. These...

10.18653/v1/2023.acl-long.699 article EN cc-by 2023-01-01

Nowadays, the main problem of deep learning techniques used in the development of automatic speech recognition (ASR) models is the lack of transcribed data. The goal of this research is to propose a new data augmentation method to improve ASR models for agglutinative and low-resource languages. This novel data augmentation method generates both synthetic text and synthetic audio. Some experiments were conducted using the corpus of the Quechua language, which is an agglutinative and low-resource language. In this study, a sequence-to-sequence (seq2seq) model was applied to generate synthetic text, in addition to generating...

10.48550/arxiv.2204.00291 preprint EN cc-by-nc-sa arXiv (Cornell University) 2022-01-01

10.18653/v1/2024.emnlp-main.638 article EN Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing 2024-01-01

Translating between languages with drastically different grammatical conventions poses challenges, not just for human interpreters but also for machine translation systems. In this work, we specifically target the translation challenges posed by attributive nouns in Chinese, which frequently cause ambiguities in English translation. By manually inserting the omitted particle X ('DE') in news article titles from the Penn Chinese Discourse Treebank, we developed a targeted dataset to fine-tune Hugging Face models, improving...

10.48550/arxiv.2412.14323 preprint EN arXiv (Cornell University) 2024-12-18

Suicidal ideation is a serious health problem affecting millions of people worldwide. Social networks provide information about these mental health problems through users' emotional expressions. We propose a multilingual model leveraging transformer architectures like mBERT, XLM-R, and mT5 to detect suicidal text across posts in six languages: Spanish, English, German, Catalan, Portuguese and Italian. A Spanish suicide tweet dataset was translated into five other languages using SeamlessM4T. Each model was fine-tuned on...

10.48550/arxiv.2412.15498 preprint EN arXiv (Cornell University) 2024-12-19

Language technology is the missing piece of the puzzle that will bring us closer to a complete revitalization of endangered languages. Almost every digital product uses and is dependent on language; language technology is no longer an option but the key enabler and the solution to boosting future growth. Technical issues are hard but lesser problems in building a corpus of endangered languages; centuries of oppression have managed to dent the pride and sense of belonging, which is reflected in the lack of awareness of the loss of one's own language. In order to reach language-based technology, powered by...

10.1109/intercon50315.2020.9220197 article EN 2020-09-01

The application of self-supervision to speech representation learning has garnered significant interest in recent years, due to its scalability to large amounts of unlabeled data. However, much progress, both in terms of pre-training and downstream evaluation, has remained concentrated in monolingual models that only consider English. Few models consider other languages, and even fewer consider indigenous ones. In our submission to the New Language Track of the ASRU 2023 ML-SUPERB Challenge, we present an ASR corpus for Quechua, an indigenous South American...

10.48550/arxiv.2310.03639 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Automatic Speech Recognition (ASR) is a key element in new services that helps users to interact with an automated system. Deep learning methods have made it possible to deploy systems with word error rates below 5% for ASR of English. However, the use of these methods is only available for languages with hundreds or thousands of hours of audio and their corresponding transcriptions. For the so-called low-resource languages, to speed up the availability of resources that can improve the performance of their ASR systems, methods of creating new resources on the basis of existing ones are being investigated. In...

10.21437/interspeech.2022-770 article EN Interspeech 2022 2022-09-16

Despite existing laws, in practice the Peruvian State ignores multiculturalism and behaves as a monolingual, monocultural entity. Because this mistaken paradigm is still in force, not enough has been invested in developing language skills to serve all citizens equally. The consequences are a lack of promotion, discrimination and, ultimately, isolation that leads to the extinction of indigenous languages. Our initiative is to change this mistaken paradigm, to awaken pride...

10.30920/letras.91.134.9 article ES cc-by Letras (Lima) 2020-11-16

This paper reports on the experiments aimed to improve our understanding of the role of the amount of data required for training attention-based transformer language models. Specifically, we investigate the impact of reducing the immense amounts of required pre-training data through sampling strategies that identify and reduce high-frequency tokens, as different studies have indicated that the existence of very high-frequency tokens in pre-training data might bias learning, causing undesired effects. In this light, we describe our sampling algorithm that iteratively assesses token frequencies and removes...

10.18653/v1/2023.findings-emnlp.527 article EN cc-by 2023-01-01

We develop machine translation and speech synthesis systems to complement the efforts of revitalizing Judeo-Spanish, the exiled language of Sephardic Jews, which survived for centuries, but now faces the threat of extinction in the digital age. Building on resources created by the Sephardic community of Turkey and elsewhere, we create corpora and tools that would help preserve this language for future generations. For machine translation, we first develop a Spanish to Judeo-Spanish rule-based machine translation system, in order to generate large volumes of synthetic parallel data in the relevant language pairs:...

10.48550/arxiv.2205.15599 preprint EN cc-by-nc-nd arXiv (Cornell University) 2022-01-01

The Huqariq corpus is a multilingual collection of speech from native Peruvian languages. The transcribed corpus is intended for the research and development of speech technologies to preserve endangered languages in Peru. Huqariq is primarily designed for the development of automatic speech recognition, language identification and text-to-speech tools. In order to achieve this sustainably, we employ the crowdsourcing methodology. Huqariq includes four native languages of Peru, and it is expected that by the end of the year 2022 it can reach up to 20 of the 48 native languages of Peru. The corpus has 220 hours of transcribed audio recorded by more than 500 volunteers, making...

10.48550/arxiv.2207.05498 preprint EN other-oa arXiv (Cornell University) 2022-01-01

Automatic Speech Recognition (ASR) is a key element in new services that helps users to interact with an automated system. Deep learning methods have made it possible to deploy systems with word error rates below 5% for ASR of English. However, the use of these methods is only available for languages with hundreds or thousands of hours of audio and their corresponding transcriptions. For the so-called low-resource languages, to speed up the availability of resources that can improve the performance of their ASR systems, methods of creating new resources on the basis of existing ones are being...

10.48550/arxiv.2207.06872 preprint EN cc-by arXiv (Cornell University) 2022-01-01