- Natural Language Processing Techniques
- Topic Modeling
- Speech Recognition and Synthesis
- Text Readability and Simplification
- Mental Health via Writing
- Algorithms and Data Compression
- Speech and Dialogue Systems
- DNA and Biological Computing
- Biomedical Text Mining and Ontologies
- Hand Gesture Recognition Systems
- Authorship Attribution and Profiling
- Fuzzy Logic and Control Systems
- Artificial Intelligence in Law
- Cinema and Media Studies
- Semiconductor Materials and Devices
- Hearing Impairment and Communication
- Translation Studies and Practices
- Legal Education and Practice Innovations
- Advancements in Semiconductor Devices and Circuit Design
- Judicial and Constitutional Studies
- Terrorism, Counterterrorism, and Political Violence
- Galician and Iberian Cultural Studies
- Spam and Phishing Detection
- Graph Theory and Algorithms
- Scientometrics and Bibliometrics Research
Dartmouth College
2021-2023
Northeastern University
2022-2023
Google (United States)
2023
University of Colorado Boulder
2022-2023
University of California, Berkeley
2023
University of Louisville
2023
Johns Hopkins University
2022-2023
Universidade Tecnológica Federal do Paraná
2023
Universidad de la República
2021-2023
Boston University
2023
Milind Agarwal, Sweta Agrawal, Antonios Anastasopoulos, Luisa Bentivogli, Ondřej Bojar, Claudia Borg, Marine Carpuat, Roldano Cattoni, Mauro Cettolo, Mingda Chen, William Chen, Khalid Choukri, Alexandra Chronopoulou, Anna Currey, Thierry Declerck, Qianqian Dong, Kevin Duh, Yannick Estève, Marcello Federico, Souhir Gahbiche, Barry Haddow, Benjamin Hsu, Phu Mon Htut, Hirofumi Inaguma, Dávid Javorský, John Judge, Yasumasa Kano, Tom Ko, Rishu Kumar, Pengwei Li, Xutai Ma, Prashant Mathur, Evgeny...
Abteen Ebrahimi, Manuel Mager, Arturo Oncevay, Vishrav Chaudhary, Luis Chiruzzo, Angela Fan, John Ortega, Ricardo Ramos, Annette Rios, Ivan Vladimir Meza Ruiz, Gustavo Giménez-Lugo, Elisabeth Mager, Graham Neubig, Alexis Palmer, Rolando Coto-Solano, Thang Vu, Katharina Kann. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2022.
Manuel Mager, Arturo Oncevay, Abteen Ebrahimi, John Ortega, Annette Rios, Angela Fan, Ximena Gutierrez-Vasques, Luis Chiruzzo, Gustavo Giménez-Lugo, Ricardo Ramos, Ivan Vladimir Meza Ruiz, Rolando Coto-Solano, Alexis Palmer, Elisabeth Mager-Hois, Vishrav Chaudhary, Graham Neubig, Ngoc Thang Vu, Katharina Kann. Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas. 2021.
In recent years, deep learning has shown promising results when used in the field of natural language processing (NLP). Neural networks (NNs) such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs) have been used for various NLP tasks including sentiment analysis, information retrieval, and document classification. In this paper, we present the Supreme Court Classifier (SCC), a system that applies these methods to the problem of classifying legal court opinions. We compare using traditional machine learning with...
Languages can be considered endangered for many reasons. One of the principal reasons for endangerment is the disappearance of a language's speakers. Another, more identifiable reason, is the lack of written resources. We present an automated sub-segmentation system called AshMorph that deals with the morphology of an Amazonian tribal language called Ashaninka, which is at risk of being endangered due to the lack of availability (or resistance) of native speakers and the absence of written resources. We show that by the use of a cross-lingual lexicon and finite state transducers we can increase accuracy by more than 30% when...
Abteen Ebrahimi, Manuel Mager, Shruti Rijhwani, Enora Rice, Arturo Oncevay, Claudia Baltazar, María Cortés, Cynthia Montaño, John E. Ortega, Rolando Coto-Solano, Hilaria Cruz, Alexis Palmer, Katharina Kann. Proceedings of the Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP). 2023.
There has been considerable work recently in the natural language community and elsewhere on Responsible AI. Much of this work focuses on fairness and biases (henceforth Risks 1.0), following the 2016 best seller: Weapons of Math Destruction. Two books published in 2022, The Chaos Machine and Like, Comment, Subscribe, raise additional risks to public health/safety/security such as genocide, insurrection, polarized politics, and vaccinations (henceforth, Risks 2.0). These books suggest that the use of machine learning to maximize...
The Termolator is an open-source high-performing terminology extraction system, available on GitHub. The Termolator combines several different approaches to get superior coverage and precision. The in-line term component identifies potential instances of terminology using a chunking procedure, similar to noun group chunking, but favoring chunks that contain out-of-vocabulary words, nominalizations, technical adjectives, and other specialized word classes. The distributional component ranks such chunks according to metrics including: (a) a set that favors...
Accurately modeling the I-V characteristics and current degradation for transistors is central to predicting circuit end-of-life behavior. In this work, we propose a machine learning model to accurately model current degradation at various stress conditions and extend that model to make nominal use-bias predictions. The model can be extended to track and predict any parametric change. We show an excellent agreement of the model with experimental results. Furthermore, we use a deep neural network to model aged I-V characteristics over a wide drain and gate playback bias range and are reliably able...
Little attention has been paid to the development of human language technology for truly low-resource languages, i.e., languages with limited amounts of digitally available text data, such as Indigenous languages. However, it has been shown that pretrained multilingual models are able to perform crosslingual transfer in a zero-shot setting even for low-resource languages which are unseen during pretraining. Yet, prior work evaluating performance on unseen languages has largely been limited to shallow token-level tasks. It remains unclear if zero-shot learning of deeper semantic tasks is...
Suppliers developing semiconductor technologies for consumer electronics have been operating in a high-volume manner for decades. It is often believed that there is a link between high volume, high yield, and high reliability. A potentially concerning misconception is that low-volume manufacturing facilities then cannot achieve high reliability. However, many 'high reliability' markets, such as military, medical, and aerospace, source their parts from low-volume manufacturers. In this work, the misconception stated above is discussed and clarified in terms of...
This article describes the QUESPA team speech translation (ST) submissions for the Quechua to Spanish (QUE–SPA) track featured in the Evaluation Campaign of IWSLT 2023: low-resource and dialect speech translation. Two main submission types were supported in the campaign: constrained and unconstrained. We submitted six total systems, of which our best (primary) system consisted of an ST model based on the Fairseq S2T framework where the audio representations were created using log mel-scale filter banks as features and the translations were performed...
Models based on bidirectional encoder representations from transformers (BERT) produce state of the art (SOTA) results on many natural language processing (NLP) tasks such as named entity recognition (NER), part-of-speech (POS) tagging, etc. An interesting phenomenon occurs when classifying long documents such as those from the US Supreme Court, where BERT-based models can be considered difficult to use on a first-pass or out-of-the-box basis. In this paper, we experiment with several classification techniques for...
Computer-aided translation tools based on translation memories are widely used to assist professional translators. A translation memory (TM) consists of a set of translation units (TU) made up of source- and target-language segment pairs. For the translation of a new source segment s', these tools search the TM and retrieve the TUs (s,t) whose source segments are more similar to s'. The translator then chooses a TU and edits the target segment t to turn it into an adequate translation of s'. Fuzzy-match repair (FMR) techniques can be used to automatically modify the parts of t that need to be edited. We describe a language-independent FMR method that first...
Nollywood, based on the idea of Bollywood from India, is a series of outstanding movies that originate from Nigeria. Unfortunately, while the movies are in English, they are hard to understand for many native speakers due to the dialect of English that is spoken. In this article, we accomplish two goals: (1) create a phonetic sub-title model that is able to translate Nigerian English speech to American English and (2) use the most advanced toxicity detectors to discover how toxic the speech is. Our aim is to highlight the text in these videos which is often times ignored for lack of dialectal...
We propose a method to predict toxicity and other textual attributes through the use of natural language processing (NLP) techniques for two recent events: the Ukraine-Russia and Hamas-Israel conflicts. This article provides a basis for exploration in future conflicts with hopes to mitigate risk through the analysis of social media before and after a conflict begins. Our work compiles several datasets from Twitter and Reddit for both conflicts in a before and after separation with an aim of predicting a future state of social media for avoidance. More specifically, we show that: (1) there is a noticeable...