NFDI4DS | UHH-SEMS - Publication Details

John E. Ortega

ORCID: 0000-0002-2328-3205

About

Contact & Profiles

OpenAlex ID: A5050439418

Research Areas

Natural Language Processing Techniques
Topic Modeling
Speech Recognition and Synthesis
Text Readability and Simplification
Mental Health via Writing
Algorithms and Data Compression
Speech and dialogue systems
DNA and Biological Computing
Biomedical Text Mining and Ontologies
Hand Gesture Recognition Systems
Authorship Attribution and Profiling
Fuzzy Logic and Control Systems
Artificial Intelligence in Law
Cinema and Media Studies
Semiconductor materials and devices
Hearing Impairment and Communication
Translation Studies and Practices
Legal Education and Practice Innovations
Advancements in Semiconductor Devices and Circuit Design
Judicial and Constitutional Studies
Terrorism, Counterterrorism, and Political Violence
Galician and Iberian cultural studies
Spam and Phishing Detection
Graph Theory and Algorithms
scientometrics and bibliometrics research

Dartmouth College
2021-2023

Northeastern University
2022-2023

Google (United States)
2023

University of Colorado Boulder
2022-2023

University of California, Berkeley
2023

University of Louisville
2023

Johns Hopkins University
2022-2023

Universidade Tecnológica Federal do Paraná
2023

Universidad de la República
2021-2023

Boston University
2023

FINDINGS OF THE IWSLT 2023 EVALUATION CAMPAIGN

OPENALEX - Publications

Milind Agarwal, Sweta Agrawal, Antonios Anastasopoulos, Luisa Bentivogli, Ondřej Bojar, and 57 more

Milind Agarwal, Sweta Agrawal, Antonios Anastasopoulos, Luisa Bentivogli, Ondřej Bojar, Claudia Borg, Marine Carpuat, Roldano Cattoni, Mauro Cettolo, Mingda Chen, William Chen, Khalid Choukri, Alexandra Chronopoulou, Anna Currey, Thierry Declerck, Qianqian Dong, Kevin Duh, Yannick Estève, Marcello Federico, Souhir Gahbiche, Barry Haddow, Benjamin Hsu, Phu Mon Htut, Hirofumi Inaguma, Dávid Javorský, John Judge, Yasumasa Kano, Tom Ko, Rishu Kumar, Pengwei Li, Xutai Ma, Prashant Mathur, Evgeny...

10.18653/v1/2023.iwslt-1.1 article EN cc-by 2023-01-01

AmericasNLI: Evaluating Zero-shot Natural Language Understanding of Pretrained Multilingual Models in Truly Low-resource Languages

Abteen Ebrahimi, Manuel Mager, Arturo Oncevay, Vishrav Chaudhary, Luis Chiruzzo, and 12 more

Abteen Ebrahimi, Manuel Mager, Arturo Oncevay, Vishrav Chaudhary, Luis Chiruzzo, Angela Fan, John Ortega, Ricardo Ramos, Annette Rios, Ivan Vladimir Meza Ruiz, Gustavo Giménez-Lugo, Elisabeth Mager, Graham Neubig, Alexis Palmer, Rolando Coto-Solano, Thang Vu, Katharina Kann. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2022.

10.18653/v1/2022.acl-long.435 article EN cc-by Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2022-01-01

Findings of the AmericasNLP 2021 Shared Task on Open Machine Translation for Indigenous Languages of the Americas

Manuel Mager, Arturo Oncevay, Abteen Ebrahimi, John E. Ortega, Annette Rios, and 13 more

Manuel Mager, Arturo Oncevay, Abteen Ebrahimi, John Ortega, Annette Rios, Angela Fan, Ximena Gutierrez-Vasques, Luis Chiruzzo, Gustavo Giménez-Lugo, Ricardo Ramos, Ivan Vladimir Meza Ruiz, Rolando Coto-Solano, Alexis Palmer, Elisabeth Mager-Hois, Vishrav Chaudhary, Graham Neubig, Ngoc Thang Vu, Katharina Kann. Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas. 2021.

10.18653/v1/2021.americasnlp-1.23 article EN cc-by 2021-01-01

A Comparative Study of Classifying Legal Documents with Neural Networks

Samir Undavia, Adam Meyers, John E. Ortega

In recent years, deep learning has shown promising results when used in the field of natural language processing (NLP). Neural networks (NNs) such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs) have been used for various NLP tasks including sentiment analysis, information retrieval, and document classification. In this paper, we present the Supreme Court Classifier (SCC), a system that applies these methods to the problem of document classification of legal court opinions. We compare methods using traditional machine learning with...

10.15439/2018f227 article EN cc-by Annals of Computer Science and Information Systems 2018-09-26

Neural machine translation with a polysynthetic low resource language

John E. Ortega, Richard Castro Mamani, Kyunghyun Cho

10.1007/s10590-020-09255-9 article EN Machine Translation 2020-12-01

Overcoming Resistance: The Normalization of an Amazonian Tribal Language

John E. Ortega, Richard Alexander Castro-Mamani, Jaime Rafael Montoya Samame

Languages can be considered endangered for many reasons. One of the principal reasons for endangerment is the disappearance of its speakers. Another, more identifiable reason, is the lack of written resources. We present an automated sub-segmentation system called AshMorph that deals with the morphology of an Amazonian tribal language called Ashaninka, which is at risk of being endangered due to the lack of availability (or resistance) of native speakers and the absence of written resources. We show that by the use of a cross-lingual lexicon and finite state transducers we can increase accuracy by more than 30% when...

10.18653/v1/2020.loresmt-1.1 article EN 2020-01-01

NeighBERT: Medical Entity Linking Using Relation-Induced Dense Retrieval

Ayush Singh, Saranya Krishnamoorthy, John E. Ortega

10.1007/s41666-023-00136-3 article EN Journal of Healthcare Informatics Research 2024-01-18

Findings of the AmericasNLP 2023 Shared Task on Machine Translation into Indigenous Languages

Abteen Ebrahimi, Manuel Mager, Shruti Rijhwani, Enora Rice, Arturo Oncevay, and 8 more

Abteen Ebrahimi, Manuel Mager, Shruti Rijhwani, Enora Rice, Arturo Oncevay, Claudia Baltazar, María Cortés, Cynthia Montaño, John E. Ortega, Rolando Coto-Solano, Hilaria Cruz, Alexis Palmer, Katharina Kann. Proceedings of the Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP). 2023.

10.18653/v1/2023.americasnlp-1.23 article EN cc-by 2023-01-01

Emerging trends: Unfair, biased, addictive, dangerous, deadly, and insanely profitable

Kenneth Church, Annika Marie Schoene, John E. Ortega, Raman Chandrasekar, Valia Kordoni

There has been considerable work recently in the natural language community and elsewhere on Responsible AI. Much of this work focuses on fairness and biases (henceforth Risks 1.0), following the 2016 best seller: Weapons of Math Destruction. Two books published in 2022, The Chaos Machine and Like, Comment, Subscribe, raise additional risks to public health/safety/security such as genocide, insurrection, polarized politics, vaccinations (henceforth, Risks 2.0). These books suggest that the use of machine learning to maximize...

10.1017/s1351324922000481 article EN cc-by Natural Language Engineering 2022-12-19

The Termolator: Terminology Recognition Based on Chunking, Statistical and Search-Based Scores

Adam Meyers, Yifan He, Zachary Glass, John E. Ortega, Shasha Liao, and 3 more

The Termolator is an open-source high-performing terminology extraction system, available on Github. The Termolator combines several different approaches to get superior coverage and precision. The in-line term component identifies potential instances of terminology using a chunking procedure, similar to noun group chunking, but favoring chunks that contain out-of-vocabulary words, nominalizations, technical adjectives, and other specialized word classes. The distributional component ranks such term instances according to several metrics including: (a) set favors...

10.3389/frma.2018.00019 article EN cc-by Frontiers in Research Metrics and Analytics 2018-06-15

QUESPA Submission for the IWSLT 2024 Dialectal and Low-resource Speech Translation Task

John E. Ortega, Rodolfo Joel Zevallos, Ibrahim Said Ahmad, William Chen

10.18653/v1/2024.iwslt-1.17 article EN 2024-01-01

Machine Learning On Transistor Aging Data: Test Time Reduction and Modeling for Novel Devices

Neel Chatterjee, John E. Ortega, Inanc Meric, Peng Xiao, Ilan Tsameret

Accurately modeling the I-V characteristics and current degradation for transistors is central to predicting circuit end-of-life behavior. In this work, we propose a machine learning model to accurately model degradation at various stress conditions and extend that to make nominal use-bias predictions. The model can be extended to track and predict any parametric change. We show an excellent agreement of the model with experimental results. Furthermore, we use a deep neural network to model the aged I-V characteristics over a wide drain and gate playback bias range and are reliably able...

10.1109/irps46558.2021.9405188 article EN 2021 IEEE International Reliability Physics Symposium (IRPS) 2021-03-01

AmericasNLI: Machine translation and natural language inference systems for Indigenous languages of the Americas

Katharina Kann, Abteen Ebrahimi, Manuel Mager, Arturo Oncevay, John E. Ortega, and 13 more

Little attention has been paid to the development of human language technology for truly low-resource languages, i.e., languages with limited amounts of digitally available text data, such as Indigenous languages. However, it has been shown that pretrained multilingual models are able to perform crosslingual transfer in a zero-shot setting even for low-resource languages which are unseen during pretraining. Yet, prior work evaluating performance on unseen languages has largely been limited to shallow token-level tasks. It remains unclear if zero-shot learning of deeper semantic tasks is...

10.3389/frai.2022.995667 article EN cc-by Frontiers in Artificial Intelligence 2022-12-02

On the relationship between semiconductor manufacturing volume, yield, and reliability

J. J. Siddiqui, John E. Ortega, Brian Albus

Suppliers developing semiconductor technologies for consumer electronics have been operating in a high-volume manner for decades. It is often believed that there is a link between high volume, high yield, and high reliability. A potentially concerning misconception is that low volume manufacturing facilities then cannot achieve high reliability. However, many "high reliability" markets, such as military, medical, and aerospace, source their parts from low-volume manufacturers. In this work, the misconception stated above is discussed and clarified in terms of...

10.1109/irps.2017.7936409 article EN 2017 IEEE International Reliability Physics Symposium (IRPS) 2017-04-01

QUESPA Submission for the IWSLT 2023 Dialect and Low-resource Speech Translation Tasks

John E. Ortega, Rodolfo Zevallos, William Chen

This article describes the QUESPA team speech translation (ST) submissions for the Quechua to Spanish (QUE–SPA) track featured in the Evaluation Campaign of IWSLT 2023: low-resource and dialect speech translation. Two main submission types were supported in the campaign: constrained and unconstrained. We submitted six total systems of which our best (primary) system consisted of an ST model based on the Fairseq S2T framework where the audio representations were created using log mel-scale filter banks as features and the translations were performed...

10.18653/v1/2023.iwslt-1.23 article EN cc-by 2023-01-01

Classification of US Supreme Court Cases using BERT-Based Techniques

Shubham Vatsal, Adam Meyers, John E. Ortega

Models based on bidirectional encoder representations from transformers (BERT) produce state of the art (SOTA) results on many natural language processing (NLP) tasks such as named entity recognition (NER), part-of-speech (POS) tagging etc. An interesting phenomenon occurs when classifying long documents such as those from the US supreme court where BERT-based models can be considered difficult to use on a first-pass or out-of-the-box basis. In this paper, we experiment with several classification techniques for...

10.26615/978-954-452-092-2_128 article EN 2023-01-01

Fuzzy-Match Repair Guided by Quality Estimation

John E. Ortega, Mikel L. Forcada, Felipe Sánchez-Martínez

Computer-aided translation tools based on translation memories are widely used to assist professional translators. A translation memory (TM) consists of a set of translation units (TU) made up of source- and target-language segment pairs. For the translation of a new source segment s', these tools search the TM and retrieve the TUs (s,t) whose source segments are more similar to s'. The translator then chooses a TU and edits the target segment t to turn it into an adequate translation of s'. Fuzzy-match repair (FMR) techniques can be used to automatically modify the parts of t that need to be edited. We describe a language-independent FMR method that first...

10.1109/tpami.2020.3021361 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2020-09-02

Nollywood: Let’s Go to the Movies!

John E. Ortega, William Chen, Ibrahim Said Ahmad

Nollywood, based on the idea of Bollywood from India, is a series of outstanding movies that originate from Nigeria. Unfortunately, while the movies are in English, they are hard to understand for many native speakers due to the dialect of English that is spoken. In this article, we accomplish two goals: (1) create a phonetic sub-title model that is able to translate Nigerian English speech to American English and (2) use the most advanced toxicity detectors to discover how toxic the speech is. Our aim is to highlight the text in these videos which is often times ignored for lack of dialectal...

10.20944/preprints202402.0845.v1 preprint EN 2024-02-15

Nollywood: Let's Go to the Movies!

John E. Ortega, Ibrahim Said Ahmad, William Chen

10.48550/arxiv.2407.02631 preprint EN arXiv (Cornell University) 2024-07-02

NLP Case Study on Predicting the Before and After of the Ukraine-Russia and Hamas-Israel Conflicts

Jordan Miner, John E. Ortega

We propose a method to predict toxicity and other textual attributes through the use of natural language processing (NLP) techniques for two recent events: the Ukraine-Russia and Hamas-Israel conflicts. This article provides a basis for exploration in future conflicts with hopes to mitigate risk through the analysis of social media before and after a conflict begins. Our work compiles several datasets from Twitter and Reddit for both conflicts in a before and after separation with an aim of predicting a future state of social media for avoidance. More specifically, we show that: (1) there is a noticeable...

10.48550/arxiv.2410.06427 preprint EN arXiv (Cornell University) 2024-10-08
