Mehdi Rezagholizadeh

ORCID: 0000-0003-4014-6007
Research Areas
  • Topic Modeling
  • Natural Language Processing Techniques
  • Speech Recognition and Synthesis
  • Multimodal Machine Learning Applications
  • Domain Adaptation and Few-Shot Learning
  • Music and Audio Processing
  • Advanced Neural Network Applications
  • Speech and Audio Processing
  • Speech and dialogue systems
  • Adversarial Robustness in Machine Learning
  • Machine Learning and Data Classification
  • Text Readability and Simplification
  • Image Enhancement Techniques
  • Advanced Text Analysis Techniques
  • Color Science and Applications
  • Semantic Web and Ontologies
  • Algorithms and Data Compression
  • Text and Document Classification Technologies
  • Neural Networks and Applications
  • Machine Learning and Algorithms
  • Generative Adversarial Networks and Image Synthesis
  • Visual perception and processing mechanisms
  • Brain Tumor Detection and Classification
  • Data Quality and Management
  • Power Systems and Technologies

Huawei Technologies (Canada)
2019-2024

Huawei Technologies (Sweden)
2019-2023

Huawei Technologies (United Kingdom)
2022-2023

University of Alberta
2022

University of Waterloo
2021-2022

The Ark
2021-2022

McGill University
2012-2021

Huawei Technologies (China)
2019-2021

University of Tehran
2008-2011

We present the first sentence simplification model that learns explicit edit operations (ADD, DELETE, and KEEP) via a neural programmer-interpreter approach. Most current systems are variants of sequence-to-sequence models adopted from machine translation. These methods learn to simplify sentences as a byproduct of the fact that they are trained on complex-simple sentence pairs. By contrast, our model is trained to directly predict explicit edit operations on targeted parts of the input sentence, resembling the way humans perform revision. Our model outperforms previous...

10.18653/v1/p19-1331 preprint EN cc-by 2019-01-01
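
The edit-based formulation above can be illustrated with a small interpreter. The sketch below is hypothetical code (not the authors' implementation) that applies a program of ADD/DELETE/KEEP operations to an input sentence; in the actual model, a neural programmer predicts this program.

# A toy interpreter that applies an explicit edit program of
# ADD/DELETE/KEEP operations to an input sentence.
def apply_edits(tokens, program):
    out, i = [], 0
    for op in program:
        if op[0] == "ADD":        # insert a new word without consuming input
            out.append(op[1])
        elif op[0] == "KEEP":     # copy the current input token and advance
            out.append(tokens[i])
            i += 1
        elif op[0] == "DELETE":   # drop the current input token and advance
            i += 1
    return " ".join(out)

complex_sent = "the cat , which was old , slept".split()
program = [("KEEP",), ("ADD", "old"), ("KEEP",), ("DELETE",), ("DELETE",),
           ("DELETE",), ("DELETE",), ("DELETE",), ("KEEP",)]
print(apply_edits(complex_sent, program))   # -> "the old cat slept"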

Knowledge distillation is considered as a training and compression strategy in which two neural networks, namely a teacher and a student, are coupled together during training. The teacher network is supposed to be a trustworthy predictor and the student tries to mimic its predictions. Usually, a student with a lighter architecture is selected so that we can achieve compression and yet deliver high-quality results. In such a setting, distillation only happens for the final predictions, whereas the student could also benefit from the teacher's supervision over internal components. Motivated by...

10.1609/aaai.v35i15.17610 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2021-05-18
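
One way to give the student supervision over internal components, roughly in the spirit of the attention-based layer projection proposed here, is to match each student hidden state against an attention-weighted mixture of teacher layers. The sketch below is illustrative PyTorch; the function name and weighting scheme are assumptions, not the paper's exact formulation.

import torch
import torch.nn.functional as F

def intermediate_supervision_loss(student_hidden, teacher_hiddens):
    """For one student layer, form an attention-weighted mixture of all teacher
    layers (weights from dot-product similarity) and match the student to it."""
    # student_hidden: (batch, seq, dim); teacher_hiddens: list of same-shaped tensors
    t = torch.stack(teacher_hiddens, dim=0)                  # (L, batch, seq, dim)
    scores = (t * student_hidden.unsqueeze(0)).sum(dim=-1)   # (L, batch, seq)
    weights = F.softmax(scores, dim=0).unsqueeze(-1)         # attention over teacher layers
    mixed = (weights * t).sum(dim=0)                         # (batch, seq, dim)
    return F.mse_loss(student_hidden, mixed)

teacher_hiddens = [torch.randn(2, 8, 32) for _ in range(12)]
student_hidden = torch.randn(2, 8, 32)
print(intermediate_supervision_loss(student_hidden, teacher_hiddens))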

With the ever-growing size of pretrained models (PMs), fine-tuning them has become more expensive and resource-hungry. As a remedy, low-rank adapters (LoRA) keep the main pretrained weights of the model frozen and just introduce some learnable truncated SVD modules (so-called LoRA blocks) to the model. While LoRA blocks are parameter-efficient, they suffer from two major problems: first, the rank of these blocks is fixed and cannot be modified after training (for example, if we need to change the rank of the LoRA blocks, then we have to re-train them from scratch); second, optimizing...

10.18653/v1/2023.eacl-main.239 article EN cc-by 2023-01-01
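
For context, a standard LoRA block looks like the following sketch (illustrative PyTorch, not the paper's code): the dense weight is frozen, only the two low-rank factors are trained, and the rank r is fixed at construction, which is exactly the limitation the paper addresses.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA block: a frozen dense weight plus a trainable rank-r update."""
    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features),
                                   requires_grad=False)          # frozen main weight
        self.lora_A = nn.Parameter(torch.zeros(r, in_features))  # trainable
        self.lora_B = nn.Parameter(torch.zeros(out_features, r)) # trainable
        nn.init.normal_(self.lora_A, std=0.02)  # B stays zero: training starts at the base model
        self.scaling = alpha / r                # r is baked in; changing it means retraining

    def forward(self, x):
        return x @ self.weight.T + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

layer = LoRALinear(128, 64, r=4)
print(layer(torch.randn(2, 128)).shape)   # torch.Size([2, 64])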

State-of-the-art neural machine translation methods employ massive amounts of parameters. Drastically reducing the computational costs of such methods without affecting performance has been up to this point unsuccessful. To this end, we propose FullyQT: an all-inclusive quantization strategy for the Transformer. To the best of our knowledge, we are the first to show that it is possible to avoid any loss in translation quality with a fully quantized Transformer. Indeed, compared to full-precision, our 8-bit models score greater or equal BLEU on most tasks. Comparing...

10.18653/v1/2020.findings-emnlp.1 article EN cc-by 2020-01-01
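
A core building block of quantization-aware training of this kind is simulated ("fake") uniform quantization, sketched below in PyTorch. The exact choice of which tensors to quantize and how to calibrate scales is what the paper's strategy specifies; this snippet is only a generic illustration.

import torch

def fake_quantize(x, num_bits=8):
    """Uniform simulated quantization: map values to an 8-bit grid and back to
    float, as used in quantization-aware training."""
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (x.max() - x.min()).clamp(min=1e-8) / (qmax - qmin)
    zero_point = torch.round(-x.min() / scale)
    q = torch.clamp(torch.round(x / scale) + zero_point, qmin, qmax)
    return (q - zero_point) * scale

w = torch.randn(4, 4)
print((w - fake_quantize(w)).abs().max())   # small quantization error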

The significant memory and computational requirements of large deep neural networks restrict their application on edge devices. Knowledge distillation (KD) is a prominent model compression technique for deep neural networks in which the knowledge of a trained teacher model is transferred to a smaller student model. The success of KD is mainly attributed to its training objective function, which exploits the soft-target information (also known as "dark knowledge") besides the given regular hard labels in the training set. However, it is shown in the literature that the larger the gap...

10.18653/v1/2021.eacl-main.212 article EN cc-by 2021-01-01
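
The training objective referred to here is, in its standard form, a mix of hard-label cross-entropy and a temperature-softened KL term carrying the teacher's soft targets. The sketch below shows that generic objective in PyTorch, not the variant proposed in the paper.

import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Standard KD objective: hard-label cross-entropy plus a temperature-softened
    KL term that carries the teacher's "dark knowledge"."""
    ce = F.cross_entropy(student_logits, labels)
    kl = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                  F.softmax(teacher_logits / T, dim=-1),
                  reduction="batchmean") * (T * T)
    return (1 - alpha) * ce + alpha * kl

student_logits, teacher_logits = torch.randn(4, 10), torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))
print(kd_loss(student_logits, teacher_logits, labels))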

In this work, we examine the ability of NER models to use contextual information when predicting the type of an ambiguous entity. We introduce NRB, a new testbed carefully designed to diagnose Name Regularity Bias in NER models. Our results indicate that all state-of-the-art models we tested show such a bias, with BERT fine-tuned models significantly outperforming feature-based (LSTM-CRF) ones on NRB despite having comparable (sometimes lower) performance on standard benchmarks. To mitigate this bias, we propose a novel model-agnostic training...

10.1162/tacl_a_00386 article EN cc-by Transactions of the Association for Computational Linguistics 2021-01-01

Existing Natural Language Understanding (NLU) models have been shown to incorporate dataset biases, leading to strong performance on in-distribution (ID) test sets but poor performance on out-of-distribution (OOD) ones. We introduce a simple yet effective debiasing framework whereby the shallow representations of the main model are used to derive a bias model, and both models are trained simultaneously. We demonstrate on three well-studied NLU tasks that, despite its simplicity, our method leads to competitive OOD results. It significantly...

10.18653/v1/2021.findings-acl.168 preprint EN cc-by 2021-01-01
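
A common way to train a main model jointly with a bias model is a product-of-experts style combination of their outputs. The sketch below is a generic illustration of that idea; the function name and weighting are assumptions, and the paper's exact objective may differ.

import torch
import torch.nn.functional as F

def self_debiasing_loss(main_logits, bias_logits, labels):
    """Product-of-experts style combination: the main model is trained through
    logits summed with those of a bias model built on shallow representations,
    so it is pushed to capture what the bias model cannot."""
    combined = F.log_softmax(main_logits, dim=-1) + F.log_softmax(bias_logits, dim=-1).detach()
    main_term = F.cross_entropy(combined, labels)     # renormalizes the combined scores
    bias_term = F.cross_entropy(bias_logits, labels)  # the bias head trains on its own
    return main_term + bias_term

main_logits, bias_logits = torch.randn(4, 3), torch.randn(4, 3)
labels = torch.randint(0, 3, (4,))
print(self_debiasing_loss(main_logits, bias_logits, labels))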

MIRACL is a multilingual dataset for ad hoc retrieval across 18 languages that collectively encompass over three billion native speakers around the world. This resource is designed to support monolingual retrieval tasks, where the queries and the corpora are in the same language. In total, we have gathered 726k high-quality relevance judgments for 78k queries over Wikipedia in these languages, where all of the annotations have been performed by native speakers hired by our team. MIRACL covers languages that are typologically close as well as distant, from 10 language families and 13...

10.1162/tacl_a_00595 article EN cc-by Transactions of the Association for Computational Linguistics 2023-01-01

Recently, large language models (LLMs) have shown remarkable capabilities including understanding context, engaging in logical reasoning, and generating responses. However, this is achieved at the expense of stringent computational and memory requirements, hindering their ability to effectively support long input sequences. This survey provides an inclusive review of the recent techniques and methods devised to extend the sequence length in LLMs, thereby enhancing their capacity for long-context understanding. In...

10.24963/ijcai.2024/917 article EN 2024-07-26

With the growth of computing power, neural machine translation (NMT) models also grow accordingly and become better. However, they also become harder to deploy on edge devices due to memory constraints. To cope with this problem, a common practice is to distill knowledge from a large and accurately-trained teacher network (T) into a compact student network (S). Although knowledge distillation (KD) is useful in most cases, our study shows that existing KD techniques might not be suitable enough for deep NMT engines, so we propose a novel...

10.18653/v1/2020.emnlp-main.74 article EN cc-by 2020-01-01

Ahmad Rashid, Vasileios Lioutas, Mehdi Rezagholizadeh. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021.

10.18653/v1/2021.acl-long.86 article EN cc-by 2021-01-01

Ehsan Kamalloo, Xinyu Zhang, Odunayo Ogundepo, Nandan Thakur, David Alfonso-hermelo, Mehdi Rezagholizadeh, Jimmy Lin. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track). 2023.

10.18653/v1/2023.acl-industry.50 article EN cc-by 2023-01-01

Recent advancements in Large Language Models (LLMs) have set themselves apart with their exceptional performance on complex language modelling tasks. However, these models are also known for their significant computational and storage requirements, primarily due to the quadratic computation complexity of softmax attention. To mitigate this issue, linear attention has been designed to reduce the quadratic space-time complexity that is inherent in standard transformers. In this work, we embarked on a comprehensive exploration of three key...

10.48550/arxiv.2502.01578 preprint EN arXiv (Cornell University) 2025-02-03
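
The quadratic-versus-linear trade-off mentioned above can be seen in a few lines: standard softmax attention builds an n-by-n score matrix, while kernelized linear attention reorders the computation around a d-by-d key-value summary. The sketch below is a generic illustration (elu+1 feature map), not the specific variants studied in the paper.

import torch
import torch.nn.functional as F

def softmax_attention(q, k, v):
    # Standard attention: materializes an (n x n) score matrix, O(n^2) in sequence length.
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    return torch.softmax(scores, dim=-1) @ v

def linear_attention(q, k, v, eps=1e-6):
    # Kernelized linear attention: associativity lets us form a (d x d) key-value
    # summary once, so the cost grows linearly with sequence length.
    q, k = F.elu(q) + 1, F.elu(k) + 1
    kv = k.transpose(-2, -1) @ v
    z = q @ k.sum(dim=-2, keepdim=True).transpose(-2, -1) + eps
    return (q @ kv) / z

x = torch.randn(1, 16, 32)   # (batch, seq, dim)
print(softmax_attention(x, x, x).shape, linear_attention(x, x, x).shape)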

As large language models (LLMs) become integral to diverse applications, ensuring their reliability under varying input conditions is crucial. One key issue affecting this reliability is order sensitivity, wherein slight variations in the input arrangement can lead to inconsistent or biased outputs. Although recent advances have reduced this sensitivity, the problem remains unresolved. This paper investigates the extent of order sensitivity in closed-source LLMs by conducting experiments across multiple tasks, including paraphrasing, relevance...

10.48550/arxiv.2502.04134 preprint EN arXiv (Cornell University) 2025-02-06

Deploying large language models (LLMs) in real-world applications is often hindered by strict computational and latency constraints. While dynamic inference offers the flexibility to adjust model behavior based on varying resource budgets, existing methods are frequently limited by hardware inefficiencies or performance degradation. In this paper, we introduce Balcony, a simple yet highly effective framework for depth-based dynamic inference. By freezing the pretrained LLM and inserting additional transformer...

10.48550/arxiv.2503.05005 preprint EN arXiv (Cornell University) 2025-03-06
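
As a rough illustration of depth-based dynamic inference with a frozen backbone, the sketch below runs a stack of frozen layers up to a chosen depth and then applies a small trainable exit block at that depth. All names and sizes here are hypothetical; Balcony's actual architecture and training recipe are described in the paper.

import torch
import torch.nn as nn

class DepthExitModel(nn.Module):
    """Frozen base layers with small trainable exit blocks at selected depths."""
    def __init__(self, d_model=64, n_layers=8, exit_depths=(4, 6, 8)):
        super().__init__()
        self.base = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
             for _ in range(n_layers)])
        for p in self.base.parameters():          # the pretrained backbone stays frozen
            p.requires_grad = False
        self.exits = nn.ModuleDict({
            str(d): nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            for d in exit_depths})

    def forward(self, x, budget_depth=6):
        for depth, layer in enumerate(self.base, start=1):
            x = layer(x)
            if depth == budget_depth:             # stop early and run the matching exit block
                return self.exits[str(depth)](x)
        return x

model = DepthExitModel()
print(model(torch.randn(2, 10, 64), budget_depth=4).shape)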

Knowledge distillation (KD) is a common knowledge transfer algorithm used for model compression across a variety of deep learning based natural language processing (NLP) solutions. In its regular manifestations, KD requires access to the teacher's training data for knowledge transfer to the student network. However, privacy concerns, data regulations and proprietary reasons may prevent access to such data. We present, to the best of our knowledge, the first work on Zero-shot Knowledge Distillation for NLP, where the student learns from its much larger teacher without any task...

10.18653/v1/2021.emnlp-main.526 article EN cc-by Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing 2021-01-01

Intermediate layer knowledge distillation (KD) can improve the standard KD technique (which only targets the output of the teacher and student models), especially over large pre-trained language models. However, intermediate layer distillation suffers from the excessive computational burdens and engineering efforts required for setting up a proper layer mapping. To address these problems, we propose a RAndom Intermediate Layer Knowledge Distillation (RAIL-KD) approach in which intermediate layers from the teacher model are selected randomly to be distilled into the intermediate layers of the student model. This...

10.18653/v1/2022.findings-naacl.103 article EN cc-by Findings of the Association for Computational Linguistics: NAACL 2022 2022-01-01
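
The random layer selection idea can be sketched as follows in illustrative PyTorch: sample as many teacher layers as the student has, keep them in order, and match them one-to-one. The projection and normalization steps and the sampling schedule of the actual method are omitted here.

import random
import torch
import torch.nn.functional as F

def random_intermediate_kd_loss(teacher_hiddens, student_hiddens):
    """Pick a random, order-preserving subset of teacher layers and distill each
    into the corresponding student layer."""
    k = len(student_hiddens)
    picked = sorted(random.sample(range(len(teacher_hiddens)), k))
    return sum(F.mse_loss(s, teacher_hiddens[t])
               for s, t in zip(student_hiddens, picked)) / k

teacher = [torch.randn(2, 8, 32) for _ in range(12)]   # 12 teacher layers
student = [torch.randn(2, 8, 32) for _ in range(4)]    # 4 student layers
print(random_intermediate_kd_loss(teacher, student))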

In Natural Language Processing (NLP), finding data augmentation techniques that can produce high-quality human-interpretable examples has always been challenging. Recently, leveraging kNN such that augmented examples are retrieved from large repositories of unlabelled sentences has made a step toward interpretable augmentation. Inspired by this paradigm, we introduce MiniMax-kNN, a sample efficient data augmentation strategy tailored for Knowledge Distillation (KD). We exploit a semi-supervised approach based on KD to train a model...

10.18653/v1/2021.findings-acl.309 article EN cc-by 2021-01-01

Ali Edalati, Marzieh Tahaei, Ahmad Rashid, Vahid Nia, James Clark, Mehdi Rezagholizadeh. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 2022.

10.18653/v1/2022.acl-short.24 preprint EN cc-by 2022-01-01

MIRACL (Multilingual Information Retrieval Across a Continuum of Languages) is a multilingual dataset we have built for the WSDM 2023 Cup challenge that focuses on ad hoc retrieval across 18 different languages, which collectively encompass over three billion native speakers around the world. These languages have diverse typologies, originate from many different language families, and are associated with varying amounts of available resources -- including what researchers typically characterize as high-resource...

10.48550/arxiv.2210.09984 preprint EN other-oa arXiv (Cornell University) 2022-01-01

Fine-tuning a Pre-trained Language Model (PLM) on a specific downstream task has been a well-known paradigm in Natural Language Processing. However, with the ever-growing size of PLMs, training the entire model on several downstream tasks becomes very expensive and resource-hungry. Recently, different Parameter Efficient Tuning (PET) techniques have been proposed to improve the efficiency of fine-tuning PLMs. One popular category of PET methods is the low-rank adaptation methods, which insert learnable truncated SVD modules into the original model either...

10.48550/arxiv.2212.10650 preprint EN cc-by arXiv (Cornell University) 2022-01-01