- Topic Modeling
- Natural Language Processing Techniques
- Multimodal Machine Learning Applications
- Domain Adaptation and Few-Shot Learning
- Machine Learning and Data Classification
- Advanced Neural Network Applications
- Anomaly Detection Techniques and Applications
- Speech Recognition and Synthesis
- Advanced Image and Video Retrieval Techniques
- Neural Networks and Applications
- Information Retrieval and Search Behavior
- Computational and Text Analysis Methods
- Image Retrieval and Classification Techniques
- Web Data Mining and Analysis
- Multidisciplinary Science and Engineering Research
- Software System Performance and Reliability
- Mechanical and Optical Resonators
- Force Microscopy Techniques and Applications
- Machine Learning and Algorithms
- Infrastructure Maintenance and Monitoring
- Artificial Intelligence in Games
- Cloud Computing and Resource Management
- Cancer-related molecular mechanisms research
- Remote-Sensing Image Classification
- Spam and Phishing Detection
Google (United States)
2018-2023
Université de Montréal
2021
Centre Universitaire de Mila
2021
Lawrence Berkeley National Laboratory
2014
Transformer model architectures have garnered immense interest lately due to their effectiveness across a range of domains like language, vision and reinforcement learning. In the field of natural language processing for example, Transformers have become an indispensable staple in the modern deep learning stack. Recently, a dizzying number of "X-former" models have been proposed - Reformer, Linformer, Performer, Longformer, to name a few - which improve upon the original Transformer architecture, many of which make improvements around...
The dot product self-attention is known to be central and indispensable to state-of-the-art Transformer models. But is it really required? This paper investigates the true importance and contribution of the dot product-based self-attention mechanism on the performance of Transformer models. Via extensive experiments, we find that (1) random alignment matrices surprisingly perform quite competitively and (2) learning attention weights from token-token (query-key) interactions is useful but not that important after all. To this end, we propose \textsc{Synthesizer}, a...
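A minimal sketch of the "random alignment matrix" idea the abstract describes: the attention weights come from a learned (optionally frozen) L x L matrix rather than query-key dot products. This is an illustration under assumed shapes and names, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RandomSynthesizerAttention(nn.Module):
    """Attention whose alignment matrix is a parameter, not computed from Q and K."""

    def __init__(self, max_len: int, d_model: int, trainable: bool = True):
        super().__init__()
        # This matrix plays the role of softmax(QK^T / sqrt(d)); it can be frozen.
        self.attn_logits = nn.Parameter(torch.randn(max_len, max_len),
                                        requires_grad=trainable)
        self.value = nn.Linear(d_model, d_model)

    def forward(self, x):                          # x: (batch, L, d_model)
        L = x.size(1)
        weights = F.softmax(self.attn_logits[:L, :L], dim=-1)
        return weights @ self.value(x)             # (batch, L, d_model)
```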
Transformers do not scale very well to long sequence lengths, largely because of quadratic self-attention complexity. In recent months, a wide spectrum of efficient, fast Transformers have been proposed to tackle this problem, more often than not claiming superior or comparable model quality to vanilla Transformer models. To date, there is no well-established consensus on how to evaluate this class of models. Moreover, inconsistent benchmarking on tasks and datasets makes it difficult to assess relative model quality amongst the many proposals. This paper proposes...
Existing pre-trained models are generally geared towards a particular class of problems. To date, there seems to be still no consensus on what the right architecture and pre-training setup should be. This paper presents a unified framework for pre-training models that are universally effective across datasets and setups. We begin by disentangling architectural archetypes from pre-training objectives -- two concepts that are commonly conflated. Next, we present a generalized and unified perspective for self-supervision in NLP and show how different pre-training objectives can be cast as...
We propose Sparse Sinkhorn Attention, a new efficient and sparse method for learning to attend. Our method is based on differentiable sorting of internal representations. Concretely, we introduce a meta sorting network that learns to generate latent permutations over sequences. Given sorted sequences, we are then able to compute quasi-global attention with only local windows, improving the memory efficiency of the attention module. To this end, we propose algorithmic innovations such as Causal Sinkhorn Balancing and SortCut, a dynamic sequence truncation...
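An illustrative sketch of the Sinkhorn-normalization step behind the "latent permutations" mentioned above: iteratively normalizing the rows and columns of a score matrix in log space yields an approximately doubly-stochastic (soft permutation) matrix over sequence blocks. The function name and iteration count are assumptions.

```python
import torch

def sinkhorn_sort(scores: torch.Tensor, n_iters: int = 8) -> torch.Tensor:
    """scores: (n_blocks, n_blocks) raw sorting logits from a meta network."""
    log_alpha = scores
    for _ in range(n_iters):
        # Alternate row and column normalization in log space for stability.
        log_alpha = log_alpha - torch.logsumexp(log_alpha, dim=-1, keepdim=True)
        log_alpha = log_alpha - torch.logsumexp(log_alpha, dim=-2, keepdim=True)
    return log_alpha.exp()   # rows and columns each sum to approximately 1
```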
Despite the recent success of multi-task learning and transfer learning for natural language processing (NLP), few works have systematically studied the effect of scaling up the number of tasks during pre-training. Towards this goal, this paper introduces ExMix (Extreme Mixture): a massive collection of 107 supervised NLP tasks across diverse domains and task-families. Using ExMix, we study multi-task pre-training at the largest scale to date, and analyze co-training transfer amongst common families of tasks. Through this analysis, we show that manually curating an...
In this paper, we demonstrate that information retrieval can be accomplished with a single Transformer, in which all information about the corpus is encoded in the parameters of the model. To this end, we introduce the Differentiable Search Index (DSI), a new paradigm that learns a text-to-text model which maps string queries directly to relevant docids; in other words, a DSI model answers queries using only its parameters, dramatically simplifying the whole retrieval process. We study variations in how documents and their identifiers are represented, training procedures,...
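A rough sketch of the training recipe the abstract describes: one seq2seq model is trained both to map document text to a string docid (indexing) and to map queries to docids (retrieval). The model checkpoint, example strings, and helper function below are placeholders, not the paper's setup.

```python
from transformers import T5ForConditionalGeneration, T5TokenizerFast

tok = T5TokenizerFast.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

def seq2seq_loss(source: str, target: str):
    """Standard teacher-forced cross-entropy loss for one (source, target) pair."""
    inputs = tok(source, return_tensors="pt", truncation=True)
    labels = tok(target, return_tensors="pt").input_ids
    return model(**inputs, labels=labels).loss

# Indexing example: document text -> its string docid.
loss_index = seq2seq_loss("contents of an example document", "42")
# Retrieval example: a query -> the docid of a relevant document.
loss_retrieve = seq2seq_loss("an example query about that document", "42")
(loss_index + loss_retrieve).backward()
```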
State-of-the-art models in natural language processing rely on separate rigid subword tokenization algorithms, which limit their generalization ability and adaptation to new settings. In this paper, we propose a new model inductive bias that learns a subword tokenization end-to-end as part of the model. To this end, we introduce a soft gradient-based subword tokenization module (GBST) that automatically learns latent subword representations from characters in a data-driven fashion. Concretely, GBST enumerates candidate subword blocks and learns to score them in a position-wise fashion using a block...
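A condensed sketch of the block-scoring idea described above: candidate blocks of several sizes are formed by pooling character embeddings, scored position-wise, and combined by a softmax over block sizes. Downsampling and other details of the real module are omitted, and all names here are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyGBST(nn.Module):
    def __init__(self, d_model: int, block_sizes=(1, 2, 3, 4)):
        super().__init__()
        self.block_sizes = block_sizes
        self.scorer = nn.Linear(d_model, 1)

    def forward(self, char_emb):                   # (batch, L, d_model)
        candidates, scores = [], []
        for b in self.block_sizes:
            # Mean-pool blocks of size b, then broadcast back to length L.
            pooled = F.avg_pool1d(char_emb.transpose(1, 2), b, stride=b,
                                  ceil_mode=True)
            pooled = pooled.repeat_interleave(b, dim=-1)[..., :char_emb.size(1)]
            pooled = pooled.transpose(1, 2)        # (batch, L, d_model)
            candidates.append(pooled)
            scores.append(self.scorer(pooled))     # (batch, L, 1)
        weights = torch.softmax(torch.stack(scores, dim=-1), dim=-1)  # over sizes
        blocks = torch.stack(candidates, dim=-1)   # (batch, L, d_model, n_sizes)
        return (blocks * weights).sum(dim=-1)      # soft mixture of block sizes
```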
When experiencing an information need, users want to engage with a domain expert, but often turn to an information retrieval system, such as a search engine, instead. Classical information retrieval systems do not answer information needs directly, but instead provide references to (hopefully authoritative) answers. Successful question answering systems offer a limited corpus created on-demand by human experts, which is neither timely nor scalable. Pre-trained language models, by contrast, are capable of directly generating prose that may be responsive to an information need, but at present...
Yi Tay, Mostafa Dehghani, Jai Prakash Gupta, Vamsi Aribandi, Dara Bahri, Zhen Qin, Donald Metzler. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021.
Recent advances in Transformer-based large language models (LLMs) have led to significant performance improvements across many tasks. These gains come with a drastic increase in the models' size, potentially leading to slow and costly use at inference time. In practice, however, the series of generations made by LLMs is composed of varying levels of difficulty. While certain predictions truly benefit from the models' full capacity, other continuations are more trivial and can be solved with reduced compute. In this work, we...
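A toy sketch of the early-exit intuition in the abstract (not the paper's actual method): after each decoder layer, a lightweight readout estimates confidence in the current token prediction, and computation stops at the first layer whose confidence clears a threshold. All names, shapes, and the confidence measure are assumptions.

```python
import torch
import torch.nn as nn

class EarlyExitStack(nn.Module):
    def __init__(self, layers: nn.ModuleList, lm_head: nn.Linear, threshold: float = 0.9):
        super().__init__()
        self.layers, self.lm_head, self.threshold = layers, lm_head, threshold

    @torch.no_grad()
    def forward(self, hidden):                     # hidden: (1, d_model)
        for depth, layer in enumerate(self.layers):
            hidden = layer(hidden)
            probs = torch.softmax(self.lm_head(hidden), dim=-1)
            if probs.max().item() >= self.threshold:       # easy token: exit early
                return probs.argmax(dim=-1), depth + 1
        return probs.argmax(dim=-1), len(self.layers)      # hard token: full depth
```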
The allure of superhuman-level capabilities has led to considerable interest in language models like GPT-3 and T5, wherein the research has, by and large, revolved around new model architectures, training tasks, and loss objectives, along with substantial engineering efforts to scale up model capacity and dataset size. Comparatively little work has been done to improve the generalization of these models through better optimization. In this work, we show that Sharpness-Aware Minimization (SAM), a recently proposed optimization...
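For context, a minimal sketch of the generic SAM update referenced above: perturb the weights a distance rho along the current gradient direction (toward higher loss), compute the gradient at the perturbed point, then apply the base optimizer step from the original weights. This is a generic illustration, not the paper's training setup.

```python
import torch

def sam_step(model, loss_fn, base_optimizer, rho: float = 0.05):
    loss_fn(model).backward()
    params = [p for p in model.parameters() if p.grad is not None]
    grads = [p.grad.clone() for p in params]
    grad_norm = torch.norm(torch.stack([g.norm() for g in grads])).item() + 1e-12

    # Step 1: perturb weights toward higher loss (the "sharpness" direction).
    for p, g in zip(params, grads):
        p.data.add_(g, alpha=rho / grad_norm)

    # Step 2: gradient at the perturbed weights.
    model.zero_grad()
    loss_fn(model).backward()

    # Step 3: restore original weights, then step with the new gradient.
    for p, g in zip(params, grads):
        p.data.sub_(g, alpha=rho / grad_norm)
    base_optimizer.step()
    model.zero_grad()
```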
Self-supervised contrastive representation learning has proved incredibly successful in the vision and natural language domains, enabling state-of-the-art performance with orders of magnitude less labeled data. However, such methods are domain-specific and little has been done to leverage this technique on real-world tabular datasets. We propose SCARF, a simple, widely-applicable technique for contrastive learning, where views are formed by corrupting a random subset of features. When applied to pre-train deep neural networks on the 69...
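A sketch of the view-construction step described above: a corrupted view of each row is made by replacing a random subset of its features with values drawn from those features' empirical marginal distributions; the original and corrupted views then serve as a positive pair for a standard contrastive loss. The corruption rate and names below are illustrative assumptions.

```python
import numpy as np

def scarf_corrupt(batch: np.ndarray, full_data: np.ndarray, rate: float = 0.6):
    """batch: (B, F) rows to corrupt; full_data: (N, F) table to sample from."""
    n_rows, n_feats = batch.shape
    corrupted = batch.copy()
    mask = np.random.rand(n_rows, n_feats) < rate        # features to replace
    # For each (row, feature), draw a replacement from that feature's column.
    random_rows = np.random.randint(0, full_data.shape[0], size=(n_rows, n_feats))
    replacements = full_data[random_rows, np.arange(n_feats)]
    corrupted[mask] = replacements[mask]
    return corrupted                                       # second "view" of batch
```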
Language models have recently been shown capable of performing regression tasks wherein numeric predictions are represented as decoded strings. In this work, we provide theoretical grounds for this capability and furthermore investigate the utility of causal auto-regressive sequence models when they are applied to any feature representation. We find that, despite being trained in the usual way - next-token prediction via cross-entropy loss - decoding-based regression is as performant as traditional approaches on tabular regression tasks, while...
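A toy illustration of "numeric predictions represented as decoded strings": a target value is serialized as a sign/digit/exponent token sequence, the decoder is trained with ordinary cross-entropy over those tokens, and predictions are recovered by parsing the decoded string. The specific tokenization scheme below is an assumption, not the paper's.

```python
def encode_number(x: float, mantissa_digits: int = 4) -> list[str]:
    """Serialize a float into sign, mantissa-digit, and exponent tokens."""
    sign = "+" if x >= 0 else "-"
    mantissa, exponent = f"{abs(x):e}".split("e")
    digits = mantissa.replace(".", "")[:mantissa_digits]
    return [sign, *digits, "E", str(int(exponent))]

def decode_number(tokens: list[str]) -> float:
    """Parse the token sequence back into a float prediction."""
    sign = 1.0 if tokens[0] == "+" else -1.0
    e_idx = tokens.index("E")
    digits = "".join(tokens[1:e_idx])
    mantissa = float(digits[0] + "." + digits[1:])
    return sign * mantissa * 10 ** int(tokens[e_idx + 1])

assert abs(decode_number(encode_number(-12.34)) + 12.34) < 1e-2
```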
Yikang Shen, Yi Tay, Che Zheng, Dara Bahri, Donald Metzler, Aaron Courville. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021.
In the era of pre-trained language models, Transformers are the de facto choice of model architecture. While recent research has shown promise in entirely convolutional, or CNN-based, architectures, they have not been explored using the pre-train-fine-tune paradigm. In this context, are convolutional models competitive to Transformers when pre-trained? This paper investigates this question and presents several interesting findings. Across an extensive set of experiments on 8 datasets/tasks, we find that CNN-based pre-trained models outperform their...
Recent advances in neural text generation modeling have resulted in a number of societal concerns related to how such approaches might be used in malicious ways. It is therefore desirable to develop a deeper understanding of the fundamental properties of such models. The study of artifacts that emerge in machine generated text as a result of modeling choices is a nascent research area. To this end, the extent and degree to which these artifacts surface in generated text is still unclear. In the spirit of better understanding generative models and their artifacts, we propose the new task of distinguishing...
Work in information retrieval has traditionally focused on ranking and relevance: given a query, return some number of results ordered by relevance to the user. However, the problem of determining how many results to return, i.e. how to optimally truncate the ranked result list, has received less attention despite being of critical importance in a range of applications. Such truncation is a balancing act between the overall relevance, or usefulness, of the results and the user cost of processing more results. In this work, we propose Choppy, an...
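A small sketch of the truncation problem described above: given per-result relevance probabilities, pick the cutoff that maximizes an expected utility trading relevance against the cost of reading more results. The linear cost model here is an illustrative assumption, not Choppy itself, which learns the cutoff from the score sequence.

```python
def best_cutoff(relevance_probs, cost_per_result: float = 0.3) -> int:
    """Return the list length that maximizes cumulative (relevance - cost)."""
    best_k, best_utility, utility = 0, 0.0, 0.0
    for k, p in enumerate(relevance_probs, start=1):
        utility += p - cost_per_result        # gain from result k minus user cost
        if utility > best_utility:
            best_k, best_utility = k, utility
    return best_k

print(best_cutoff([0.9, 0.8, 0.4, 0.2, 0.1]))   # -> 3
```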