Ben Athiwaratkun

ORCID: 0000-0002-2009-496X

Research Areas
  • Topic Modeling
  • Natural Language Processing Techniques
  • Multimodal Machine Learning Applications
  • Domain Adaptation and Few-Shot Learning
  • Machine Learning and Data Classification
  • Advanced Neural Network Applications
  • Software Engineering Research
  • Advanced Text Analysis Techniques
  • Auction Theory and Applications
  • Multi-Agent Systems and Negotiation
  • Mathematical Dynamics and Fractals
  • Parallel Computing and Optimization Techniques
  • Text Readability and Simplification
  • Sentiment Analysis and Opinion Mining
  • Organizational Management and Leadership
  • Handwritten Text Recognition Techniques
  • Scientific Computing and Data Management
  • Text and Document Classification Technologies
  • Advanced Topology and Set Theory
  • Gaussian Processes and Bayesian Inference
  • Speech and dialogue systems
  • Limits and Structures in Graph Theory
  • Advanced Bandit Algorithms Research
  • Video Analysis and Summarization
  • Functional Equations Stability Results

Affiliations

Amazon (United States)
2021

Amazon (Germany)
2020-2021

Allen Institute for Artificial Intelligence
2021

Cornell University
2017-2019

California Institute of Technology
2018

Publications

In recent years great success has been achieved in sentiment classification for English, thanks in part to the availability of copious annotated resources. Unfortunately, most languages do not enjoy such an abundance of labeled data. To tackle the low-resource problem for languages without adequate annotated data, we propose the Adversarial Deep Averaging Network (ADAN) to transfer the knowledge learned from labeled data on a resource-rich source language to a low-resource language where only unlabeled data exist. ADAN has two discriminative branches: a sentiment classifier and...

10.1162/tacl_a_00039 article EN cc-by Transactions of the Association for Computational Linguistics 2018-12-01
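
As a rough illustration of the two-branch setup described above, the following is a minimal sketch (hypothetical layer sizes and vocabulary, not the authors' released code) of a deep averaging encoder feeding a sentiment classifier and a language discriminator through a gradient-reversal layer, so the shared features become language-invariant.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse (and scale) gradients flowing back into the shared encoder.
        return -ctx.lambd * grad_output, None

class ADAN(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=300, hidden=300, n_classes=2):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, emb_dim, mode="mean")  # deep averaging
        self.encoder = nn.Sequential(nn.Linear(emb_dim, hidden), nn.ReLU())
        self.sentiment = nn.Linear(hidden, n_classes)   # trained on source-language labels
        self.language = nn.Linear(hidden, 2)            # source vs. target discriminator

    def forward(self, token_ids, lambd=1.0):
        h = self.encoder(self.embed(token_ids))
        return self.sentiment(h), self.language(GradReverse.apply(h, lambd))

model = ADAN()
sent_logits, lang_logits = model(torch.randint(0, 10000, (4, 20)))
print(sent_logits.shape, lang_logits.shape)  # torch.Size([4, 2]) torch.Size([4, 2])
```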

Malicious software, or malware, continues to be a problem for computer users, corporations, and governments. Previous research [1] has explored training file-based malware classifiers using a two-stage approach. In the first stage, a language model is used to learn a feature representation, which is then input to a second-stage classifier. In Pascanu et al. [1], the language model is either a standard recurrent neural network (RNN) or an echo state network (ESN). In this work, we propose several new classification architectures that include long...

10.1109/icassp.2017.7952603 article EN 2017-03-01
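
A hedged sketch of the two-stage idea on synthetic data (illustrative sizes; the EventLSTM name and pooling choices are assumptions): stage one trains a recurrent language model over event sequences, stage two trains a separate classifier on its pooled hidden states.

```python
import torch
import torch.nn as nn

class EventLSTM(nn.Module):
    def __init__(self, n_events=128, emb=64, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(n_events, emb)
        self.lstm = nn.LSTM(emb, hidden, batch_first=True)
        self.next_event = nn.Linear(hidden, n_events)   # stage-1 language-model head

    def forward(self, seq):
        h, _ = self.lstm(self.embed(seq))
        return self.next_event(h), h

lm = EventLSTM()
seq = torch.randint(0, 128, (8, 50))            # 8 files, 50 events each (synthetic)

# Stage 1: next-event prediction (language modeling) loss.
logits, hidden = lm(seq[:, :-1])
lm_loss = nn.functional.cross_entropy(logits.reshape(-1, 128), seq[:, 1:].reshape(-1))

# Stage 2: pool hidden states into a fixed feature vector, then train a
# separate malware/benign classifier on those features.
features = torch.cat([hidden.mean(dim=1), hidden.max(dim=1).values], dim=-1).detach()
classifier = nn.Linear(features.shape[-1], 2)
cls_logits = classifier(features)
print(lm_loss.item(), cls_logits.shape)
```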

We introduce Probabilistic FastText, a new model for word embeddings that can capture multiple word senses, sub-word structure, and uncertainty information. In particular, we represent each word with a Gaussian mixture density, where the mean of a mixture component is given by the sum of n-gram vectors. This representation allows the model to share statistical strength across sub-word structures (e.g. Latin roots), producing accurate representations of rare, misspelt, or even unseen words. Moreover, each component can capture a different word sense. Probabilistic FastText outperforms both FastText, which has no...

10.18653/v1/p18-1001 article EN cc-by Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2018-01-01
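
A minimal sketch of the component-mean construction (hypothetical hashing scheme, bucket count, and dimensions): the mean of a word's mixture component is built from its character n-gram vectors, so misspelt or unseen words still land near their correct neighbors.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, BUCKETS = 50, 2**16
ngram_table = rng.normal(scale=0.1, size=(BUCKETS, DIM))   # shared n-gram vectors

def char_ngrams(word, n_min=3, n_max=5):
    w = f"<{word}>"
    return [w[i:i + n] for n in range(n_min, n_max + 1) for i in range(len(w) - n + 1)]

def component_mean(word):
    # Hash each n-gram into a bucket and combine the corresponding vectors
    # (averaged here for scale; the model uses a sum of n-gram vectors).
    idx = [hash(g) % BUCKETS for g in char_ngrams(word)]
    return ngram_table[idx].sum(axis=0) / max(len(idx), 1)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Misspelt variants share most n-grams, so their component means stay close.
print(cosine(component_mean("representation"), component_mean("reprsentation")))
print(cosine(component_mean("representation"), component_mean("zebra")))
```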

Presently the most successful approaches to semi-supervised learning are based on consistency regularization, whereby a model is trained to be robust to small perturbations of its inputs and parameters. To understand consistency regularization, we conceptually explore how loss geometry interacts with training procedures. The consistency loss dramatically improves generalization performance over supervised-only training; however, we show that SGD struggles to converge on the consistency loss and continues to make large steps that lead to changes in predictions on the test data. Motivated by...

10.48550/arxiv.1806.05594 preprint EN other-oa arXiv (Cornell University) 2018-01-01
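
A toy illustration of the two ingredients discussed above (synthetic data and a deliberately small model, not the paper's experimental setup): a consistency loss that penalizes prediction changes under input perturbation, plus a running average of the weights along the SGD trajectory.

```python
import copy
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
avg_model = copy.deepcopy(model)            # running weight average
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x_unlabeled = torch.randn(64, 10)
for step in range(1, 101):
    # Consistency regularization: predictions should match under small input noise.
    noisy = x_unlabeled + 0.1 * torch.randn_like(x_unlabeled)
    p_clean = torch.softmax(model(x_unlabeled), dim=-1)
    p_noisy = torch.softmax(model(noisy), dim=-1)
    loss = ((p_clean - p_noisy) ** 2).sum(dim=-1).mean()

    opt.zero_grad()
    loss.backward()
    opt.step()

    # Simple running average of weights over the SGD iterates.
    with torch.no_grad():
        for p_avg, p in zip(avg_model.parameters(), model.parameters()):
            p_avg.mul_((step - 1) / step).add_(p / step)

print("final consistency loss:", loss.item())
```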

Word embeddings provide point representations of words containing useful semantic information. We introduce multimodal word distributions formed from Gaussian mixtures, for multiple word meanings, entailment, and rich uncertainty information. To learn these distributions, we propose an energy-based max-margin objective. We show that the resulting approach captures uniquely expressive semantic information and outperforms alternatives, such as word2vec skip-grams and Gaussian embeddings, on benchmark datasets for word similarity and entailment.

10.18653/v1/p17-1151 article EN cc-by Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2017-01-01
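
A small sketch of an energy of this kind (spherical covariances and toy parameters; not the released training code): the expected likelihood kernel between two Gaussian mixtures, used inside a max-margin ranking loss that scores similar word pairs above negative pairs.

```python
import numpy as np

def log_elk_gaussian(mu1, var1, mu2, var2):
    # log inner product of two spherical Gaussians: log N(mu1 - mu2; 0, (var1 + var2) I)
    d = mu1.shape[0]
    v = var1 + var2
    diff = mu1 - mu2
    return -0.5 * (d * np.log(2 * np.pi * v) + diff @ diff / v)

def energy(mix1, mix2):
    # mix = list of (weight, mean, variance); log-sum-exp over component pairs.
    terms = [np.log(w1) + np.log(w2) + log_elk_gaussian(m1, v1, m2, v2)
             for w1, m1, v1 in mix1 for w2, m2, v2 in mix2]
    return np.logaddexp.reduce(terms)

def margin_loss(e_pos, e_neg, margin=1.0):
    # Push the energy of a similar pair above that of a negative pair.
    return max(0.0, margin - e_pos + e_neg)

rng = np.random.default_rng(0)
mk = lambda: [(0.5, rng.normal(size=8), 1.0), (0.5, rng.normal(size=8), 1.0)]
rock, stone, car = mk(), mk(), mk()
print(margin_loss(energy(rock, stone), energy(rock, car)))
```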

Convolutional Neural Networks (CNNs) are powerful models that achieve impressive results for image classification. In addition, pre-trained CNNs are also useful for other computer vision tasks as generic feature extractors. This paper aims to gain insight into the feature aspect of CNNs and demonstrate other uses of CNN features. Our results show that CNN feature maps can be used with Random Forests and SVMs to yield classification results that outperform the original CNN. A CNN that is less than optimal (e.g. not fully trained or overfitting) can still extract features for a Random Forest/SVM...

10.48550/arxiv.1507.02313 preprint EN other-oa arXiv (Cornell University) 2015-01-01
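
A hedged sketch of this pipeline (the choice of ResNet-18, which downloads pretrained weights, and the synthetic images and labels are stand-ins): a pre-trained CNN is used purely as a feature extractor, and a Random Forest and an SVM are trained on the extracted features.

```python
import torch
import torchvision.models as models
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

# Pre-trained CNN used purely as a generic feature extractor.
cnn = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
cnn.fc = torch.nn.Identity()          # drop the classification head
cnn.eval()

# Stand-in data: 32 images, 2 classes (replace with a real dataset).
images = torch.randn(32, 3, 224, 224)
labels = torch.randint(0, 2, (32,)).numpy()

with torch.no_grad():
    features = cnn(images).numpy()    # (32, 512) feature vectors

# Classical classifiers trained on CNN features.
rf = RandomForestClassifier(n_estimators=100).fit(features, labels)
svm = SVC(kernel="linear").fit(features, labels)
print(rf.score(features, labels), svm.score(features, labels))
```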

In recent years great success has been achieved in sentiment classification for English, thanks in part to the availability of copious annotated resources. Unfortunately, most languages do not enjoy such an abundance of labeled data. To tackle the low-resource problem for languages without adequate annotated data, we propose the Adversarial Deep Averaging Network (ADAN) to transfer the knowledge learned from labeled data on a resource-rich source language to a low-resource language where only unlabeled data exist. ADAN has two discriminative branches: a sentiment classifier and an adversarial...

10.48550/arxiv.1606.01614 preprint EN other-oa arXiv (Cornell University) 2016-01-01

We propose a generative framework for joint sequence labeling and sentence-level classification. Our model performs multiple sequence labeling tasks at once using a single, shared natural language output space. Unlike prior discriminative methods, our model naturally incorporates label semantics and shares knowledge across tasks. Our framework is general purpose, performing well in few-shot, low-resource, and high-resource settings. We demonstrate these advantages on popular named entity recognition, slot labeling, and intent classification benchmarks....

10.18653/v1/2020.emnlp-main.27 article EN cc-by Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) 2020-01-01
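
An illustrative sketch of a shared natural-language output space (the bracket format and helper names are invented, not the paper's templates): both sequence labeling and sentence classification targets are rendered as text that a single seq2seq model could generate.

```python
def ner_target(tokens, spans):
    # spans: list of (start, end, label); emit the sentence with inline labels.
    out, i = [], 0
    for start, end, label in sorted(spans):
        out += tokens[i:start] + ["[", *tokens[start:end], "|", label, "]"]
        i = end
    return " ".join(out + tokens[i:])

def intent_target(intent):
    # Sentence-level classification expressed in the same text output space.
    return f"the intent is {intent}"

tokens = "book a flight to boston tomorrow".split()
print(ner_target(tokens, [(4, 5, "city"), (5, 6, "date")]))
# book a flight to [ boston | city ] [ tomorrow | date ]
print(intent_target("book_flight"))
# the intent is book_flight
```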

Large language models are increasingly becoming a cornerstone technology in artificial intelligence, the sciences, and society as a whole, yet optimal strategies for dataset composition and filtering remain largely elusive. Many of the top-performing models lack transparency in their dataset curation and model development processes, posing an obstacle to the development of fully open models. In this paper, we identify three core data-related challenges that must be addressed to advance open-source language models. These include (1) transparency in model development, including data...

10.48550/arxiv.2411.12372 preprint EN arXiv (Cornell University) 2024-11-19

Recent advances in large language models (LLMs) demonstrate substantial capabilities in natural language understanding and generation tasks. With the growing number of LLMs, how to harness the collective expertise of multiple LLMs is an exciting open direction. Toward this goal, we propose a new approach that leverages the collective strengths of multiple LLMs through a Mixture-of-Agents (MoA) methodology. In our approach, we construct a layered MoA architecture wherein each layer comprises multiple LLM agents. Each agent takes all the outputs from agents...

10.48550/arxiv.2406.04692 preprint EN arXiv (Cornell University) 2024-06-07
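
A minimal sketch of a layered Mixture-of-Agents loop (the query_llm helper and model names are placeholders, not a real API): each layer's agents see the prompt plus all responses from the previous layer, and a final aggregator synthesizes the answer.

```python
def query_llm(model: str, prompt: str) -> str:
    # Placeholder for a real chat-completion call to `model`.
    return f"[{model}] draft answer to: {prompt[:40]}..."

def mixture_of_agents(prompt, layers, aggregator):
    responses = []
    for layer in layers:                       # e.g. [["model-a", "model-b"], ...]
        context = prompt
        if responses:
            refs = "\n".join(f"- {r}" for r in responses)
            context = f"{prompt}\n\nPrevious agent responses:\n{refs}\n\nSynthesize an improved answer."
        responses = [query_llm(m, context) for m in layer]
    # Final aggregation step over the last layer's outputs.
    refs = "\n".join(f"- {r}" for r in responses)
    return query_llm(aggregator, f"{prompt}\n\nCandidate answers:\n{refs}\n\nProduce the best final answer.")

print(mixture_of_agents("Explain KV caching in LLM inference.",
                        layers=[["model-a", "model-b"], ["model-c", "model-d"]],
                        aggregator="model-e"))
```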

We present new benchmarks for the evaluation of code generation models: MBXP, Multilingual HumanEval, and MathQA-X. These datasets cover over 10 programming languages and are generated using a scalable conversion framework that transpiles prompts and test cases from the original Python datasets into corresponding data in the target language. Using these benchmarks, we are able to assess the performance of code generation models in a multi-lingual fashion, and we discover the generalization ability of language models on out-of-domain languages, the advantages of multi-lingual over mono-lingual,...

10.48550/arxiv.2210.14868 preprint EN cc-by arXiv (Cornell University) 2022-01-01

By representing words with probability densities rather than point vectors, probabilistic word embeddings can capture rich and interpretable semantic information and uncertainty. The uncertainty information can be particularly meaningful in capturing entailment relationships -- whereby general words such as "entity" correspond to broad distributions that encompass more specific words such as "animal" or "instrument". We introduce density order embeddings, which learn hierarchical representations through encapsulation of probability densities....

10.48550/arxiv.1804.09843 preprint EN other-oa arXiv (Cornell University) 2018-01-01
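
A hedged sketch of the encapsulation idea (spherical Gaussians, toy numbers, and an assumed KL-based penalty standing in for the paper's divergence measures): a specific concept should incur only a small penalty against a general concept whose broad distribution encompasses it.

```python
import numpy as np

def kl_spherical(mu_f, var_f, mu_g, var_g):
    # KL( N(mu_f, var_f I) || N(mu_g, var_g I) ) for d-dimensional spherical Gaussians.
    d = mu_f.shape[0]
    diff = mu_g - mu_f
    return 0.5 * (d * var_f / var_g + diff @ diff / var_g - d + d * np.log(var_g / var_f))

def order_penalty(specific, general, gamma=2.0):
    # Violation is the amount by which the divergence exceeds a threshold.
    mu_s, var_s = specific
    mu_g, var_g = general
    return max(0.0, kl_spherical(mu_s, var_s, mu_g, var_g) - gamma)

dim = 8
entity = (np.zeros(dim), 4.0)            # broad distribution
animal = (np.full(dim, 0.3), 1.0)        # narrower, near the center of "entity"
zebra  = (np.full(dim, 3.0), 0.5)        # narrow and far from "entity"

print(order_penalty(animal, entity))     # small violation: "entity" encompasses "animal"
print(order_penalty(zebra, entity))      # much larger violation
```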

ML-powered code generation aims to assist developers to write code in a more productive manner by intelligently generating code blocks based on natural language prompts. Recently, large pretrained deep learning models have pushed the boundary of code generation and achieved impressive performance. However, the huge number of model parameters poses a significant challenge to their adoption in a typical software development environment, where a developer might use a standard laptop or mid-size server to develop code. Such a large model incurs costs in terms...

10.1145/3611643.3616302 article EN 2023-11-30

We introduce Probabilistic FastText, a new model for word embeddings that can capture multiple word senses, sub-word structure, and uncertainty information. In particular, we represent each word with a Gaussian mixture density, where the mean of a mixture component is given by the sum of n-gram vectors. This representation allows the model to share statistical strength across sub-word structures (e.g. Latin roots), producing accurate representations of rare, misspelt, or even unseen words. Moreover, each component can capture a different word sense. Probabilistic FastText outperforms both FastText, which...

10.48550/arxiv.1806.02901 preprint EN other-oa arXiv (Cornell University) 2018-01-01

Word embeddings provide point representations of words containing useful semantic information. We introduce multimodal word distributions formed from Gaussian mixtures, for multiple word meanings, entailment, and rich uncertainty information. To learn these distributions, we propose an energy-based max-margin objective. We show that the resulting approach captures uniquely expressive semantic information and outperforms alternatives, such as word2vec skip-grams and Gaussian embeddings, on benchmark datasets for word similarity and entailment.

10.48550/arxiv.1704.08424 preprint EN other-oa arXiv (Cornell University) 2017-01-01

Activation sparsity can enable practical inference speedups in large language models (LLMs) by reducing the compute and memory movement required for matrix multiplications during the forward pass. However, existing methods face limitations that inhibit widespread adoption. Some approaches are tailored towards older models with ReLU-based sparsity, while others require extensive continued pre-training on up to hundreds of billions of tokens. This paper describes TEAL, a simple training-free method that applies...

10.48550/arxiv.2408.14690 preprint EN arXiv (Cornell University) 2024-08-26
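
A minimal sketch of training-free, magnitude-based activation sparsity in this spirit (toy tensors and a per-vector threshold rule, rather than calibrated per-layer thresholds): low-magnitude activation entries are zeroed before the matrix multiplication, so a sparse-aware kernel could skip the corresponding weight reads.

```python
import torch

def sparsify_activations(x: torch.Tensor, sparsity: float = 0.5) -> torch.Tensor:
    # Zero the fraction `sparsity` of entries with the smallest magnitude
    # in each hidden vector.
    k = int(x.shape[-1] * sparsity)
    if k == 0:
        return x
    threshold = x.abs().kthvalue(k, dim=-1, keepdim=True).values
    return torch.where(x.abs() <= threshold, torch.zeros_like(x), x)

hidden = torch.randn(4, 4096)            # activations entering a linear layer
weight = torch.randn(4096, 11008)        # e.g. an MLP up-projection

sparse_hidden = sparsify_activations(hidden, sparsity=0.5)
out_dense = hidden @ weight
out_sparse = sparse_hidden @ weight      # a sparse-aware kernel would skip zeroed rows of `weight`
print((sparse_hidden == 0).float().mean().item())                          # ~0.5 sparsity
print(torch.nn.functional.cosine_similarity(out_dense, out_sparse).mean().item())
```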

We propose a new framework, Translation between Augmented Natural Languages (TANL), to solve many structured prediction language tasks, including joint entity and relation extraction, nested named entity recognition, relation classification, semantic role labeling, event extraction, coreference resolution, and dialogue state tracking. Instead of tackling the problem by training task-specific discriminative classifiers, we frame it as a translation task between augmented natural languages, from which the task-relevant information can be...

10.48550/arxiv.2101.05779 preprint EN other-oa arXiv (Cornell University) 2021-01-01
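
An illustrative sketch of the augmented-natural-language idea (an invented example and a simplified bracket format, not the paper's exact templates): entity types and relations are written inline in the target sentence, and the structure can be parsed back out of the generated text.

```python
import re

def augment(tokens, entities, relations):
    # entities: {token: type}; relations: {token: (relation, head_token)}
    out = []
    for tok in tokens:
        if tok in entities:
            extra = f" | {entities[tok]}"
            if tok in relations:
                rel, head = relations[tok]
                extra += f" | {rel} = {head}"
            out.append(f"[ {tok}{extra} ]")
        else:
            out.append(tok)
    return " ".join(out)

def parse(augmented):
    # Recover (span, type, relation, head) tuples from the bracketed segments.
    return re.findall(r"\[ ([^|\]]+?) \| ([^|\]]+?)(?: \| ([^=\]]+?) = ([^\]]+?))? \]", augmented)

target = augment("Tolkien wrote the novel".split(),
                 entities={"Tolkien": "person", "novel": "book"},
                 relations={"novel": ("author", "Tolkien")})
print(target)   # [ Tolkien | person ] wrote the [ novel | book | author = Tolkien ]
print(parse(target))
```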

For infinite-measure-preserving rank-one transformations, we give a condition guaranteeing that all finite Cartesian products of the transformation with its inverse are ergodic. We show infinite Chacón satisfies this condition.

10.4064/sm170330-9-9 article EN Studia Mathematica 2018-01-01

Generative models, widely utilized in various applications, can often struggle with prompts corresponding to partial tokens. This struggle stems from tokenization, where partial tokens fall out of distribution during inference, leading to incorrect or nonsensical outputs. This paper examines a technique to alleviate the tokenization artifact on text completion in generative models while maintaining performance even in regular non-subword cases. The method, termed token alignment, involves backtracking to the last complete token and ensuring the model's...

10.48550/arxiv.2403.08688 preprint EN arXiv (Cornell University) 2024-03-13
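
A simplified sketch of the backtracking step (a toy whitespace tokenizer and invented helper names stand in for a real subword tokenizer; the full method additionally constrains decoding so the regenerated text reproduces the backtracked suffix).

```python
def align_prompt(text: str, tokenize, detokenize, backtrack: int = 1):
    """Trim the last `backtrack` tokens off the prompt and return the trimmed
    prompt plus the suffix that generation must reproduce."""
    token_ids = tokenize(text)
    keep, dropped = token_ids[:-backtrack], token_ids[-backtrack:]
    prefix_constraint = detokenize(dropped)     # generated text must start with this
    return detokenize(keep), prefix_constraint

# Toy whitespace "tokenizer" standing in for a real subword tokenizer.
vocab = {}
def tokenize(s):
    return [vocab.setdefault(w, len(vocab)) for w in s.split(" ")]
def detokenize(ids):
    inv = {i: w for w, i in vocab.items()}
    return " ".join(inv[i] for i in ids)

prompt = "def hello_wor"                        # ends mid-identifier (a partial token)
trimmed, must_start_with = align_prompt(prompt, tokenize, detokenize)
print(repr(trimmed), repr(must_start_with))     # 'def' 'hello_wor'
# The model continues from `trimmed`, with decoding constrained so its output
# begins with `must_start_with`, avoiding partial-token artifacts.
```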

In our study, we present bifurcated attention, a method developed for language model inference in single-context batch sampling contexts. This approach aims to reduce redundant memory IO costs, a significant factor in latency at high batch sizes and long context lengths. Bifurcated attention achieves this by dividing the attention mechanism during incremental decoding into two distinct GEMM operations, focusing on the KV cache from the prefill and the decoding process. The method ensures precise computation and maintains the usual computational load (FLOPs)...

10.48550/arxiv.2403.08845 preprint EN arXiv (Cornell University) 2024-03-13
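
A hedged sketch of the bifurcation (toy shapes, a single head, no masking): attention over the shared prefill KV cache is computed once against a single copy, attention over each sample's own decoded KV is computed separately, and the two partial results are combined under a joint softmax; the output matches the naive replicated-cache computation while reading the shared cache only once.

```python
import torch

d, prefill_len, dec_len, batch = 64, 512, 16, 8
k_shared = torch.randn(prefill_len, d)          # one copy of the prompt's KV cache
v_shared = torch.randn(prefill_len, d)
k_dec = torch.randn(batch, dec_len, d)          # per-sample decoded KV
v_dec = torch.randn(batch, dec_len, d)
q = torch.randn(batch, 1, d)                    # current-step queries

scale = d ** -0.5
# GEMM 1: all queries against the single shared prefill cache (no replication).
scores_shared = (q.squeeze(1) @ k_shared.T) * scale                      # (batch, prefill_len)
# GEMM 2: each query against its own decoded cache.
scores_dec = torch.einsum("bqd,bkd->bqk", q, k_dec).squeeze(1) * scale   # (batch, dec_len)

# Joint softmax over both segments, then combine the weighted values.
weights = torch.softmax(torch.cat([scores_shared, scores_dec], dim=-1), dim=-1)
w_shared, w_dec = weights[:, :prefill_len], weights[:, prefill_len:]
out = w_shared @ v_shared + torch.einsum("bk,bkd->bd", w_dec, v_dec)

# Reference: materialize the replicated cache and attend normally.
k_full = torch.cat([k_shared.expand(batch, -1, -1), k_dec], dim=1)
v_full = torch.cat([v_shared.expand(batch, -1, -1), v_dec], dim=1)
ref = torch.softmax(torch.einsum("bqd,bkd->bqk", q, k_full) * scale, dim=-1)
ref_out = torch.einsum("bqk,bkd->bqd", ref, v_full).squeeze(1)
print(torch.allclose(out, ref_out, atol=1e-5))  # True: same result, less memory IO
```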