- Topic Modeling
- Natural Language Processing Techniques
- Multimodal Machine Learning Applications
- Domain Adaptation and Few-Shot Learning
- Machine Learning and Data Classification
- Advanced Neural Network Applications
- Software Engineering Research
- Advanced Text Analysis Techniques
- Auction Theory and Applications
- Multi-Agent Systems and Negotiation
- Mathematical Dynamics and Fractals
- Parallel Computing and Optimization Techniques
- Text Readability and Simplification
- Sentiment Analysis and Opinion Mining
- Organizational Management and Leadership
- Handwritten Text Recognition Techniques
- Scientific Computing and Data Management
- Text and Document Classification Technologies
- Advanced Topology and Set Theory
- Gaussian Processes and Bayesian Inference
- Speech and Dialogue Systems
- Limits and Structures in Graph Theory
- Advanced Bandit Algorithms Research
- Video Analysis and Summarization
- Functional Equations Stability Results
Amazon (United States)
2021
Amazon (Germany)
2020-2021
Allen Institute for Artificial Intelligence
2021
Cornell University
2017-2019
California Institute of Technology
2018
In recent years great success has been achieved in sentiment classification for English, thanks in part to the availability of copious annotated resources. Unfortunately, most languages do not enjoy such an abundance of labeled data. To tackle the sentiment classification problem in low-resource languages without adequate annotated data, we propose an Adversarial Deep Averaging Network (ADAN) to transfer the knowledge learned from labeled data on a resource-rich source language to low-resource languages where only unlabeled data exist. ADAN has two discriminative branches: a sentiment classifier and...
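To make the two-branch idea concrete, here is a minimal sketch assuming a PyTorch implementation with a deep-averaging encoder; the module names, layer sizes, and pooling choice are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class ADANSketch(nn.Module):
    """Illustrative two-branch network: a shared deep-averaging feature extractor
    feeding a sentiment classifier and an adversarial language discriminator."""
    def __init__(self, vocab_size, emb_dim=300, hidden=500, n_classes=2):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, emb_dim, mode="mean")  # averaging encoder
        self.feature = nn.Sequential(nn.Linear(emb_dim, hidden), nn.ReLU())
        self.sentiment = nn.Linear(hidden, n_classes)  # trained on labeled source-language data
        self.language = nn.Linear(hidden, 2)           # adversarial source/target branch

    def forward(self, token_ids):
        h = self.feature(self.embed(token_ids))
        return self.sentiment(h), self.language(h)
```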
Malicious software, or malware, continues to be a problem for computer users, corporations, and governments. Previous research [1] has explored training file-based malware classifiers using a two-stage approach. In the first stage, a language model is used to learn a feature representation, which is then the input to a second-stage classifier. Pascanu et al. [1] used either a standard recurrent neural network (RNN) or an echo state network (ESN). In this work, we propose several new classification architectures that include long...
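A rough sketch of the two-stage setup described above, with an LSTM standing in for the sequence model; the embedding size, pooling, and two-class head are assumptions rather than the paper's configuration.

```python
import torch
import torch.nn as nn

class TwoStageMalwareClassifier(nn.Module):
    """Stage 1: a recurrent model over event/byte sequences learns a feature
    representation; stage 2: a classifier consumes that representation."""
    def __init__(self, vocab_size, emb_dim=64, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True)  # stage-1 sequence model
        self.classifier = nn.Linear(hidden, 2)                   # stage-2 malicious/benign head

    def forward(self, seq):
        out, _ = self.lstm(self.embed(seq))
        features = out.mean(dim=1)  # pool hidden states into a fixed-size feature vector
        return self.classifier(features)
```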
We introduce Probabilistic FastText, a new model for word embeddings that can capture multiple word senses, sub-word structure, and uncertainty information. In particular, we represent each word with a Gaussian mixture density, where the mean of a mixture component is given by the sum of n-gram vectors. This representation allows the model to share statistical strength across sub-word structures (e.g. Latin roots), producing accurate representations of rare, misspelt, or even unseen words. Moreover, each mixture component can capture a different word sense. Probabilistic FastText outperforms both FastText, which has no...
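A toy sketch of the representation, using hypothetical helper names: each mixture component's mean is the sum of subword n-gram vectors, so rare or misspelt words still receive informative embeddings. The n-gram range and dimensionality below are assumptions.

```python
import numpy as np

def char_ngrams(word, n_min=3, n_max=6):
    """Subword n-grams of a word, with boundary markers (FastText-style)."""
    w = f"<{word}>"
    return [w[i:i + n] for n in range(n_min, n_max + 1) for i in range(len(w) - n + 1)]

rng = np.random.default_rng(0)
dim, n_components = 50, 2
ngram_vecs = {}  # hypothetical n-gram embedding table, one vector per (n-gram, component)

def component_mean(word, k):
    """Mean of mixture component k: sum of the word's n-gram vectors."""
    vecs = [ngram_vecs.setdefault((g, k), rng.normal(size=dim)) for g in char_ngrams(word)]
    return np.sum(vecs, axis=0)

mu = [component_mean("misspelt", k) for k in range(n_components)]  # one Gaussian mean per sense
```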
Presently the most successful approaches to semi-supervised learning are based on consistency regularization, whereby a model is trained to be robust to small perturbations of its inputs and parameters. To understand consistency regularization, we conceptually explore how loss geometry interacts with training procedures. The consistency loss dramatically improves generalization performance over supervised-only training; however, we show that SGD struggles to converge on the consistency loss and continues to make large steps that lead to changes in predictions on the test data. Motivated by...
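The following compressed sketch illustrates consistency regularization together with a running weight average along the SGD trajectory, which is the kind of remedy the abstract motivates; the Gaussian input perturbation and the averaging schedule are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def consistency_loss(model, x, noise_std=0.1):
    """Penalize prediction changes under small input perturbations."""
    with torch.no_grad():
        p_clean = F.softmax(model(x), dim=-1)
    p_noisy = F.softmax(model(x + noise_std * torch.randn_like(x)), dim=-1)
    return F.mse_loss(p_noisy, p_clean)

def update_weight_average(avg_model, model, n_averaged):
    """Running average of weights along the SGD trajectory (SWA-style)."""
    for p_avg, p in zip(avg_model.parameters(), model.parameters()):
        p_avg.data.mul_(n_averaged / (n_averaged + 1)).add_(p.data / (n_averaged + 1))
```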
Word embeddings provide point representations of words containing useful semantic information. We introduce multimodal word distributions formed from Gaussian mixtures, which can capture multiple word meanings, entailment, and rich uncertainty information. To learn these distributions, we propose an energy-based max-margin objective. We show that the resulting approach captures uniquely expressive semantic information and outperforms alternatives, such as word2vec skip-grams and Gaussian embeddings, on benchmark datasets for word similarity and entailment.
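A small sketch of the energy-based max-margin idea, assuming diagonal Gaussian components; the closed-form Gaussian-overlap energy below is one standard choice for such models and is used here for illustration, not as the paper's exact objective.

```python
import numpy as np

def log_gaussian_overlap(mu1, var1, mu2, var2):
    """log of the integral of N(x; mu1, var1) * N(x; mu2, var2) dx, diagonal covariances."""
    var = var1 + var2
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (mu1 - mu2) ** 2 / var)

def max_margin_loss(energy_pos, energy_neg, margin=1.0):
    """Hinge loss: a word and a co-occurring word should have higher energy
    than the word and a negative sample, by at least the margin."""
    return max(0.0, margin - energy_pos + energy_neg)
```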
Convolutional Neural Networks (CNNs) are powerful models that achieve impressive results for image classification. In addition, pre-trained CNNs are also useful for other computer vision tasks as generic feature extractors. This paper aims to gain insight into the feature aspect of CNNs and demonstrate other uses of CNN features. Our results show that CNN feature maps can be used with Random Forests and SVMs to yield classification results that outperform the original CNN. A CNN that is less than optimal (e.g. not fully trained or overfitting) can still extract features for Random Forest/SVM...
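An illustrative pipeline for the "pre-trained CNN as generic feature extractor" use case, assuming torchvision and scikit-learn are available; the specific backbone and classifier settings are placeholders, not the paper's setup.

```python
import torch
import torchvision.models as models
from sklearn.svm import SVC

backbone = models.resnet18(weights="IMAGENET1K_V1")
backbone.fc = torch.nn.Identity()  # drop the classification head, keep pooled features
backbone.eval()

def extract_features(images):
    """images: (N, 3, 224, 224) float tensor -> (N, 512) numpy feature matrix."""
    with torch.no_grad():
        return backbone(images).numpy()

# X_train, y_train would come from the target dataset; an SVM (or a Random Forest)
# is then fit on the fixed CNN features instead of fine-tuning the CNN itself:
# clf = SVC().fit(extract_features(X_train), y_train)
```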
We propose a generative framework for joint sequence labeling and sentence-level classification. Our model performs multiple tasks at once using a single, shared natural language output space. Unlike prior discriminative methods, our model naturally incorporates label semantics and shares knowledge across tasks. Our framework is general purpose, performing well in few-shot, low-resource, and high-resource settings. We demonstrate these advantages on popular named entity recognition, slot labeling, and intent classification benchmarks...
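To illustrate what a single shared natural-language output space can look like for intent classification plus slot labeling, here is a hedged toy example; the exact templating is an assumption made for illustration, not the paper's format.

```python
def to_target_text(intent, slots, tokens):
    """Encode sentence-level and token-level labels in one generated string."""
    tagged = " ".join(f"[{slots[i]}: {tok}]" if slots[i] != "O" else tok
                      for i, tok in enumerate(tokens))
    return f"intent = {intent} ; slots = {tagged}"

print(to_target_text("book_flight",
                     ["O", "O", "B-city", "O", "B-city"],
                     ["book", "from", "Boston", "to", "Denver"]))
# intent = book_flight ; slots = book from [B-city: Boston] to [B-city: Denver]
```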
Large language models are increasingly becoming a cornerstone technology in artificial intelligence, the sciences, and society as a whole, yet optimal strategies for dataset composition and filtering remain largely elusive. Many of the top-performing models lack transparency in their dataset curation and model development processes, posing an obstacle to the development of fully open models. In this paper, we identify three core data-related challenges that must be addressed to advance open-source models. These include (1) transparency in model development, including the data...
Recent advances in large language models (LLMs) demonstrate substantial capabilities in natural language understanding and generation tasks. With the growing number of LLMs, how to harness the collective expertise of multiple LLMs is an exciting open direction. Toward this goal, we propose a new approach that leverages the collective strengths of multiple LLMs through a Mixture-of-Agents (MoA) methodology. In our approach, we construct a layered MoA architecture wherein each layer comprises multiple LLM agents. Each agent takes all the outputs from agents...
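A structural sketch of the layered MoA idea, with a stubbed `call_llm` standing in for real model APIs; the layer composition and the aggregation prompt are assumptions for illustration only.

```python
def call_llm(agent_name, prompt):
    """Placeholder for a real LLM API call."""
    return f"[{agent_name} response to: {prompt[:40]}...]"

def mixture_of_agents(user_prompt, layers, aggregator="final-agent"):
    """Each layer's agents see the user prompt plus all previous-layer outputs."""
    previous = []
    for layer in layers:  # layers: list of lists of agent names
        context = "\n".join(previous)
        previous = [call_llm(a, f"{user_prompt}\n\nPrior answers:\n{context}") for a in layer]
    return call_llm(aggregator, f"{user_prompt}\n\nSynthesize:\n" + "\n".join(previous))

print(mixture_of_agents("Explain KV caching.", [["agent-a", "agent-b"], ["agent-c"]]))
```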
We present new benchmarks for the evaluation of code generation models: MBXP, Multilingual HumanEval, and MathQA-X. These datasets cover over 10 programming languages and are generated using a scalable conversion framework that transpiles prompts and test cases from the original Python datasets into the corresponding data in the target language. Using these benchmarks, we are able to assess the performance of code generation models in a multi-lingual fashion, and discovered generalization ability of language models on out-of-domain languages, advantages of multi-lingual models over mono-lingual,...
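To make the conversion idea concrete, here is a hedged illustration of how a Python-style function-completion prompt might correspond to a prompt in a target language; the exact prompt and test-case formats produced by the benchmarks' conversion framework may differ.

```python
python_prompt = '''def add(a: int, b: int) -> int:
    """Return the sum of two integers."""
'''

# Illustrative target-language counterpart that a prompt/test transpiler could emit
# (the format is an assumption, not the benchmarks' exact output):
java_prompt = '''class Solution {
    /** Return the sum of two integers. */
    public int add(int a, int b) {
'''
```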
By representing words with probability densities rather than point vectors, probabilistic word embeddings can capture rich and interpretable semantic information and uncertainty. The uncertainty can be particularly meaningful in capturing entailment relationships -- whereby general words such as "entity" correspond to broad distributions that encompass more specific words such as "animal" or "instrument". We introduce density order embeddings, which learn hierarchical representations through encapsulation of probability densities...
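The encapsulation intuition can be illustrated with an asymmetric divergence between diagonal Gaussians, which is small when one density sits inside a broader one; the specific penalty used in density order embeddings may differ, so treat this as an assumption-laden sketch.

```python
import numpy as np

def kl_diag_gauss(mu_p, var_p, mu_q, var_q):
    """KL(p || q) for diagonal Gaussians; small when p is encapsulated by q."""
    return 0.5 * np.sum(np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)

# "animal" should be encapsulated by the broader "entity" distribution:
mu_animal, var_animal = np.zeros(2), np.full(2, 0.5)
mu_entity, var_entity = np.zeros(2), np.full(2, 4.0)
print(kl_diag_gauss(mu_animal, var_animal, mu_entity, var_entity))  # small
print(kl_diag_gauss(mu_entity, var_entity, mu_animal, var_animal))  # large (asymmetric)
```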
ML-powered code generation aims to assist developers to write code in a more productive manner by intelligently generating code blocks based on natural language prompts. Recently, large pretrained deep learning models have pushed the boundary of code generation and achieved impressive performance. However, the huge number of model parameters poses a significant challenge to their adoption in a typical software development environment, where a developer might use a standard laptop or mid-size server to develop code. Such a cost in resources, in terms of...
Activation sparsity can enable practical inference speedups in large language models (LLMs) by reducing the compute and memory-movement required for matrix multiplications during the forward pass. However, existing methods face limitations that inhibit widespread adoption. Some approaches are tailored towards older models with ReLU-based sparsity, while others require extensive continued pre-training on up to hundreds of billions of tokens. This paper describes TEAL, a simple training-free method that applies...
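A minimal sketch of training-free, magnitude-based activation sparsity: hidden-state entries with small magnitude are zeroed before the matrix multiply, so a suitable kernel can skip the corresponding weight loads. The per-row thresholding rule here is a simplification for illustration, not TEAL's calibrated procedure.

```python
import torch

def sparsify_activations(h, sparsity=0.5):
    """Zero out the lowest-magnitude fraction of activation entries (per row)."""
    k = int(h.shape[-1] * sparsity)
    thresh = h.abs().kthvalue(k, dim=-1, keepdim=True).values
    return torch.where(h.abs() <= thresh, torch.zeros_like(h), h)

h = torch.randn(2, 8)             # hidden states entering a linear layer
W = torch.randn(8, 16)
y = sparsify_activations(h) @ W   # zeros let a custom kernel skip compute and memory movement
```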
We propose a new framework, Translation between Augmented Natural Languages (TANL), to solve many structured prediction language tasks, including joint entity and relation extraction, nested named entity recognition, relation classification, semantic role labeling, event extraction, coreference resolution, and dialogue state tracking. Instead of tackling the problem by training task-specific discriminative classifiers, we frame it as a translation task between augmented natural languages, from which the task-relevant information can be...
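An illustrative example of an "augmented natural language" target for joint entity and relation extraction, together with a toy parser that recovers structure from it; the bracketing scheme follows the spirit of TANL but the exact markup is paraphrased, not the paper's specification.

```python
import re

source = "Tolkien wrote The Hobbit."
# Entities (and optionally relations) are emitted as an annotated copy of the input;
# the decoder's output string is then parsed back into structured predictions.
target = "[ Tolkien | person | author of = The Hobbit ] wrote [ The Hobbit | book ]."

def parse_entities(augmented):
    """Toy parser: recover (span, type) pairs from bracketed segments."""
    return [tuple(part.strip() for part in m.split("|")[:2])
            for m in re.findall(r"\[(.*?)\]", augmented)]

print(parse_entities(target))  # [('Tolkien', 'person'), ('The Hobbit', 'book')]
```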
For infinite-measure-preserving rank-one transformations, we give a condition guaranteeing that all finite Cartesian products of the transformation with its inverse are ergodic. We show that the infinite Chacón transformation satisfies this condition.
Generative models, widely utilized in various applications, can often struggle with prompts corresponding to partial tokens. This struggle stems from tokenization, where partial tokens fall out of distribution during inference, leading to incorrect or nonsensical outputs. This paper examines a technique to alleviate the tokenization artifact on text completion in generative models, maintaining performance even in regular non-subword cases. The method, termed token alignment, involves backtracking to the last complete tokens and ensuring the model's...
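A simplified sketch of the backtracking step: the prompt is tokenized, the possibly-partial final token is held back, and generation then proceeds from the safe prefix while being constrained (not shown here) to re-match the held-back characters. The encode/decode interface and the toy whitespace tokenizer are assumptions made to keep the example self-contained.

```python
def align_prompt(prompt, encode, decode):
    """Back off to the last complete token and return (safe_prefix, partial_text).

    The generator decodes from safe_prefix while being constrained (not shown)
    to produce tokens whose characters start with partial_text.
    """
    token_ids = encode(prompt)
    if not token_ids:
        return [], ""
    safe_prefix = token_ids[:-1]  # drop the possibly-partial final token
    partial_text = prompt[len(decode(safe_prefix)):]
    return safe_prefix, partial_text

# Toy whitespace "tokenizer" just to exercise the function:
encode = lambda s: s.split(" ")
decode = lambda toks: " ".join(toks)
print(align_prompt("return math.sq", encode, decode))  # (['return'], ' math.sq')
```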
In our study, we present bifurcated attention, a method developed for language model inference in single-context batch sampling contexts. This approach aims to reduce redundant memory IO costs, a significant factor in latency at high batch sizes and long context lengths. Bifurcated attention achieves this by dividing the attention mechanism during incremental decoding into two distinct GEMM operations, focusing on the KV cache from prefill and the decoding process separately. This method ensures precise computation and maintains the usual computational load (FLOPs)...
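A shape-level sketch of the bifurcation: in single-context batch sampling the prefill KV cache is shared across all samples, so attention scores over it can be computed with one GEMM against the shared cache, while a second GEMM handles the per-sample decoded keys. Dimensions, naming, and the omission of softmax/values are illustrative simplifications.

```python
import torch

def bifurcated_scores(q, k_prefill_shared, k_decode_per_sample):
    """q: (batch, d); shared prefill keys: (ctx, d); per-sample decoded keys: (batch, t, d)."""
    scores_prefill = q @ k_prefill_shared.T                             # one GEMM over the shared context
    scores_decode = torch.einsum("bd,btd->bt", q, k_decode_per_sample)  # per-sample decoded keys
    return torch.cat([scores_prefill, scores_decode], dim=-1)

q = torch.randn(4, 64)  # 4 samples decoded from one shared prefill context
scores = bifurcated_scores(q, torch.randn(128, 64), torch.randn(4, 10, 64))
print(scores.shape)     # torch.Size([4, 138])
```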