Jacob Andreas

ORCID: 0000-0002-3141-5845
Research Areas
  • Topic Modeling
  • Natural Language Processing Techniques
  • Multimodal Machine Learning Applications
  • Speech and dialogue systems
  • Explainable Artificial Intelligence (XAI)
  • Reinforcement Learning in Robotics
  • Text Readability and Simplification
  • Domain Adaptation and Few-Shot Learning
  • Software Engineering Research
  • Neural Networks and Applications
  • Adversarial Robustness in Machine Learning
  • Language and cultural evolution
  • Machine Learning and Algorithms
  • Animal Vocal Communication and Behavior
  • Computational and Text Analysis Methods
  • Marine animal studies overview
  • Speech Recognition and Synthesis
  • Underwater Acoustics Research
  • Machine Learning and Data Classification
  • Semantic Web and Ontologies
  • AI-based Problem Solving and Planning
  • Image Retrieval and Classification Techniques
  • Advanced Image and Video Retrieval Techniques
  • Bayesian Modeling and Causal Inference
  • Neurobiology of Language and Bilingualism

Affiliations

Microsoft Research (United Kingdom)
2020-2024

Massachusetts Institute of Technology
2019-2024

K Lab (United States)
2023-2024

Intel (United States)
2023-2024

Microsoft (United States)
2020-2024

Citigroup
2022-2024

IT University of Copenhagen
2023

Tokyo Institute of Technology
2023

Administration for Community Living
2023

Moscow Institute of Thermal Technology
2023

Publications

Visual question answering is fundamentally compositional in nature: a question like where is the dog? shares substructure with questions like what color is the dog? and where is the cat? This paper seeks to simultaneously exploit the representational capacity of deep networks and the compositional linguistic structure of questions. We describe a procedure for constructing and learning neural module networks, which compose collections of jointly-trained neural "modules" into deep networks for question answering. Our approach decomposes questions into their linguistic substructures, and uses these structures to dynamically instantiate...

10.1109/cvpr.2016.12 article EN 2016-06-01
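
As a rough illustration of the module-composition idea in the abstract above, here is a minimal sketch in Python/PyTorch. The module names (Find, Describe), tensor shapes, and the hand-written layout are illustrative assumptions, not the paper's actual implementation.

    # Two tiny reusable modules assembled into a question-specific network
    # for "what color is the dog?": describe[color](find[dog](image)).
    import torch
    import torch.nn as nn

    class Find(nn.Module):
        """Produces an attention map over image regions for one concept."""
        def __init__(self, feat_dim):
            super().__init__()
            self.score = nn.Linear(feat_dim, 1)

        def forward(self, feats):  # feats: (regions, feat_dim)
            return torch.softmax(self.score(feats).squeeze(-1), dim=0)

    class Describe(nn.Module):
        """Maps an attended feature vector to answer logits."""
        def __init__(self, feat_dim, n_answers):
            super().__init__()
            self.out = nn.Linear(feat_dim, n_answers)

        def forward(self, feats, attention):
            return self.out(attention @ feats)  # weighted sum over regions

    feat_dim, n_answers = 64, 10
    modules = {"find[dog]": Find(feat_dim),
               "describe[color]": Describe(feat_dim, n_answers)}

    feats = torch.randn(5, feat_dim)  # 5 image regions (random stand-in)
    logits = modules["describe[color]"](feats, modules["find[dog]"](feats))
    print(logits.shape)  # torch.Size([10])

Because every module is an ordinary network, the assembled composite is differentiable end-to-end, which is what allows the shared modules to be trained jointly across questions.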

Natural language questions are inherently compositional, and many are most easily answered by reasoning about their decomposition into modular sub-problems. For example, to answer “is there an equal number of balls and boxes?” we can look for balls, look for boxes, count them, and compare the results. The recently proposed Neural Module Network (NMN) architecture [3, 2] implements this approach to question answering by parsing questions into linguistic substructures and assembling question-specific deep networks from smaller modules...

10.1109/iccv.2017.93 article EN 2017-10-01

Jacob Andreas, Marcus Rohrbach, Trevor Darrell, Dan Klein. Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2016.

10.18653/v1/n16-1181 article EN cc-by Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2016-01-01

People often refer to entities in an image in terms of their relationships with other entities. For example, the black cat sitting under the table refers to both a black cat entity and its relationship with another table entity. Understanding these relationships is essential for interpreting and grounding such natural language expressions. Most prior work focuses on either grounding entire referential expressions holistically to one region, or localizing relationships based on a fixed set of categories. In this paper we instead present a modular deep architecture capable...

10.1109/cvpr.2017.470 article EN 2017-07-01

Navigation guided by natural language instructions presents a challenging reasoning problem for instruction followers. Natural language instructions typically identify only a few high-level decisions and landmarks rather than complete low-level motor behaviors; much of the missing information must be inferred based on perceptual context. In machine learning settings, this is doubly challenging: it is difficult to collect enough annotated data to enable learning of this reasoning process from scratch, and also difficult to implement it using generic sequence models....

10.48550/arxiv.1806.02724 preprint EN other-oa arXiv (Cornell University) 2018-01-01
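
If this arXiv record is the speaker-follower line of work, the core inference step can be pictured as rescoring: a follower proposes candidate routes, and a speaker model scores how well each route would explain the instruction. The Python sketch below is a toy version under that assumption; both scorers are trivial stand-ins for the learned sequence models a real system would use.

    # Rank candidate routes by a weighted mix of follower and speaker scores.
    def rescore(instruction, candidates, follower_logp, speaker_logp, weight=0.5):
        combined = lambda route: ((1 - weight) * follower_logp(instruction, route)
                                  + weight * speaker_logp(route, instruction))
        return max(candidates, key=combined)

    # Illustrative stand-in scorers (real models would be trained networks).
    follower_logp = lambda instr, route: -0.1 * len(route)    # prefers short routes
    speaker_logp = lambda route, instr: -abs(len(route) - 3)  # "explains" length-3 routes

    best = rescore("walk past the table and stop at the door",
                   [["forward"], ["forward", "left", "forward"], ["forward"] * 5],
                   follower_logp, speaker_logp)
    print(best)  # ['forward', 'left', 'forward']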

In this work, we present a minimal neural model for constituency parsing based on independent scoring of labels and spans. We show that this model is not only compatible with classical dynamic programming techniques, but also admits a novel greedy top-down inference algorithm based on recursive partitioning of the input. We demonstrate empirically that both prediction schemes are competitive with recent work, and when combined with basic extensions to the scoring model are capable of achieving state-of-the-art single-model performance on the Penn Treebank (91.79 F1) and strong...

10.18653/v1/p17-1076 article EN cc-by Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2017-01-01
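
The greedy top-down inference mentioned above can be pictured as recursive partitioning: for each span, pick the split the span scorer likes best and recurse. A minimal Python sketch, with a random stand-in for the learned span/label scores:

    import random

    def parse(words, lo=0, hi=None, score=lambda lo, hi, k: random.random()):
        """Binary tree over words[lo:hi] via greedy recursive partitioning."""
        hi = len(words) if hi is None else hi
        if hi - lo == 1:
            return words[lo]
        # choose the split point the span scorer rates highest
        k = max(range(lo + 1, hi), key=lambda k: score(lo, hi, k))
        return (parse(words, lo, k, score), parse(words, k, hi, score))

    print(parse("the cat sat on the mat".split()))

Because spans are scored independently, the same scorer also supports exact CKY-style dynamic programming; the greedy variant simply commits to the best split at each level.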

Yonatan Bisk, Ari Holtzman, Jesse Thomason, Jacob Andreas, Yoshua Bengio, Joyce Chai, Mirella Lapata, Angeliki Lazaridou, Jonathan May, Aleksandr Nisnevich, Nicolas Pinto, Joseph Turian. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2020.

10.18653/v1/2020.emnlp-main.703 article EN cc-by 2020-01-01

We propose a simple data augmentation protocol aimed at providing a compositional inductive bias in conditional and unconditional sequence models. Under this protocol, synthetic training examples are constructed by taking real training examples and replacing (possibly discontinuous) fragments with other fragments that appear in at least one similar environment. The protocol is model-agnostic and useful for a variety of tasks. Applied to neural sequence-to-sequence models, it reduces error rate by as much as 87% on diagnostic tasks from the SCAN...

10.18653/v1/2020.acl-main.676 article EN cc-by 2020-01-01
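
A toy Python version of the protocol is below. For illustration it treats single tokens as "fragments" and the full left/right context as the "environment"; the actual protocol handles longer and discontinuous fragments.

    from itertools import permutations

    def augment(sentences):
        """Yield synthetic sentences by swapping words that share an environment."""
        envs = {}
        for sent in sentences:
            toks = sent.split()
            for i, w in enumerate(toks):
                env = (" ".join(toks[:i]), " ".join(toks[i + 1:]))
                envs.setdefault(w, set()).add(env)
        for a, b in permutations(envs, 2):
            if envs[a] & envs[b]:  # a and b are interchangeable somewhere
                for left, right in envs[a] - envs[b]:
                    yield " ".join(filter(None, [left, b, right]))

    data = ["the cat sleeps", "the dog sleeps", "the cat eats"]
    print(sorted(set(augment(data)) - set(data)))  # ['the dog eats']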

We present a model for contrastively describing scenes, in which context-specific behavior results from a combination of inference-driven pragmatics and learned semantics. Like previous approaches to language generation, our model uses a simple feature-driven architecture (here a pair of neural "listener" and "speaker" models) to ground language in the world. Like inference-driven approaches to pragmatics, our model actively reasons about listener behavior when selecting utterances. For training, our approach requires only ordinary captions, annotated without...

10.18653/v1/d16-1125 article EN cc-by Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing 2016-01-01
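
The inference-driven part can be pictured as reranking: a base speaker proposes candidate descriptions and a listener model picks the one most likely to single out the target scene over a distractor. Both "models" in this Python sketch are trivial stand-ins for the paper's neural listener and speaker.

    # Choose the utterance the listener most reliably resolves to the target.
    def pragmatic_speaker(target, distractor, candidates, listener_prob):
        return max(candidates, key=lambda u: listener_prob(u, target, distractor))

    # Stand-in listener: P(target | utterance) from simple word/feature overlap.
    def listener_prob(utterance, target, distractor):
        hits = lambda scene: len(set(utterance.split()) & scene)
        t, d = hits(target), hits(distractor)
        return t / (t + d) if t + d else 0.5

    target = {"owl", "branch", "day"}
    distractor = {"owl", "branch", "night"}
    print(pragmatic_speaker(target, distractor,
                            ["an owl on a branch", "an owl during the day"],
                            listener_prob))  # 'an owl during the day'

Both captions are true of the target, but only the second lets the listener disambiguate, so the pragmatic speaker prefers it.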

To be successful in real-world tasks, Reinforcement Learning (RL) needs to exploit the compositional, relational, and hierarchical structure of the world, and learn to transfer it to the task at hand. Recent advances in representation learning for language make it possible to build models that acquire world knowledge from text corpora and integrate this knowledge into downstream decision making problems. We thus argue that the time is right to investigate a tight integration of natural language understanding into RL in particular. We survey the state of the field, including...

10.24963/ijcai.2019/880 article EN 2019-07-28

We describe a framework for multitask deep reinforcement learning guided by policy sketches. Sketches annotate tasks with sequences of named subtasks, providing information about high-level structural relationships among tasks, but not how to implement them; specifically, without the detailed guidance used by much previous work on learning abstractions for RL (e.g. intermediate rewards, subtask completion signals, or intrinsic motivations). To learn from sketches, we present a model that associates every subtask with a modular subpolicy,...

10.48550/arxiv.1611.01796 preprint EN other-oa arXiv (Cornell University) 2016-01-01
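
As a minimal picture of the setup, the Python sketch below binds each named subtask in a sketch to its own subpolicy and executes the shared subpolicies in sketch order. Subpolicy behavior and termination here are trivial stand-ins for learned policies.

    def run_sketch(sketch, subpolicies, state, max_steps=10):
        """Execute subpolicies in sketch order; each runs until it signals done."""
        trace = []
        for subtask in sketch:
            policy = subpolicies[subtask]  # subpolicies are shared across tasks
            for _ in range(max_steps):
                action, done, state = policy(state)
                trace.append((subtask, action))
                if done:
                    break
        return trace, state

    # Two toy subpolicies over an integer "state".
    get_wood = lambda s: ("chop", True, s + 1)
    make_plank = lambda s: ("craft", True, s * 2)
    trace, state = run_sketch(["get_wood", "make_plank"],
                              {"get_wood": get_wood, "make_plank": make_plank}, 0)
    print(trace, state)  # [('get_wood', 'chop'), ('make_plank', 'craft')] 2

Because a subpolicy like get_wood can appear in many sketches, experience from every task that mentions it contributes to training the one shared module.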

Natural language questions are inherently compositional, and many are most easily answered by reasoning about their decomposition into modular sub-problems. For example, to answer "is there an equal number of balls and boxes?" we can look for balls, look for boxes, count them, and compare the results. The recently proposed Neural Module Network (NMN) architecture implements this approach to question answering by parsing questions into linguistic substructures and assembling question-specific deep networks from smaller modules that each...

10.48550/arxiv.1704.05526 preprint EN other-oa arXiv (Cornell University) 2017-01-01

We describe a question answering model that applies to both images and structured knowledge bases. The model uses natural language strings to automatically assemble neural networks from a collection of composable modules. Parameters for these modules are learned jointly with network-assembly parameters via reinforcement learning, with only (world, question, answer) triples as supervision. Our approach, which we term a dynamic neural module network, achieves state-of-the-art results on benchmark datasets in both visual and structured domains.

10.48550/arxiv.1601.01705 preprint EN other-oa arXiv (Cornell University) 2016-01-01
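
The "network-assembly parameters via reinforcement learning" step can be illustrated with a toy REINFORCE loop over a discrete layout choice. Everything below (two candidate layouts, the stand-in reward) is an illustrative assumption, not the paper's training setup.

    # Sample a layout, run it, reward correct answers, reinforce the choice.
    import torch

    layout_logits = torch.zeros(2, requires_grad=True)  # two candidate layouts
    opt = torch.optim.SGD([layout_logits], lr=0.5)

    def run_layout(i):  # stand-in: layout 1 answers correctly more often
        return 1.0 if (i == 1 or torch.rand(()) < 0.2) else 0.0

    for _ in range(100):
        dist = torch.distributions.Categorical(logits=layout_logits)
        i = dist.sample()
        reward = run_layout(i.item())
        loss = -dist.log_prob(i) * reward  # REINFORCE objective
        opt.zero_grad(); loss.backward(); opt.step()

    print(layout_logits.softmax(-1))  # probability mass shifts toward layout 1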

We describe an approach to task-oriented dialogue in which dialogue state is represented as a dataflow graph. A dialogue agent maps each user utterance to a program that extends this graph. Programs include metacomputation operators for reference and revision that reuse dataflow fragments from previous turns. Our graph-based state enables the expression and manipulation of complex user intents, and explicit metacomputation makes these intents easier for learned models to predict. We introduce a new dataset, SMCalFlow, featuring complex dialogues about events, weather, places, and people....

10.1162/tacl_a_00333 article EN cc-by Transactions of the Association for Computational Linguistics 2020-09-21
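
A toy Python picture of the dataflow idea: each turn adds program nodes to a graph, and a refer operator reuses the newest node matching a constraint instead of recomputing it. The node structure and constraint language below are illustrative assumptions, not the SMCalFlow representation.

    class DataflowGraph:
        def __init__(self):
            self.nodes = []  # (op, value) pairs, newest last

        def add(self, op, value):
            self.nodes.append((op, value))
            return value

        def refer(self, op):
            """Return the newest value produced by `op` in an earlier turn."""
            for node_op, value in reversed(self.nodes):
                if node_op == op:
                    return value
            raise KeyError(f"no prior {op!r} to refer to")

    graph = DataflowGraph()
    # Turn 1: "When is my meeting with Alice?"
    graph.add("find_event", {"attendee": "Alice", "time": "10am"})
    # Turn 2: "Move it to 2pm": "it" resolves via refer; a revised node is added.
    revised = dict(graph.refer("find_event"), time="2pm")
    graph.add("revise_event", revised)
    print(revised)  # {'attendee': 'Alice', 'time': '2pm'}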

Belinda Z. Li, Maxwell Nye, Jacob Andreas. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021.

10.18653/v1/2021.acl-long.143 article EN cc-by 2021-01-01