- Topic Modeling
- Natural Language Processing Techniques
- Multimodal Machine Learning Applications
- Speech and dialogue systems
- Explainable Artificial Intelligence (XAI)
- Reinforcement Learning in Robotics
- Text Readability and Simplification
- Domain Adaptation and Few-Shot Learning
- Software Engineering Research
- Neural Networks and Applications
- Adversarial Robustness in Machine Learning
- Language and cultural evolution
- Machine Learning and Algorithms
- Animal Vocal Communication and Behavior
- Computational and Text Analysis Methods
- Marine animal studies overview
- Speech Recognition and Synthesis
- Underwater Acoustics Research
- Machine Learning and Data Classification
- Semantic Web and Ontologies
- AI-based Problem Solving and Planning
- Image Retrieval and Classification Techniques
- Advanced Image and Video Retrieval Techniques
- Bayesian Modeling and Causal Inference
- Neurobiology of Language and Bilingualism
- Microsoft Research (United Kingdom), 2020-2024
- Massachusetts Institute of Technology, 2019-2024
- K Lab (United States), 2023-2024
- Intel (United States), 2023-2024
- Microsoft (United States), 2020-2024
- Citigroup, 2022-2024
- IT University of Copenhagen, 2023
- Tokyo Institute of Technology, 2023
- Administration for Community Living, 2023
- Moscow Institute of Thermal Technology, 2023
Visual question answering is fundamentally compositional in nature: a question like "where is the dog?" shares substructure with questions like "what color is the dog?" and "where is the cat?" This paper seeks to simultaneously exploit the representational capacity of deep networks and the compositional linguistic structure of questions. We describe a procedure for constructing and learning neural module networks, which compose collections of jointly-trained neural "modules" into deep networks for question answering. Our approach decomposes questions into their linguistic substructures, and uses these structures to dynamically instantiate...
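A minimal sketch of that assembly step, assuming PyTorch: `Find` and `Describe` below are illustrative stand-ins for the jointly-trained modules (the paper derives layouts from a parse of the question, and its module inventory differs).

```python
import torch
import torch.nn as nn

class Find(nn.Module):
    """Produce an attention over image regions for one concept (e.g. 'dog')."""
    def __init__(self, feat_dim):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)

    def forward(self, feats):                       # feats: (regions, feat_dim)
        return torch.softmax(self.score(feats).squeeze(-1), dim=0)

class Describe(nn.Module):
    """Map attended region features to an answer distribution."""
    def __init__(self, feat_dim, n_answers):
        super().__init__()
        self.out = nn.Linear(feat_dim, n_answers)

    def forward(self, feats, attention):
        attended = (attention.unsqueeze(-1) * feats).sum(dim=0)
        return self.out(attended)

# A parse of "what color is the dog?" might yield the layout
# describe[color](find[dog]); the same Find/Describe parameters are
# shared and jointly trained across every question that reuses them.
feat_dim, n_answers = 64, 10
find_dog = Find(feat_dim)
describe_color = Describe(feat_dim, n_answers)
image_feats = torch.randn(20, feat_dim)             # 20 candidate regions (toy)
answer_logits = describe_color(image_feats, find_dog(image_feats))
```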
Natural language questions are inherently compositional, and many are most easily answered by reasoning about their decomposition into modular sub-problems. For example, to answer “is there an equal number of balls and boxes?” we can look for balls, look for boxes, count them, and compare the results. The recently proposed Neural Module Network (NMN) architecture [3, 2] implements this approach to question answering by parsing questions into linguistic substructures and assembling question-specific deep networks from smaller modules...
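The "balls and boxes" example reads naturally as a composed program. Purely as an illustration of that decomposition (the function names here are hypothetical, not the NMN module inventory):

```python
# Decomposing "is there an equal number of balls and boxes?" into sub-problems.
def find(scene, concept):
    return [obj for obj in scene if obj["type"] == concept]

def count(objects):
    return len(objects)

def equal(a, b):
    return a == b

scene = [{"type": "ball"}, {"type": "ball"}, {"type": "box"}, {"type": "box"}]
answer = equal(count(find(scene, "ball")), count(find(scene, "box")))  # True
```

The NMN architecture replaces each of these symbolic steps with a small trainable network and wires the networks together in the same shape.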
Jacob Andreas, Marcus Rohrbach, Trevor Darrell, Dan Klein. Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2016.
People often refer to entities in an image in terms of their relationships with other entities. For example, "the black cat sitting under the table" refers both to a black cat entity and to its relationship with another table entity. Understanding these relationships is essential for interpreting and grounding such natural language expressions. Most prior work focuses on either grounding entire referential expressions holistically to one region, or localizing relationships based on a fixed set of categories. In this paper we instead present a modular deep architecture capable of...
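One way to render the modular idea concretely: score every (subject, object) region pair as the sum of a subject module, an object module, and a relationship module. This is a hedged sketch assuming PyTorch, with toy shapes and a made-up `PairwiseGrounder` name, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class PairwiseGrounder(nn.Module):
    """Score region pairs for expressions like 'the cat under the table'."""
    def __init__(self, feat_dim, text_dim):
        super().__init__()
        self.subj = nn.Bilinear(feat_dim, text_dim, 1)       # subject module
        self.obj = nn.Bilinear(feat_dim, text_dim, 1)        # object module
        self.rel = nn.Bilinear(2 * feat_dim, text_dim, 1)    # relationship module

    def forward(self, feats, subj_emb, rel_emb, obj_emb):
        n = feats.size(0)
        s = self.subj(feats, subj_emb.expand(n, -1)).squeeze(-1)    # (n,)
        o = self.obj(feats, obj_emb.expand(n, -1)).squeeze(-1)      # (n,)
        pairs = torch.cat([feats.repeat_interleave(n, 0),           # subject rows
                           feats.repeat(n, 1)], dim=-1)             # object rows
        r = self.rel(pairs, rel_emb.expand(n * n, -1)).view(n, n)
        return s.unsqueeze(1) + r + o.unsqueeze(0)       # (n, n) pair scores

feats = torch.randn(12, 64)                      # 12 candidate regions (toy)
subj, rel, obj = (torch.randn(1, 32) for _ in range(3))
scores = PairwiseGrounder(64, 32)(feats, subj, rel, obj)
best_subj, best_obj = divmod(scores.argmax().item(), 12)
```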
Navigation guided by natural language instructions presents a challenging reasoning problem for instruction followers. Natural language instructions typically identify only a few high-level decisions and landmarks rather than complete low-level motor behaviors; much of the missing information must be inferred based on perceptual context. In machine learning settings, this is doubly challenging: it is difficult to collect enough annotated data to enable learning of this reasoning process from scratch, and also difficult to implement the reasoning process using generic sequence models...
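A common remedy, sketched below under toy assumptions: let a follower model propose candidate routes, then let a speaker model rescore them by how well each route explains the instruction. All function names here are hypothetical placeholders for learned models.

```python
def rerank(instruction, candidates, follower_logp, speaker_logp, weight=0.5):
    """Pick the route that both follows the instruction (follower score) and
    would plausibly have been described by it (speaker score)."""
    def score(route):
        return ((1 - weight) * follower_logp(route, instruction)
                + weight * speaker_logp(instruction, route))
    return max(candidates, key=score)
```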
In this work, we present a minimal neural model for constituency parsing based on independent scoring of labels and spans. We show that this model is not only compatible with classical dynamic programming techniques, but also admits a novel greedy top-down inference algorithm based on recursive partitioning of the input. We demonstrate empirically that both prediction schemes are competitive with recent work, and when combined with basic extensions to the scoring model are capable of achieving state-of-the-art single-model performance on the Penn Treebank (91.79 F1) and strong...
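The greedy top-down procedure is simple enough to sketch directly: pick the best label for a span, pick the best split point, and recurse. The random scorers below are toy stand-ins for the learned span and label scoring functions.

```python
import random

LABELS = ["S", "NP", "VP", "PP"]

def parse(left, right, label_score, split_score, sentence):
    """Greedily parse sentence[left:right] by recursive partitioning."""
    label = max(LABELS, key=lambda l: label_score(left, right, l))
    if right - left == 1:                        # single word: leaf span
        return (label, sentence[left])
    split = max(range(left + 1, right),          # best point to bisect the span
                key=lambda k: split_score(left, k) + split_score(k, right))
    return (label,
            parse(left, split, label_score, split_score, sentence),
            parse(split, right, label_score, split_score, sentence))

# Toy scorers standing in for the trained model:
sent = "the cat sat on the mat".split()
tree = parse(0, len(sent), lambda i, j, l: random.random(),
             lambda i, j: random.random(), sent)
```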
Yonatan Bisk, Ari Holtzman, Jesse Thomason, Jacob Andreas, Yoshua Bengio, Joyce Chai, Mirella Lapata, Angeliki Lazaridou, Jonathan May, Aleksandr Nisnevich, Nicolas Pinto, Joseph Turian. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2020.
We propose a simple data augmentation protocol aimed at providing a compositional inductive bias in conditional and unconditional sequence models. Under this protocol, synthetic training examples are constructed by taking real training examples and replacing (possibly discontinuous) fragments with other fragments that appear in at least one similar environment. The protocol is model-agnostic and useful for a variety of tasks. Applied to neural sequence-to-sequence models, it reduces error rate by as much as 87% on diagnostic tasks from the SCAN...
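A whole-token version of the protocol fits in a short function (a minimal sketch: the paper also handles discontinuous, multi-token fragments, which this toy omits):

```python
from collections import defaultdict

def augment(corpus):
    """If two tokens ever occur in the same environment, treat them as
    interchangeable and synthesize new examples by swapping them."""
    env_to_fragments = defaultdict(set)
    for sent in corpus:
        for i, tok in enumerate(sent):
            env = tuple(sent[:i] + ["_"] + sent[i + 1:])
            env_to_fragments[env].add(tok)
    interchangeable = defaultdict(set)
    for frags in env_to_fragments.values():
        for a in frags:
            interchangeable[a] |= frags - {a}
    return [sent[:i] + [alt] + sent[i + 1:]
            for sent in corpus
            for i, tok in enumerate(sent)
            for alt in interchangeable[tok]]

corpus = [["the", "cat", "sang"], ["the", "dog", "sang"], ["the", "cat", "danced"]]
# "cat" and "dog" share the environment "the _ sang", so the output includes
# the novel example ["the", "dog", "danced"].
synthetic = augment(corpus)
```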
We present a model for contrastively describing scenes, in which context-specific behavior results from a combination of inference-driven pragmatics and learned semantics. Like previous learned approaches to language generation, our model uses a simple feature-driven architecture (here, a pair of neural "listener" and "speaker" models) to ground language in the world. Like inference-driven approaches to pragmatics, our model actively reasons about listener behavior when selecting utterances. For training, our approach requires only ordinary captions, annotated without...
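The "reasoning about the listener" step can be sketched as candidate sampling plus reranking. The model interfaces below are hypothetical placeholders; the trained speaker and listener are assumed given.

```python
def pragmatic_describe(target, distractor, speaker_sample, listener_prob, n=10):
    """Sample n candidate captions of the target scene, then keep the one the
    listener is most likely to resolve to the target over the distractor."""
    candidates = [speaker_sample(target) for _ in range(n)]
    return max(candidates,
               key=lambda utt: listener_prob(target, [target, distractor], utt))
```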
To be successful in real-world tasks, Reinforcement Learning (RL) needs to exploit the compositional, relational, and hierarchical structure of the world, and learn to transfer it to the task at hand. Recent advances in representation learning for language make it possible to build models that acquire world knowledge from text corpora and integrate this knowledge into downstream decision making problems. We thus argue that the time is right to investigate a tight integration of natural language understanding into RL in particular. We survey the state of the field, including...
We describe a framework for multitask deep reinforcement learning guided by policy sketches. Sketches annotate tasks with sequences of named subtasks, providing information about high-level structural relationships among tasks, but not about how to implement them: specifically, sketches do not provide the detailed guidance used by much previous work on learning policy abstractions for RL (e.g. intermediate rewards, subtask completion signals, or intrinsic motivations). To learn from sketches, we present a model that associates every subtask with a modular subpolicy,...
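Minimally, execution under a sketch chains shared subpolicies, one per named subtask. The environment interface and the externally supplied termination test below are illustrative assumptions, not the paper's mechanism (which learns when subpolicies terminate).

```python
def run_sketch(sketch, env, subpolicies, subtask_done):
    """Execute a task annotated with a sketch like ["get_wood", "make_plank"]."""
    state, total_reward = env.reset(), 0.0
    for name in sketch:
        policy = subpolicies[name]        # shared wherever this name recurs
        while not subtask_done(state, name):
            state, reward = env.step(policy(state))
            total_reward += reward
    return total_reward
```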
We describe a question answering model that applies to both images and structured knowledge bases. The model uses natural language strings to automatically assemble neural networks from a collection of composable modules. Parameters for these modules are learned jointly with network-assembly parameters via reinforcement learning, with only (world, question, answer) triples as supervision. Our approach, which we term a dynamic neural module network, achieves state-of-the-art results on benchmark datasets in both visual and structured domains.
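Because the network-assembly choice is discrete, its parameters cannot receive ordinary gradients; a standard workaround is a REINFORCE-style estimator. The fragment below is a speculative toy of that training signal, assuming PyTorch, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

layout_logits = torch.zeros(3, requires_grad=True)   # 3 candidate layouts (toy)

def joint_loss(run_layout, answer):
    """Supervised loss for the chosen modules, plus a REINFORCE term that
    rewards layout choices leading to likely-correct answers."""
    dist = torch.distributions.Categorical(logits=layout_logits)
    choice = dist.sample()
    logits = run_layout(choice.item())               # differentiable module pass
    nll = F.cross_entropy(logits.unsqueeze(0), torch.tensor([answer]))
    reward = -nll.detach()                           # higher when answer likely
    return nll - reward * dist.log_prob(choice)
```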
We describe an approach to task-oriented dialogue in which dialogue state is represented as a dataflow graph. A dialogue agent maps each user utterance to a program that extends this graph. Programs include metacomputation operators for reference and revision that reuse dataflow fragments from previous turns. Our graph-based state enables the expression and manipulation of complex user intents, and explicit metacomputation makes these intents easier for learned models to predict. We introduce a new dataset, SMCalFlow, featuring complex dialogues about events, weather, places, and people...
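A toy rendering of the graph-plus-metacomputation idea (illustrative only; SMCalFlow programs are far richer than this):

```python
class Dataflow:
    """Dialogue state as an append-only graph of (operation, args) nodes."""
    def __init__(self):
        self.nodes = []                        # nodes in turn order

    def add(self, op, *args):
        self.nodes.append((op, args))
        return len(self.nodes) - 1             # node id

    def refer(self, op):
        """Reference: reuse the most recent node produced by `op`."""
        return max(i for i, (o, _) in enumerate(self.nodes) if o == op)

    def revise(self, node_id, *new_args):
        """Revision: re-run an earlier computation with changed arguments."""
        op, _ = self.nodes[node_id]
        return self.add(op, *new_args)

graph = Dataflow()
# Turn 1: "When is my meeting with Megan?"
event = graph.add("find_event", "attendee=Megan")
graph.add("start_time", event)
# Turn 2: "What about with Tom?" -> revise the earlier event query.
graph.revise(graph.refer("find_event"), "attendee=Tom")
```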
Belinda Z. Li, Maxwell Nye, Jacob Andreas. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021.