Andrei Barbu

ORCID: 0000-0001-7626-9266
Research Areas
  • Multimodal Machine Learning Applications
  • Natural Language Processing Techniques
  • Human Pose and Action Recognition
  • Topic Modeling
  • Video Analysis and Summarization
  • Advanced Image and Video Retrieval Techniques
  • AI-based Problem Solving and Planning
  • Reinforcement Learning in Robotics
  • Anomaly Detection Techniques and Applications
  • Neural Networks and Applications
  • Machine Learning and Algorithms
  • Speech and dialogue systems
  • Multi-Agent Systems and Negotiation
  • Neural dynamics and brain function
  • Robotic Path Planning Algorithms
  • Action Observation and Synchronization
  • Neurobiology of Language and Bilingualism
  • Domain Adaptation and Few-Shot Learning
  • Music and Audio Processing
  • Human Motion and Animation
  • Hand Gesture Recognition Systems
  • Subtitles and Audiovisual Media
  • Text Readability and Simplification
  • Visual Attention and Saliency Detection
  • Advanced Memory and Neural Computing

Companhia Brasileira de Metalurgia e Mineração (Brazil)
2024

Massachusetts Institute of Technology
2013-2023

IIT@MIT
2014-2022

Technion – Israel Institute of Technology
2022

Cornell University
2022

Vassar College
2019

Purdue University West Lafayette
2010-2018

Policijska akademija
2014

Alexandru Ioan Cuza University
2014

Police Academy
2014

Recognizing human activities in partially observed videos is a challenging problem with many practical applications. When the unobserved subsequence is at the end of the video, the problem reduces to activity prediction from unfinished streaming video, which has been studied by many researchers. In the general case, however, an unobserved subsequence may occur at any time, yielding a temporal gap in the video. In this paper, we propose a new method that can recognize activities in this general case. Specifically, we formulate the problem into a probabilistic framework: 1) dividing each activity into multiple ordered temporal segments,...

10.1109/cvpr.2013.343 article EN 2013 IEEE Conference on Computer Vision and Pattern Recognition 2013-06-01

We generalize the notion of measuring social biases in word embeddings to visually grounded word embeddings. Biases are present in grounded embeddings, and indeed seem to be equally or more significant than for ungrounded embeddings. This is despite the fact that vision and language can suffer from different biases, which one might hope could attenuate the biases in both. Multiple ways exist to generalize metrics measuring bias in word embeddings to this new setting. We introduce the space of generalizations (Grounded-WEAT and Grounded-SEAT) and demonstrate that three generalizations answer different yet important questions about how...

10.18653/v1/2021.naacl-main.78 article EN cc-by Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2021-01-01
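Grounded-WEAT generalizes the word-embedding association test (WEAT) to grounded embeddings. As a point of reference, the standard ungrounded WEAT effect size can be sketched as below; this is a toy NumPy illustration, not the paper's code, and the function names and vectors are made up for the example:

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def assoc(w, A, B):
    # s(w, A, B): mean similarity to attribute set A minus attribute set B
    return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    # Cohen's-d-style effect size comparing target sets X and Y
    sx = [assoc(x, A, B) for x in X]
    sy = [assoc(y, A, B) for y in Y]
    return (np.mean(sx) - np.mean(sy)) / np.std(sx + sy, ddof=1)
```

In the grounded setting, the embeddings would instead come from a vision-and-language model conditioned on images; the statistic itself is unchanged.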

We present an approach to simultaneously reasoning about a video clip and an entire natural-language sentence. The compositional nature of language is exploited to construct models which represent the meanings of sentences composed out of the meanings of the words in those sentences, mediated by a grammar that encodes predicate-argument relations. We demonstrate that these models faithfully represent sentence meaning and are sensitive to how the roles played by participants (nouns), their characteristics (adjectives), the actions performed (verbs), the manner of such actions (adverbs), and changing spatial...

10.1613/jair.4556 article EN cc-by Journal of Artificial Intelligence Research 2015-04-30

A robot’s ability to understand or ground natural language instructions is fundamentally tied to its knowledge about the surrounding world. We present an approach to grounding natural language utterances in the context of factual information gathered through natural-language interactions and past visual observations. A probabilistic model estimates, from a natural-language utterance, the objects, relations, and actions that the utterance refers to, along with the objectives for future robotic actions it implies, and generates a plan to execute those actions while updating a state...

10.24963/ijcai.2017/629 article EN 2017-07-28

We present a study on two key characteristics of human syntactic annotations: anchoring and agreement. Anchoring is a well-known cognitive bias in decision making, where judgments are drawn towards pre-existing values. We study the influence of anchoring on the standard approach to the creation of syntactic resources, where annotations are obtained via human editing of tagger and parser output. Our experiments demonstrate a clear anchoring effect and reveal unwanted consequences, including overestimation of parsing performance and lower quality in comparison with human-based annotations....

10.18653/v1/d16-1239 article EN cc-by Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing 2016-01-01

We present a system that demonstrates how the compositional structure of events, in concert with the compositional structure of language, can interplay with the underlying focusing mechanisms in video action recognition, providing a medium for top-down and bottom-up integration as well as multi-modal integration between vision and language. We show how the roles played by participants (nouns), their characteristics (adjectives), the actions performed (verbs), the manner of such actions (adverbs), and changing spatial relations (prepositions) form whole-sentence descriptions mediated...

10.1109/cvpr.2014.99 article EN 2014 IEEE Conference on Computer Vision and Pattern Recognition 2014-06-01

Understanding language goes hand in hand with the ability to integrate complex contextual information obtained via perception. In this work, we present a novel task for grounded language understanding: disambiguating a sentence given a visual scene which depicts one of the possible interpretations of that sentence. To this end, we introduce a new multimodal corpus containing ambiguous sentences, representing a wide range of syntactic, semantic and discourse ambiguities, coupled with videos that visualize the different interpretations of each sentence. We address this task by...

10.18653/v1/d15-1172 article EN cc-by Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing 2015-01-01

We demonstrate how a sampling-based robotic planner can be augmented to learn to understand a sequence of natural language commands in a continuous configuration space, in order to move and manipulate objects. Our approach combines a deep network structured according to the parse of a complex command, which includes objects, verbs, spatial relations, and attributes, with a sampling-based planner, RRT. A recurrent hierarchical deep network controls how the planner explores the environment, determines when a planned path is likely to achieve the goal, and estimates the confidence of each move to trade off...

10.1109/icra40945.2020.9197464 article EN 2020-05-01

The ability to perceive and reason about social interactions in the context of physical environments is core to human intelligence and human-machine cooperation. However, no prior dataset or benchmark has systematically evaluated physically grounded perception of complex social interactions that go beyond short actions, such as high-fiving, or simple group activities, such as gathering. In this work, we create a dataset of physically-grounded abstract social events, PHASE, that resemble a wide range of real-life social interactions by including concepts such as helping another agent....

10.1609/aaai.v35i1.16167 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2021-05-18

We create a reusable Transformer, BrainBERT, for intracranial recordings, bringing modern representation learning approaches to neuroscience. Much like in NLP and speech recognition, this Transformer enables classifying complex concepts, i.e., decoding neural data, with higher accuracy and much less data, by being pretrained in an unsupervised manner on a large corpus of unannotated neural recordings. Our approach generalizes to new subjects, electrode positions, and unrelated tasks, showing that the representations...

10.48550/arxiv.2302.14367 preprint EN other-oa arXiv (Cornell University) 2023-01-01
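BrainBERT's pretraining is BERT-style masked prediction over time-frequency representations of neural recordings. A minimal NumPy sketch of such a masked-reconstruction objective follows; a trivial neighbor-averaging predictor stands in for the Transformer, and nothing here is the paper's implementation:

```python
import numpy as np

def masked_reconstruction_loss(spec, mask_frac=0.15, seed=0):
    # spec: (time, frequency) array, e.g. a spectrogram of one electrode.
    rng = np.random.default_rng(seed)
    T, F = spec.shape
    mask = rng.random(T) < mask_frac      # mask whole time steps
    if not mask.any():
        mask[0] = True                    # ensure at least one masked frame
    corrupted = spec.copy()
    corrupted[mask] = 0.0
    # Stand-in "model": predict each masked frame from its local neighborhood.
    pred = corrupted.copy()
    for t in np.flatnonzero(mask):
        lo, hi = max(0, t - 1), min(T, t + 2)
        pred[t] = corrupted[lo:hi].mean(axis=0)
    # Loss is computed only on the masked positions, as in BERT-style training.
    return float(((pred[mask] - spec[mask]) ** 2).mean())
```

In the real setting, the predictor is a Transformer trained to minimize this kind of masked loss over a large corpus of unannotated recordings; the frozen encoder is then reused for decoding tasks.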

We present an integrated vision and robotic system that plays, and learns to play, simple physically-instantiated board games that are variants of TIC TAC TOE and HEXAPAWN. We employ novel custom hardware designed specifically for this learning task. The game rules can be parametrically specified. Two independent computational agents alternate playing the two opponents with the shared hardware, using pre-specified rule sets. A third agent, sharing the same hardware, learns the game rules solely by observing the physical play, without access to the rule set, by inductive...

10.1109/robot.2010.5509925 article EN 2010-05-01

We demonstrate how a sequence model and a sampling-based planner can influence each other to produce efficient plans, and how such a model can automatically learn to take advantage of observations of the environment. Sampling-based planners such as RRT generally know nothing about their environments, even if they have traversed similar spaces many times. A sequence model, such as an HMM or LSTM, guides the search for good paths. The resulting planner, called DeRRT*, observes the state of the planner and the local environment to bias the next move and next planner state. The neural-network-based models avoid manual...

10.1109/iros.2018.8593947 article EN 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2018-10-01
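DeRRT* biases RRT's proposal distribution with a learned sequence model. The skeleton below shows where that bias enters a plain 2D RRT; here a naive goal-biased sampler stands in for the learned model, and all names, bounds, and parameters are illustrative rather than the paper's:

```python
import math
import random

def rrt(start, goal, is_free, bias=0.2, step=0.5, iters=2000, seed=0):
    # Minimal 2D RRT over the box [0,10]^2. The `bias` branch is where
    # DeRRT*'s learned model would propose samples; here it is a naive
    # stand-in that samples the goal with probability `bias`.
    rng = random.Random(seed)
    nodes = [start]
    parent = {0: None}
    for _ in range(iters):
        sample = goal if rng.random() < bias else (rng.uniform(0, 10), rng.uniform(0, 10))
        i = min(range(len(nodes)), key=lambda j: math.dist(nodes[j], sample))
        nx, ny = nodes[i]
        d = math.dist((nx, ny), sample)
        if d == 0:
            continue
        # Steer a fixed step from the nearest node toward the sample.
        new = (nx + step * (sample[0] - nx) / d, ny + step * (sample[1] - ny) / d)
        if not is_free(new):
            continue
        nodes.append(new)
        parent[len(nodes) - 1] = i
        if math.dist(new, goal) < step:   # reached the goal region
            path, k = [], len(nodes) - 1
            while k is not None:
                path.append(nodes[k])
                k = parent[k]
            return path[::-1]
    return None
```

Replacing the goal-bias with a model conditioned on the planner state and local observations is the essential move of the learned variant.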

We develop a semantic parser that is trained in a grounded setting using pairs of videos captioned with sentences. This setting is both data-efficient, requiring little annotation, and similar to the experience of children, who observe their environment and listen to speakers. The semantic parser recovers the meaning of English sentences despite not having access to any annotated sentences. It does so despite the ambiguity inherent in vision, where a sentence may refer to any combination of objects, object properties, relations, or actions taken by any agent in a video. For this task, we...

10.18653/v1/d18-1285 article EN cc-by Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing 2018-01-01

Language allows humans to build mental models that interpret what is happening around them, resulting in more accurate long-term predictions. We present a novel trajectory prediction model that uses linguistic intermediate representations to forecast trajectories, and is trained using trajectory samples with partially-annotated captions. The model learns the meaning of each of the words without direct per-word supervision. At inference time, it generates a linguistic description of trajectories which captures maneuvers and interactions over an...

10.1109/icra46639.2022.9811928 article EN 2022 International Conference on Robotics and Automation (ICRA) 2022-05-23

We present an approach to searching large video corpora for clips which depict a natural-language query in the form of a sentence. Compositional semantics is used to encode subtle meaning differences lost in other approaches, such as the difference between two sentences which have identical words but entirely different meaning: The person rode the horse versus The horse rode the person. Given a sentential query and a parser, we produce a score indicating how well each clip in the corpus depicts that sentence and return a ranked list of clips. Two fundamental...

10.1109/tpami.2015.2505297 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2015-12-03

We demonstrate a reinforcement learning agent which uses a compositional recurrent neural network that takes as input an LTL formula and determines satisfying actions. The input LTL formulas have never been seen before, yet the network performs zero-shot generalization to satisfy them. This is a novel form of multi-task learning for RL agents, where agents learn from one diverse set of tasks and generalize to a new set of diverse tasks. The formulation of the network enables this capacity to generalize. We demonstrate this ability in two domains. In a symbolic domain, the agent finds a sequence of letters...

10.1109/iros45743.2020.9341325 article EN 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2020-10-24
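The agent's objective in this line of work is to produce action sequences whose traces satisfy a given LTL formula. Finite-trace LTL satisfaction, which defines the success signal such an agent trains against, can be sketched as a small recursive checker over tuple-encoded formulas; this is an illustrative semantics, not the paper's code:

```python
def holds(formula, trace, t=0):
    # Finite-trace LTL evaluation. Formulas are nested tuples, e.g.
    # ('until', ('ap', 'a'), ('ap', 'b')); a trace is a list of sets of
    # atomic propositions, one set per time step.
    op = formula[0]
    if op == 'ap':
        return formula[1] in trace[t]
    if op == 'not':
        return not holds(formula[1], trace, t)
    if op == 'and':
        return holds(formula[1], trace, t) and holds(formula[2], trace, t)
    if op == 'next':
        return t + 1 < len(trace) and holds(formula[1], trace, t + 1)
    if op == 'eventually':
        return any(holds(formula[1], trace, k) for k in range(t, len(trace)))
    if op == 'always':
        return all(holds(formula[1], trace, k) for k in range(t, len(trace)))
    if op == 'until':
        return any(holds(formula[2], trace, k) and
                   all(holds(formula[1], trace, j) for j in range(t, k))
                   for k in range(t, len(trace)))
    raise ValueError(f"unknown operator: {op}")
```

The compositional network in the paper mirrors this recursive structure: each operator has a sub-network, composed according to the formula's parse, which is what enables zero-shot generalization to unseen formulas.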

Humans are remarkably flexible when understanding new sentences that include combinations of concepts they have never encountered before. Recent work has shown that while deep networks can mimic some human language abilities when presented with novel sentences, systematic variation uncovers the limitations in the language-understanding abilities of networks. We demonstrate that these limitations can be overcome by addressing the generalization challenges in the gSCAN dataset, which explicitly measures how well an agent is able to interpret...

10.18653/v1/2021.findings-emnlp.21 preprint EN cc-by 2021-01-01

We present the Brain Treebank, a large-scale dataset of electrophysiological neural responses, recorded from intracranial probes while 10 subjects watched one or more Hollywood movies. Subjects watched on average 2.6 movies, for an average viewing time of 4.3 hours, and a total of 43 hours. The audio track of each movie was transcribed with manual corrections. Word onsets were manually annotated on spectrograms of each movie. Each transcript was automatically parsed and then manually corrected into the universal dependencies (UD) formalism, assigning...

10.48550/arxiv.2411.08343 preprint EN arXiv (Cornell University) 2024-11-13