- Multimodal Machine Learning Applications
- Natural Language Processing Techniques
- Human Pose and Action Recognition
- Topic Modeling
- Video Analysis and Summarization
- Advanced Image and Video Retrieval Techniques
- AI-based Problem Solving and Planning
- Reinforcement Learning in Robotics
- Anomaly Detection Techniques and Applications
- Neural Networks and Applications
- Machine Learning and Algorithms
- Speech and Dialogue Systems
- Multi-Agent Systems and Negotiation
- Neural Dynamics and Brain Function
- Robotic Path Planning Algorithms
- Action Observation and Synchronization
- Neurobiology of Language and Bilingualism
- Domain Adaptation and Few-Shot Learning
- Music and Audio Processing
- Human Motion and Animation
- Hand Gesture Recognition Systems
- Subtitles and Audiovisual Media
- Text Readability and Simplification
- Visual Attention and Saliency Detection
- Advanced Memory and Neural Computing
Companhia Brasileira de Metalurgia e Mineração (Brazil)
2024
Massachusetts Institute of Technology
2013-2023
IIT@MIT
2014-2022
Technion – Israel Institute of Technology
2022
Cornell University
2022
Vassar College
2019
Purdue University West Lafayette
2010-2018
Policijska akademija
2014
Alexandru Ioan Cuza University
2014
Police Academy
2014
Recognizing human activities in partially observed videos is a challenging problem and has many practical applications. When the unobserved subsequence is at the end of the video, the problem is reduced to activity prediction from an unfinished activity stream, which has been studied by many researchers. However, in the general case, an unobserved subsequence may occur at any time, yielding a temporal gap in the video. In this paper, we propose a new method that can recognize human activities in partially observed videos in the general case. Specifically, we formulate the problem into a probabilistic framework: 1) dividing each activity into multiple ordered segments,...
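One way to realize the segment-based scoring this abstract describes is a Viterbi-style dynamic program in which unobserved frames advance time without contributing evidence. This is a minimal sketch; the function names and the uniform treatment of gap frames are illustrative assumptions, not the paper's exact model:

```python
import numpy as np

def score_activity(frame_loglik, observed):
    """DP assigning frames to K ordered segments; frames inside the
    temporal gap contribute no evidence but still advance time.
    frame_loglik: (T, K) array of log p(frame_t | segment_k)
    observed:     (T,) boolean mask, False inside the gap (assumption)"""
    T, K = frame_loglik.shape
    ll = np.where(observed[:, None], frame_loglik, 0.0)  # gap frames are uninformative
    dp = np.full((T, K), -np.inf)
    dp[0, 0] = ll[0, 0]
    for t in range(1, T):
        for k in range(K):
            prev = dp[t - 1, k] if k == 0 else max(dp[t - 1, k], dp[t - 1, k - 1])
            dp[t, k] = ll[t, k] + prev   # stay in segment k or advance from k-1
    return dp[-1, -1]

# Recognition: evaluate score_activity under each activity class's segment
# models and pick the argmax over classes.
```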
We generalize the notion of measuring social biases in word embeddings to visually grounded embeddings. Biases are present embeddings, and indeed seem be equally or more significant than for ungrounded This is despite fact that vision language can suffer from different biases, which one might hope could attenuate both. Multiple ways exist metrics bias this new setting. introduce space generalizations (Grounded-WEAT Grounded-SEAT) demonstrate three answer yet important questions about how...
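For reference, the ungrounded WEAT effect size that these grounded metrics generalize can be computed as below; the grounded variants swap visually grounded embeddings in for the word vectors. A sketch with assumed variable names (X, Y are target word sets, A, B attribute word sets):

```python
import numpy as np

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def association(w, A, B):
    # s(w, A, B): mean similarity to attribute set A minus to attribute set B
    return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    # Standard WEAT effect size (Cohen's d over association scores)
    sx = [association(x, A, B) for x in X]
    sy = [association(y, A, B) for y in Y]
    return (np.mean(sx) - np.mean(sy)) / np.std(sx + sy, ddof=1)
```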
We present an approach to simultaneously reasoning about a video clip and an entire natural-language sentence. The compositional nature of language is exploited to construct models which represent the meanings of sentences composed out of the meanings of the words in those sentences, mediated by a grammar that encodes the predicate-argument relations. We demonstrate that these models faithfully represent the meanings of sentences and are sensitive to how the roles played by participants (nouns), their characteristics (adjectives), the actions performed (verbs), the manner of such actions (adverbs), and changing spatial...
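A toy rendering of this compositional scoring: each word contributes a predicate over the object tracks filling its grammatical roles, and the sentence score is their conjunction (a sum of log-scores). The stub detectors and the distance-based `approached` predicate are invented for illustration:

```python
import numpy as np

def sentence_score(tracks, role_fill, word_predicates):
    # Conjunction of per-word predicates applied to the tracks that
    # fill each word's argument roles.
    return sum(pred(*[tracks[role_fill[r]] for r in roles])
               for pred, roles in word_predicates)

# Toy example for "the person approached the chair".
tracks = {"t0": np.array([[0., 0.], [1., 0.], [2., 0.]]),   # moving track
          "t1": np.array([[3., 0.], [3., 0.], [3., 0.]])}   # static track
person = (lambda a: 0.0, ("agent",))                         # stub detector score
chair = (lambda p: 0.0, ("patient",))
approached = (lambda a, p: float(np.linalg.norm(a[0] - p[0])
                                 - np.linalg.norm(a[-1] - p[-1])),
              ("agent", "patient"))
score = sentence_score(tracks, {"agent": "t0", "patient": "t1"},
                       [person, chair, approached])
print(score)   # higher when the agent ends closer to the patient
```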
A robot’s ability to understand or ground natural language instructions is fundamentally tied to its knowledge about the surrounding world. We present an approach to grounding natural language utterances in the context of factual information gathered through natural-language interactions and past visual observations. A probabilistic model estimates, from a natural language utterance, the objects, relations, and actions that the utterance refers to and the objectives for future robotic actions it implies, and generates a plan to execute those actions while updating its state...
We present a study on two key characteristics of human syntactic annotations: anchoring and agreement. Anchoring is a well known cognitive bias in human decision making, where judgments are drawn towards pre-existing values. We study the influence of anchoring on the standard approach to the creation of syntactic resources, where annotations are obtained via human editing of tagger and parser output. Our experiments demonstrate a clear anchoring effect and reveal unwanted consequences, including overestimation of parsing performance and lower annotation quality in comparison with human-based annotations...
We present a system that demonstrates how the compositional structure of events, in concert with the compositional structure of language, can interplay with the underlying focusing mechanisms in video action recognition, providing a medium for top-down and bottom-up integration as well as multi-modal integration between vision and language. We show how the roles played by participants (nouns), their characteristics (adjectives), the actions performed (verbs), the manner of such actions (adverbs), and changing spatial relations (prepositions), in the form of whole-sentence descriptions mediated...
Understanding language goes hand in hand with the ability to integrate complex contextual information obtained via perception. In this work, we present a novel task for grounded language understanding: disambiguating a sentence given a visual scene which depicts one of the possible interpretations of that sentence. To this end, we introduce a new multimodal corpus containing ambiguous sentences, representing a wide range of syntactic, semantic and discourse ambiguities, coupled with videos that visualize the different interpretations for each sentence. We address this task by...
We demonstrate how a sampling-based robotic planner can be augmented to learn to understand a sequence of natural language commands in a continuous configuration space to move and manipulate objects. Our approach combines a deep network structured according to the parse of a complex command that includes objects, verbs, spatial relations, and attributes, with a sampling-based planner, the RRT. A recurrent hierarchical deep network controls how the planner explores the environment, determines when a planned path is likely to achieve a goal, and estimates the confidence of each move to trade off...
The ability to perceive and reason about social interactions in the context of physical environments is core to human social intelligence and human-machine cooperation. However, no prior dataset or benchmark has systematically evaluated physically grounded perception of complex social interactions that go beyond short actions, such as high-fiving, or simple group activities, such as gathering. In this work, we create a dataset of physically-grounded abstract social events, PHASE, that resemble a wide range of real-life social interactions by including concepts such as helping another agent...
We create a reusable Transformer, BrainBERT, for intracranial recordings, bringing modern representation learning approaches to neuroscience. Much like in NLP and speech recognition, this Transformer enables classifying complex concepts, i.e., decoding neural data, with higher accuracy and much less data by being pretrained in an unsupervised manner on a large corpus of unannotated neural recordings. Our approach generalizes to new subjects with electrodes in new positions and to unrelated tasks, showing that the representations...
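A minimal sketch of this style of pretraining, assuming masked-spectrogram reconstruction with a Transformer encoder; the sizes, masking rate, and loss here are illustrative guesses rather than BrainBERT's exact recipe:

```python
import torch
import torch.nn as nn

class MaskedSpectrogramModel(nn.Module):
    """Mask random timesteps of an electrode's spectrogram and train a
    Transformer encoder to reconstruct them (BERT-style objective)."""
    def __init__(self, n_freq=40, d=256, layers=6):
        super().__init__()
        self.embed = nn.Linear(n_freq, d)
        enc = nn.TransformerEncoderLayer(d_model=d, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, num_layers=layers)
        self.head = nn.Linear(d, n_freq)

    def forward(self, spec):                          # spec: (B, T, n_freq)
        masked = spec.clone()
        mask = torch.rand(spec.shape[:2]) < 0.15      # hide 15% of timesteps
        masked[mask] = 0.0
        rec = self.head(self.encoder(self.embed(masked)))
        return ((rec - spec) ** 2)[mask].mean()       # loss only on masked bins

# After pretraining on unannotated recordings, the frozen encoder's features
# feed a small linear probe for decoding tasks on new subjects.
```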
We present an integrated vision and robotic system that plays, and learns to play, simple physically-instantiated board games that are variants of TIC TAC TOE and HEXA-PAWN. We employ novel custom hardware designed specifically for this learning task. The game rules can be parametrically specified. Two independent computational agents alternate playing the two opponents with the shared hardware, using pre-specified rule sets. A third agent, sharing the same hardware, learns the game rules solely by observing the physical play, without access to the rule set, using inductive...
We demonstrate how a sequence model and a sampling-based planner can influence each other to produce efficient plans and how such a model can automatically learn to take advantage of observations of the environment. Sampling-based planners such as RRT generally know nothing about their environments even if they have traversed similar spaces many times. A sequence model, such as an HMM or an LSTM, guides the search for good paths. The resulting model, called DeRRT*, observes the state of the planner and the local environment to bias the next move and next state. The neural-network-based models avoid manual...
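The core loop might look like the following sketch: with some mixing probability the next sample comes from a learned proposal instead of the uniform sampler, which keeps the classic RRT behavior as a fallback. Here `propose` stands in for the paper's sequence model; all names and constants are assumptions:

```python
import random
import numpy as np

def biased_rrt(start, goal, collision_free, propose,
               n_iters=2000, step=0.1, mix=0.5):
    """RRT whose sampling is biased by a learned proposal distribution."""
    tree = {tuple(start): None}                       # child -> parent
    for _ in range(n_iters):
        if random.random() < mix:
            sample = propose(tree)                    # model-biased next move
        else:
            sample = np.random.uniform(0.0, 1.0, size=len(start))
        nearest = min(tree, key=lambda v: np.linalg.norm(np.array(v) - sample))
        direction = sample - np.array(nearest)
        new = np.array(nearest) + step * direction / (np.linalg.norm(direction) + 1e-9)
        if collision_free(np.array(nearest), new):
            tree[tuple(new)] = nearest
            if np.linalg.norm(new - goal) < step:     # close enough to the goal
                return tree, tuple(new)
    return tree, None
```

Keeping a nonzero share of uniform samples is what preserves the completeness guarantees of the underlying planner while the model steers exploration.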
We develop a semantic parser that is trained in a grounded setting using pairs of videos captioned with sentences. This setting is both data-efficient, requiring little annotation, and similar to the experience of children, who observe their environment and listen to speakers. The semantic parser recovers the meaning of English sentences despite not having access to any annotated sentences. It does so despite the ambiguity inherent in vision, where a sentence may refer to any combination of objects, object properties, relations or actions taken by any agent in a video. For this task, we...
Language allows humans to build mental models that interpret what is happening around them, resulting in more accurate long-term predictions. We present a novel trajectory prediction model that uses linguistic intermediate representations to forecast trajectories, and is trained using trajectory samples with partially-annotated captions. The model learns the meaning of each of the words without direct per-word supervision. At inference time, it generates a linguistic description of trajectories which captures maneuvers and interactions over an...
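One plausible shape for such a model is a soft linguistic bottleneck: the encoder emits a distribution over a small maneuver vocabulary and the decoder conditions on it, so word meanings emerge end-to-end without per-word labels. A hypothetical PyTorch sketch, not the paper's architecture; the vocabulary and sizes are invented:

```python
import torch
import torch.nn as nn

class LingTraj(nn.Module):
    """Trajectory forecasting through a soft, interpretable word layer."""
    VOCAB = ["left", "right", "straight", "stop", "yield"]   # assumed vocabulary

    def __init__(self, hid=64, horizon=12):
        super().__init__()
        self.enc = nn.GRU(2, hid, batch_first=True)
        self.to_words = nn.Linear(hid, len(self.VOCAB))      # linguistic bottleneck
        self.dec = nn.Linear(len(self.VOCAB) + hid, horizon * 2)
        self.horizon = horizon

    def forward(self, past_xy):                              # (B, T, 2) past positions
        h, _ = self.enc(past_xy)
        last = h[:, -1]
        words = torch.softmax(self.to_words(last), dim=-1)   # soft description
        out = self.dec(torch.cat([words, last], dim=-1))
        return out.view(-1, self.horizon, 2), words          # future xy + words
```

At inference, reading off the argmax of `words` yields a rough linguistic description of the predicted maneuver.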
We present an approach to searching large video corpora for clips which depict a natural-language query in the form of a sentence. Compositional semantics is used to encode subtle meaning differences lost in other approaches, such as the difference between two sentences which have identical words but entirely different meanings: The person rode the horse versus The horse rode the person. Given a sentential query and a natural-language parser, we produce a score indicating how well each clip in the corpus depicts that sentence and return a ranked list of clips. Two fundamental...
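The word-order example suggests why role assignment matters: a compositional scorer maximizes the conjunction of word predicates over assignments of tracks to argument roles, so swapping agent and patient changes the score even though the words are identical. A hedged sketch with invented helper names:

```python
from itertools import permutations

def clip_score(tracks, word_predicates, roles=("agent", "patient")):
    # Maximize the summed predicate scores over all ways of assigning this
    # clip's object tracks to the sentence's argument roles.
    best = float("-inf")
    for chosen in permutations(tracks, len(roles)):
        fill = dict(zip(roles, chosen))
        best = max(best, sum(pred(fill) for pred in word_predicates))
    return best

def retrieve(corpus, word_predicates, k=10):
    # corpus: {clip_name: list_of_tracks}; returns the top-k clip names.
    ranked = sorted(corpus, key=lambda c: clip_score(corpus[c], word_predicates),
                    reverse=True)
    return ranked[:k]
```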
We demonstrate a reinforcement learning agent which uses a compositional recurrent neural network that takes as input an LTL formula and determines satisfying actions. The input LTL formulas have never been seen before, yet the network performs zero-shot generalization to satisfy them. This is a novel form of multi-task learning for RL agents, where the agents learn from one diverse set of tasks and generalize to a new set of diverse tasks. The compositional formulation of the network enables this capacity to generalize. We demonstrate this ability in two domains. In a symbolic domain, the agent finds a sequence of letters...
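A sketch of how a formula's parse tree can be mirrored by a network built from reusable per-operator modules, which is what makes assembling a network for a never-seen formula possible at test time. The module design and sizes are illustrative, not the paper's:

```python
import torch
import torch.nn as nn

OPS = ("and", "or", "not", "next", "until", "eventually", "always")

class FormulaNet(nn.Module):
    """Encode an LTL formula by recursing over its parse tree with one
    shared module per operator; leaves are proposition embeddings."""
    def __init__(self, tokens, d=32):
        super().__init__()
        self.tok = {t: i for i, t in enumerate(tokens)}
        self.leaf = nn.Embedding(len(tokens), d)
        self.op = nn.ModuleDict({o: nn.Linear(2 * d, d) for o in OPS})
        self.d = d

    def encode(self, tree):          # tree: token string or (op, *children)
        if isinstance(tree, str):
            return self.leaf(torch.tensor(self.tok[tree]))
        op, *kids = tree
        h = [self.encode(k) for k in kids]
        if len(h) == 1:              # unary operators pad the second slot
            h.append(torch.zeros(self.d))
        return torch.tanh(self.op[op](torch.cat(h)))

# e.g. FormulaNet(["a", "b"]).encode(("until", ("not", "b"), "a")) yields an
# embedding a policy network can condition on to choose satisfying actions.
```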
Humans are remarkably flexible when understanding new sentences that include combinations of concepts they have never encountered before. Recent work has shown that while deep networks can mimic some human language abilities when presented with novel sentences, systematic variation uncovers the limitations in the language-understanding abilities of networks. We demonstrate that these limitations can be overcome by addressing the generalization challenges in the gSCAN dataset, which explicitly measures how well an agent is able to interpret...
We present the Brain Treebank, a large-scale dataset of electrophysiological neural responses, recorded from intracranial probes while 10 subjects watched one or more Hollywood movies. Subjects watched on average 2.6 movies, for an average viewing time of 4.3 hours and a total of 43 hours. The audio track of each movie was transcribed with manual corrections. Word onsets were manually annotated on spectrograms of each movie. Each transcript was automatically parsed and then manually corrected into the universal dependencies (UD) formalism, assigning...