- Visual Attention and Saliency Detection
- Domain Adaptation and Few-Shot Learning
- Advanced Image and Video Retrieval Techniques
- Visual perception and processing mechanisms
- Neural dynamics and brain function
- Face Recognition and Perception
- Advanced Neural Network Applications
- Music and Audio Processing
- Emotion and Mood Recognition
- Image Retrieval and Classification Techniques
- Advanced Vision and Imaging
- Human Pose and Action Recognition
- Medical Image Segmentation Techniques
- Multimodal Machine Learning Applications
- Cell Image Analysis Techniques
- Speech and Audio Processing
- Anomaly Detection Techniques and Applications
- Video Analysis and Summarization
- Adversarial Robustness in Machine Learning
- EEG and Brain-Computer Interfaces
- Media Influence and Health
- Video Surveillance and Tracking Methods
- Generative Adversarial Networks and Image Synthesis
- Topic Modeling
- Neuroscience and Music Perception
Goethe University Frankfurt
2020-2025
Hessian Agency for Nature Conservation, Environment and Geology
2023-2024
Hessian Center for Artificial Intelligence
2023
Freie Universität Berlin
2020
Singapore University of Technology and Design
2017-2019
Massachusetts Institute of Technology
2014-2019
MIT Art, Design and Technology University
2019
McGovern Institute for Brain Research
2017
ETH Zurich
2011-2016
Universitat Ramon Llull
2011
Deep learning techniques have become the go-to models for most vision-related tasks on 2D images. However, their power has not been fully realised on several tasks in 3D space, e.g., 3D scene understanding. In this work, we jointly address the problems of semantic and instance segmentation of 3D point clouds. Specifically, we develop a multi-task pointwise network that simultaneously performs two tasks: predicting the semantic classes of 3D points and embedding the points into high-dimensional vectors so that points of the same object instance are represented by similar...
We show that adversarial examples, i.e., the visually imperceptible perturbations that cause Convolutional Neural Networks (CNNs) to fail, can be alleviated with a mechanism based on foveations, i.e., applying the CNN in different image regions. To see this, first, we report results on ImageNet that lead to a revision of the hypothesis that adversarial perturbations are a consequence of CNNs acting as a linear classifier: CNNs act locally linearly to changes in the image regions with objects recognized by the CNN, and in other regions the CNN may act non-linearly. Then, we corroborate that when the neural responses...
Recently, the introduction of the generative adversarial network (GAN) and its variants has enabled the generation of realistic synthetic samples, which have been used for enlarging training sets. Previous work primarily focused on data augmentation for semi-supervised and supervised tasks. In this paper, we instead focus on unsupervised anomaly detection and propose a novel framework optimized for this task. By using a GAN variant known as the adversarial autoencoder (AAE), we impose a distribution on the latent space of a dataset and systematically sample the latent space to generate...
Transfer learning is widely used in deep neural network models when there are few labeled examples available. The common approach is to take a network pre-trained on a similar task and finetune the model parameters. This is usually done blindly, without a pre-selection from a set of pre-trained models, or by finetuning a set of models trained on different tasks and selecting the best performing one by cross-validation. We address this problem by proposing an approach to assess the relationship between visual tasks and their task-specific models. Our method uses Representation...
The ethical and societal implications of artificial intelligence systems raise concerns. In this article, we outline a novel process based on applied ethics, namely, Z-Inspection®, to assess if an AI system is trustworthy. We use the definition of trustworthy AI given by the European Commission's high-level expert group on AI. Z-Inspection® is a general inspection process that can be applied to a variety of domains where AI systems are used, such as...
With the rise in traffic congestion in urban centers, predicting accidents has become paramount for city planning and public safety. This work comprehensively studied the efficacy of modern deep learning (DL) methods in forecasting traffic accidents and enhancing Level-4 and Level-5 (L-4 and L-5) driving assistants with actionable visual and language cues. Using a rich dataset detailing accident occurrences, we juxtaposed the Transformer model against traditional time series models like ARIMA and the more recent Prophet model. Additionally,...
Super pixel and objectness algorithms are broadly used as a pre-processing step to generate support regions and to speed up further computations. Recently, many algorithms have been extended to video in order to exploit the temporal consistency between frames. However, most methods are computationally too expensive for real-time applications. We introduce an online, real-time video super pixel algorithm based on the recently proposed SEEDS super pixels. A new capability is incorporated which delivers multiple diverse samples (hypotheses) of super pixels...
In this work, we present a novel method to learn a local cross-domain descriptor for 2D image and 3D point cloud matching. Our proposed method is a dual auto-encoder neural network that maps 2D and 3D input into a shared latent space representation. We show that such local cross-domain descriptors in the shared embedding are more discriminative than those obtained from individual training in the 2D and 3D domains. To facilitate the training process, we built a new dataset by collecting ≈ 1.4 million 2D-3D correspondences with various lighting conditions and settings from publicly...
The human visual cortex enables visual perception through a cascade of hierarchical computations in cortical regions with distinct functionalities. Here, we introduce an AI-driven approach to discover the functional mapping of the visual cortex. We related human brain responses to scene images measured with functional MRI (fMRI) systematically to a diverse set of deep neural networks (DNNs) optimized to perform different scene perception tasks. We found a structured mapping between DNN tasks and brain regions along the ventral and dorsal visual streams. Low-level visual tasks mapped onto early brain regions, 3-dimensional...
To interact with objects in complex environments, we must know what they are and where they are in spite of challenging viewing conditions. Here, we investigated where, how and when representations of object location and category emerge in the human brain when objects appear on cluttered natural scene images, using a combination of functional magnetic resonance imaging, electroencephalography and computational models. We found location representations to emerge along the ventral visual stream towards lateral occipital complex, mirrored by gradual emergence in deep...
The increasing complexity of Multi-Agent Systems (MASs), coupled with the emergence of Artificial Intelligence (AI) and Large Language Models (LLMs), has highlighted significant gaps in our understanding of the behavior and interactions of diverse entities within dynamic environments. Traditional game theory approaches have often been employed in this context, but their utility is limited by the static and homogeneous nature of their models. With the transformative influence of AI and LLMs on business and society, a more nuanced theoretical...
Shortcut learning, i.e., a model's reliance on undesired features not directly relevant to the task, is a major challenge that severely limits the applications of machine learning algorithms, particularly when deploying them to assist in making sensitive decisions, such as in medical diagnostics. In this work, we leverage recent advancements in machine learning to create an unsupervised framework capable of both detecting and mitigating shortcut learning in transformers. We validate our method on multiple datasets. Results demonstrate...
Most state-of-the-art visual attention models estimate the probability distribution of fixating the eyes in a location of the image, the so-called saliency maps. Yet, these models do not predict the temporal sequence of eye fixations, which may be valuable for better predicting human eye fixations, as well as for understanding the role of the different cues during visual exploration. In this paper, we present a method for predicting the sequence of human eye fixations, which is learned from recorded human eye-tracking data. We use least-squares policy iteration (LSPI) to learn a visual exploration policy that mimics the recorded eye-fixation examples....
This paper presents an exploration into the capabilities of an adaptive PID controller within the realm of truck platooning operations, situating the inquiry within the context of Cognitive Radio and AI-enhanced 5G and Beyond 5G (B5G) networks. We developed a Deep Learning (DL) model that emulates an adaptive PID controller, taking into account the implications of factors such as communication latency, packet loss, and communication range, alongside considerations of reliability, robustness, and security. Furthermore, we harnessed a Large Language Model (LLM), GPT-3.5-turbo, to...
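For readers unfamiliar with the controller type named in the abstract above, the following is a minimal sketch of a textbook discrete PID loop that regulates the gap between a follower and a leader vehicle. It is not the paper's adaptive controller or its deep learning emulation: the gains are fixed, the plant is a simple double integrator, and the names `PID`, `simulate_gap`, and all numeric values are assumptions chosen only for illustration.

```python
class PID:
    """Textbook discrete PID controller with fixed (non-adaptive) gains."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None  # no derivative term on the very first step

    def step(self, error, dt):
        self.integral += error * dt
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate_gap(target_gap=10.0, initial_gap=20.0, steps=1200, dt=0.05):
    """Follower tracks a leader moving at constant speed.

    Hypothetical plant: the controller output is the follower's acceleration,
    and the gap shrinks at the follower's speed relative to the leader.
    """
    pid = PID(kp=1.0, ki=0.1, kd=2.0)  # illustrative gains, not tuned values
    gap = initial_gap
    rel_speed = 0.0  # follower speed minus leader speed
    gaps = [gap]
    for _ in range(steps):
        error = gap - target_gap        # positive when the gap is too large
        accel = pid.step(error, dt)     # accelerate to close a large gap
        rel_speed += accel * dt
        gap -= rel_speed * dt
        gaps.append(gap)
    return gaps


if __name__ == "__main__":
    history = simulate_gap()
    print(f"initial gap: {history[0]:.2f} m, final gap: {history[-1]:.2f} m")
```

An adaptive variant would adjust `kp`, `ki`, and `kd` online, for example in response to measured latency or packet loss; how the paper does this is not stated in the truncated abstract, so it is left out here.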
In this paper, we introduce an innovative approach to handling the multi-armed bandit (MAB) problem in non-stationary environments, harnessing the predictive power of large language models (LLMs). With the realization that traditional bandit strategies, including epsilon-greedy and upper confidence bound (UCB), may struggle in the face of dynamic changes, we propose a strategy informed by LLMs that offers guidance on exploration versus exploitation, contingent on the current state of the bandits. We bring forward a new model with...
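The two baseline strategies named in the abstract above, epsilon-greedy and UCB, can be sketched as follows. This is a minimal illustration on a stationary Bernoulli bandit using the standard UCB1 index; it does not reproduce the paper's LLM-informed strategy or its non-stationary setting, and the function names, arm probabilities, and step counts are assumptions made for the example.

```python
import math
import random


def epsilon_greedy(counts, values, epsilon, rng):
    """Explore a random arm with probability epsilon, else exploit the best estimate."""
    if rng.random() < epsilon:
        return rng.randrange(len(counts))
    return max(range(len(values)), key=lambda a: values[a])


def ucb1(counts, values, t):
    """UCB1: play each arm once, then maximise mean + sqrt(2 ln t / n)."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    return max(
        range(len(counts)),
        key=lambda a: values[a] + math.sqrt(2.0 * math.log(t) / counts[a]),
    )


def run_bandit(policy, probs, steps=2000, epsilon=0.1, seed=0):
    """Simulate a Bernoulli bandit; returns per-arm pull counts and total reward."""
    rng = random.Random(seed)
    counts = [0] * len(probs)
    values = [0.0] * len(probs)  # running mean reward per arm
    total = 0.0
    for t in range(1, steps + 1):
        if policy == "epsilon_greedy":
            arm = epsilon_greedy(counts, values, epsilon, rng)
        else:
            arm = ucb1(counts, values, t)
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
        total += reward
    return counts, total


if __name__ == "__main__":
    arm_probs = [0.2, 0.5, 0.8]  # hypothetical success probabilities
    for name in ("epsilon_greedy", "ucb1"):
        pulls, reward = run_bandit(name, arm_probs)
        print(name, pulls, reward)
```

Both estimators average over the whole history, which is one reason such baselines can react slowly when arm probabilities drift over time.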
X-ray photoelectron spectroscopy (XPS) remains a fundamental technique in materials science, offering invaluable insights into the chemical states and electronic structure of a material. However, the interpretation of XPS spectra can be complex, requiring deep expertise and often sophisticated curve-fitting methods. In this study, we present a novel approach to the analysis of XPS data, integrating the utilization of large language models (LLMs), specifically OpenAI’s GPT-3.5/4 Turbo, to provide insightful guidance during the data...
Recent self-supervised learning (SSL) models trained on human-like egocentric visual inputs substantially underperform on image recognition tasks compared to humans. These models train on raw, uniform visual inputs collected from head-mounted cameras. This is different from humans, as the anatomical structure of the retina and visual cortex relatively amplifies the central visual information, i.e. around humans' gaze location. This selective amplification in humans likely aids in forming object-centered visual representations. Here, we investigate whether...
Scene perception is a key function of biological visual systems. According to the hierarchical processing view, scene processing in the human brain begins with low-level features, progresses to mid-level features, and ends with high-level features. While low- and high-level feature processing is well-studied, research on mid-level features remains limited. Here, we addressed this gap by investigating when mid-level features are processed in humans using a novel stimulus set of naturalistic scenes as images and videos, accompanied by ground-truth annotations for five mid-level features (reflectance, lighting, world...
We formulate a model for multi-class object detection in a multi-camera environment. To our knowledge, this is the first time that this problem is addressed taking into account different object classes simultaneously. Given several images of the scene taken from different angles, our system estimates the ground plane location of the objects from the output of several object detectors applied at each viewpoint. We cast the problem as an energy minimization modeled with a Conditional Random Field (CRF). Instead of predicting the presence of an object at each image location independently, we simultaneously predict the labeling...
In today’s complex economic environment, individuals and households alike grapple with the challenge of financial planning. This paper introduces novel methodologies for both individual and cooperative (household) budgeting. We first propose an optimization framework for individual budget allocation, aiming to maximize savings by efficiently distributing monthly income among various expense categories. We then extend this model to households, wherein the complexity of handling multiple incomes and shared expenses is...
Studying the neural basis of human dynamic visual perception requires extensive experimental data to evaluate the large swathes of functionally diverse brain neural networks driven by perceiving visual events. Here, we introduce the BOLD Moments Dataset (BMD), a repository of whole-brain fMRI responses to over 1000 short (3 s) naturalistic video clips of visual events across ten human subjects. We use the videos’ metadata to show how the brain represents word- and sentence-level descriptions of visual events and identify correlates of video memorability scores extending into...