NFDI4DS | UHH-SEMS - Publication Details

Gemma Roig

ORCID: 0000-0002-6439-8076

Contact & Profiles

OpenAlex author ID: A5025034643

Research Areas

Visual Attention and Saliency Detection
Domain Adaptation and Few-Shot Learning
Advanced Image and Video Retrieval Techniques
Visual perception and processing mechanisms
Neural dynamics and brain function
Face Recognition and Perception
Advanced Neural Network Applications
Music and Audio Processing
Emotion and Mood Recognition
Image Retrieval and Classification Techniques
Advanced Vision and Imaging
Human Pose and Action Recognition
Medical Image Segmentation Techniques
Multimodal Machine Learning Applications
Cell Image Analysis Techniques
Speech and Audio Processing
Anomaly Detection Techniques and Applications
Video Analysis and Summarization
Adversarial Robustness in Machine Learning
EEG and Brain-Computer Interfaces
Media Influence and Health
Video Surveillance and Tracking Methods
Generative Adversarial Networks and Image Synthesis
Topic Modeling
Neuroscience and Music Perception

Goethe University Frankfurt
2020-2025

Hessian Agency for Nature Conservation, Environment and Geology
2023-2024

Hessian Center for Artificial Intelligence
2023

Freie Universität Berlin
2020

Singapore University of Technology and Design
2017-2019

Massachusetts Institute of Technology
2014-2019

MIT Art, Design and Technology University
2019

McGovern Institute for Brain Research
2017

ETH Zurich
2011-2016

Universitat Ramon Llull
2011

JSIS3D: Joint Semantic-Instance Segmentation of 3D Point Clouds With Multi-Task Pointwise Networks and Multi-Value Conditional Random Fields

OPENALEX - Publications

Quang-Hieu Pham Thanh Huy Nguyen Binh‐Son Hua Gemma Roig Sai-Kit Yeung

Deep learning techniques have become the to-go models for most vision-related tasks on 2D images. However, their power has not been fully realised on several tasks in 3D space, e.g., 3D scene understanding. In this work, we jointly address the problems of semantic and instance segmentation of 3D point clouds. Specifically, we develop a multi-task pointwise network that simultaneously performs two tasks: predicting the semantic classes of 3D points and embedding the points into high-dimensional vectors so that points of the same object instance are represented by similar...

10.1109/cvpr.2019.00903 article EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01

SEEDS: Superpixels Extracted Via Energy-Driven Sampling

Michael Van den Bergh Xavier Boix Gemma Roig Luc Van Gool

10.1007/s11263-014-0744-2 article EN International Journal of Computer Vision 2014-07-18

Foveation-based Mechanisms Alleviate Adversarial Examples

Yan Luo Xavier Boix Gemma Roig Tomaso Poggio Qi Zhao

We show that adversarial examples, i.e., the visually imperceptible perturbations that result in Convolutional Neural Networks (CNNs) fail, can be alleviated with a mechanism based on foveations---applying the CNN in different image regions. To see this, first, we report results in ImageNet that lead to a revision of the hypothesis that adversarial perturbations are a consequence of CNNs acting as a linear classifier: CNNs act locally linearly to changes in the image regions with objects recognized by the CNN, and in other regions the CNN may act non-linearly. Then, we corroborate that when the neural responses...

10.48550/arxiv.1511.06292 preprint EN other-oa arXiv (Cornell University) 2015-01-01

DOPING: Generative Data Augmentation for Unsupervised Anomaly Detection with GAN

Swee Kiat Lim Yi Loo Ngoc-Trung Tran Ngai‐Man Cheung Gemma Roig and 1 more

Recently, the introduction of the generative adversarial network (GAN) and its variants has enabled the generation of realistic synthetic samples, which has been used for enlarging training sets. Previous work primarily focused on data augmentation for semi-supervised and supervised tasks. In this paper, we instead focus on unsupervised anomaly detection and propose a novel framework optimized for this task. By using a GAN variant known as the adversarial autoencoder (AAE), we impose a distribution on the latent space of the dataset and systematically sample the latent space to generate...

10.1109/icdm.2018.00146 article EN 2018 IEEE International Conference on Data Mining (ICDM) 2018-11-01

Representation Similarity Analysis for Efficient Task Taxonomy & Transfer Learning

Kshitij Dwivedi Gemma Roig

Transfer learning is widely used in deep neural network models when there are few labeled examples available. The common approach is to take a pre-trained network in a similar task and finetune the model parameters. This is usually done blindly without a pre-selection from a set of pre-trained models, or by finetuning a set of models trained on different tasks and selecting the best performing one by cross-validation. We address this problem by proposing an approach to assess the relationship between visual tasks and their task-specific models. Our method uses Representation...

10.1109/cvpr.2019.01267 article EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01

Z-Inspection®: A Process to Assess Trustworthy AI

Roberto V. Zicari John Brodersen James Brusseau Boris Düdder Timo Eichhorn and 12 more

The ethical and societal implications of artificial intelligence systems raise concerns. In this article, we outline a novel process based on applied ethics, namely, Z-Inspection®, to assess if an AI system is trustworthy. We use the definition of trustworthy AI given by the high-level European Commission's expert group on AI. Z-Inspection® is a general inspection process that can be applied to a variety of domains where AI systems are used, such as...

10.1109/tts.2021.3066209 article EN cc-by-nc-nd IEEE Transactions on Technology and Society 2021-03-17

LLM Multimodal Traffic Accident Forecasting

I. de Zarzà J. de Curtò Gemma Roig Carlos T. Calafate

With the rise in traffic congestion in urban centers, predicting accidents has become paramount for city planning and public safety. This work comprehensively studied the efficacy of modern deep learning (DL) methods in forecasting traffic accidents and enhancing Level-4 and Level-5 (L-4 and L-5) driving assistants with actionable visual and language cues. Using a rich dataset detailing accident occurrences, we juxtaposed the Transformer model against traditional time series models like ARIMA and the more recent Prophet model. Additionally,...

10.3390/s23229225 article EN cc-by Sensors 2023-11-16

Online Video SEEDS for Temporal Window Objectness

Michael Van den Bergh Gemma Roig Xavier Boix Santiago Manén Luc Van Gool

Super pixel and objectness algorithms are broadly used as a pre-processing step to generate support regions and to speed-up further computations. Recently, many algorithms have been extended to video in order to exploit the temporal consistency between frames. However, most methods are computationally too expensive for real-time applications. We introduce an online, real-time video super pixel algorithm based on the recently proposed SEEDS super pixels. A new capability is incorporated which delivers multiple diverse samples (hypotheses) of super pixels...

10.1109/iccv.2013.54 article EN 2013-12-01

LCD: Learned Cross-Domain Descriptors for 2D-3D Matching

Quang-Hieu Pham Mikaela Angelina Uy Binh‐Son Hua Duc Thanh Nguyen Gemma Roig and 1 more

In this work, we present a novel method to learn a local cross-domain descriptor for 2D image and 3D point cloud matching. Our proposed method is a dual auto-encoder neural network that maps 2D and 3D input into a shared latent space representation. We show that such local cross-domain descriptors in the shared embedding are more discriminative than those obtained from individual training in 2D and 3D domains. To facilitate the training process, we built a new dataset by collecting ≈ 1.4 million 2D-3D correspondences with various lighting conditions and settings from publicly...

10.1609/aaai.v34i07.6859 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2020-04-03

Unveiling functions of the visual cortex using task-specific deep neural networks

Kshitij Dwivedi Michael Bonner Radoslaw M. Cichy Gemma Roig

The human visual cortex enables visual perception through a cascade of hierarchical computations in cortical regions with distinct functionalities. Here, we introduce an AI-driven approach to discover the functional mapping of the visual cortex. We related human brain responses to scene images measured with functional MRI (fMRI) systematically to a diverse set of deep neural networks (DNNs) optimized to perform different scene perception tasks. We found a structured mapping between DNN tasks and brain regions along the ventral and dorsal visual streams. Low-level visual tasks mapped onto early brain regions, 3-dimensional...

10.1371/journal.pcbi.1009267 article EN cc-by PLoS Computational Biology 2021-08-13

The spatiotemporal neural dynamics of object location representations in the human brain

Monika Graumann Caterina Ciuffi Kshitij Dwivedi Gemma Roig Radoslaw M. Cichy

To interact with objects in complex environments, we must know what they are and where they are in spite of challenging viewing conditions. Here, we investigated where, how and when representations of object location and category emerge in the human brain when objects appear on cluttered natural scene images using a combination of functional magnetic resonance imaging, electroencephalography and computational models. We found location representations to emerge along the ventral visual stream towards lateral occipital complex, mirrored by gradual emergence in deep...

10.1038/s41562-022-01302-0 article EN cc-by Nature Human Behaviour 2022-02-24

A large and rich EEG dataset for modeling human visual object recognition

Alessandro T. Gifford Kshitij Dwivedi Gemma Roig Radoslaw M. Cichy

10.1016/j.neuroimage.2022.119754 article EN NeuroImage 2022-11-15

Emergent Cooperation and Strategy Adaptation in Multi-Agent Systems: An Extended Coevolutionary Theory with LLMs

I. de Zarzà J. de Curtò Gemma Roig Pietro Manzoni Carlos T. Calafate

The increasing complexity of Multi-Agent Systems (MASs), coupled with the emergence of Artificial Intelligence (AI) and Large Language Models (LLMs), has highlighted significant gaps in our understanding of the behavior and interactions of diverse entities within dynamic environments. Traditional game theory approaches have often been employed in this context, but their utility is limited by the static and homogenous nature of their models. With the transformative influence of AI and LLMs on business and society, a more nuanced theoretical...

10.3390/electronics12122722 article EN Electronics 2023-06-18

Efficient Unsupervised Shortcut Learning Detection and Mitigation in Transformers

Lukas Kuhn Sari Saba-Sadiya Jörg Schlötterer Christin Seifert Gemma Roig

Shortcut learning, i.e., a model's reliance on undesired features not directly relevant to the task, is a major challenge that severely limits the applications of machine learning algorithms, particularly when deploying them to assist in making sensitive decisions, such as in medical diagnostics. In this work, we leverage recent advancements in machine learning to create an unsupervised framework that is capable of both detecting and mitigating shortcut learning in transformers. We validate our method on multiple datasets. Results demonstrate...

10.48550/arxiv.2501.00942 preprint EN arXiv (Cornell University) 2025-01-01

UIDAPLE: Unsupervised Incremental Domain Adaptation through Adaptive Prompt Learning

Samrat Mukherjee Tanuj Sur Saurish Seksaria Subhasis Chaudhuri Gemma Roig and 1 more

10.1109/icassp49660.2025.10890768 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

Learning to Predict Sequences of Human Visual Fixations

Ming Jiang Xavier Boix Gemma Roig Juan Xu Luc Van Gool and 1 more

Most state-of-the-art visual attention models estimate the probability distribution of fixating the eyes in a location of the image, the so-called saliency maps. Yet, these models do not predict the temporal sequence of eye fixations, which may be valuable for better predicting the human eye fixations, as well as for understanding the role of the different cues during visual exploration. In this paper, we present a method for predicting the sequence of human eye fixations, which is learned from the recorded human eye-tracking data. We use least-squares policy iteration (LSPI) to learn a visual exploration policy that mimics the recorded eye-fixation examples....

10.1109/tnnls.2015.2496306 article EN IEEE Transactions on Neural Networks and Learning Systems 2016-01-07

LLM Adaptive PID Control for B5G Truck Platooning Systems

I. de Zarzà J. de Curtò Gemma Roig Carlos T. Calafate

This paper presents an exploration into the capabilities of an adaptive PID controller within the realm of truck platooning operations, situating the inquiry within the context of Cognitive Radio and AI-enhanced 5G and Beyond 5G (B5G) networks. We developed a Deep Learning (DL) model that emulates an adaptive PID controller, taking into account the implications of factors such as communication latency, packet loss, and communication range, alongside considerations of reliability, robustness, and security. Furthermore, we harnessed a Large Language Model (LLM), GPT-3.5-turbo, to...

10.3390/s23135899 article EN cc-by Sensors 2023-06-25

LLM-Informed Multi-Armed Bandit Strategies for Non-Stationary Environments

J. de Curtò I. de Zarzà Gemma Roig Juan‐Carlos Cano Pietro Manzoni and 1 more

In this paper, we introduce an innovative approach to handling the multi-armed bandit (MAB) problem in non-stationary environments, harnessing the predictive power of large language models (LLMs). With the realization that traditional bandit strategies, including epsilon-greedy and upper confidence bound (UCB), may struggle in the face of dynamic changes, we propose a strategy informed by LLMs that offers guidance on exploration versus exploitation, contingent on the current state of the bandits. We bring forward a new model with...

10.3390/electronics12132814 article EN Electronics 2023-06-25

Large Language Model-Informed X-ray Photoelectron Spectroscopy Data Analysis

J. de Curtò I. de Zarzà Gemma Roig Carlos T. Calafate

X-ray photoelectron spectroscopy (XPS) remains a fundamental technique in materials science, offering invaluable insights into the chemical states and electronic structure of a material. However, the interpretation of XPS spectra can be complex, requiring deep expertise and often sophisticated curve-fitting methods. In this study, we present a novel approach to the analysis of XPS data, integrating the utilization of large language models (LLMs), specifically OpenAI’s GPT-3.5/4 Turbo, to provide insightful guidance during the data...

10.3390/signals5020010 article EN cc-by Signals 2024-03-27

Human Gaze Boosts Object-Centered Representation Learning

Timothy Schaumlöffel Arthur Aubret Gemma Roig Jochen Triesch

Recent self-supervised learning (SSL) models trained on human-like egocentric visual inputs substantially underperform on image recognition tasks compared to humans. These models train on raw, uniform visual inputs collected from head-mounted cameras. This is different from humans, as the anatomical structure of the retina and visual cortex relatively amplifies the central visual information, i.e. around humans' gaze location. This selective amplification in humans likely aids in forming object-centered representations. Here, we investigate whether...

10.48550/arxiv.2501.02966 preprint EN arXiv (Cornell University) 2025-01-06

Investigating the temporal dynamics and modelling of mid-level feature representations in humans

Agnessa Karapetian Alexander Lenders Vanshika Bawa Martin Pflaum Raphael Leuner and 3 more

Scene perception is a key function of biological visual systems. According to the hierarchical processing view, scene perception in the human brain begins with low-level features, progresses to mid-level features, and ends with high-level features. While low- and high-level feature processing is well-studied, research on mid-level features remains limited. Here, we addressed this gap by investigating when mid-level features are processed in humans using a novel stimulus set of naturalistic scenes as images and videos, accompanied by ground-truth annotations for five mid-level features (reflectance, lighting, world...

10.1101/2025.03.18.643889 preprint EN cc-by bioRxiv (Cold Spring Harbor Laboratory) 2025-03-18

Conditional Random Fields for multi-camera object detection

Gemma Roig Xavier Boix Horesh Ben Shitrit Pascal Fua

We formulate a model for multi-class object detection in a multi-camera environment. To our knowledge, this is the first time that this problem is addressed taking different object classes into account simultaneously. Given several images of the scene taken from different angles, our system estimates the ground plane location of the objects from the output of several object detectors applied at each viewpoint. We cast the problem as an energy minimization modeled with a Conditional Random Field (CRF). Instead of predicting the presence of an object at each image location independently, we simultaneously predict the labeling...

10.1109/iccv.2011.6126289 article EN International Conference on Computer Vision 2011-11-01

Optimized Financial Planning: Integrating Individual and Cooperative Budgeting Models with LLM Recommendations

I. de Zarzà J. de Curtò Gemma Roig Carlos T. Calafate

In today’s complex economic environment, individuals and households alike grapple with the challenge of financial planning. This paper introduces novel methodologies for both individual and cooperative (household) budgeting. We firstly propose an optimization framework for individual budget allocation, aiming to maximize savings by efficiently distributing monthly income among various expense categories. We then extend this model to households, wherein the complexity of handling multiple incomes and shared expenses is...

10.3390/ai5010006 article EN cc-by AI 2023-12-25

Modeling short visual events through the BOLD moments video fMRI dataset and metadata

Benjamin Lahner Kshitij Dwivedi Polina Iamshchinina Monika Graumann Alex Lascelles and 8 more

Studying the neural basis of human dynamic visual perception requires extensive experimental data to evaluate the large swathes of functionally diverse brain neural networks driven by perceiving visual events. Here, we introduce the BOLD Moments Dataset (BMD), a repository of whole-brain fMRI responses to over 1000 short (3 s) naturalistic video clips of visual events across ten human subjects. We use the videos’ metadata to show how the brain represents word- and sentence-level descriptions of visual events and identify correlates of video memorability scores extending into...

10.1038/s41467-024-50310-3 article EN cc-by Nature Communications 2024-07-24
