- Human Pose and Action Recognition
- Speech and Dialogue Systems
- Hand Gesture Recognition Systems
- Anomaly Detection Techniques and Applications
- Video Surveillance and Tracking Methods
- Advanced Vision and Imaging
- Social Robot Interaction and HRI
- Advanced Neural Network Applications
- Innovative Human-Technology Interaction
- Gait Recognition and Analysis
- Multimodal Machine Learning Applications
- Video Analysis and Summarization
- Digital Games and Media
- Human Motion and Animation
- Advanced Image Processing Techniques
- Deception Detection and Forensic Psychology
- Visual Attention and Saliency Detection
- Advanced Image and Video Retrieval Techniques
- Educational Games and Gamification
- Domain Adaptation and Few-Shot Learning
- Emotion and Mood Recognition
- Action Observation and Synchronization
- Interactive and Immersive Displays
- Gaze Tracking and Assistive Technology
- Psychopathy, Forensic Psychiatry, Sexual Offending
Netherlands Organisation for Applied Scientific Research, 2025
Utrecht University, 2015-2024
University of Twente, 2006-2017
Arizona State University, 2016
Lancaster University, 2013
Carnegie Mellon University, 2013
Microsoft Research (United Kingdom), 2013
University of Duisburg-Essen, 2013
Human Media, 2004-2006
Convolutional Neural Networks (CNNs) use pooling to decrease the size of activation maps. This process is crucial to increase the receptive fields and to reduce the computational requirements of subsequent convolutions. An important feature of the pooling operation is the minimization of information loss, with respect to the initial activation maps, without a significant impact on the computation and memory overhead. To meet these requirements, we propose SoftPool: a fast and efficient method for exponentially weighted activation downsampling. Through experiments across...
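The exponentially weighted downsampling mentioned in this abstract can be illustrated with a small sketch. It assumes that each output value is the softmax-weighted sum of the activations in a non-overlapping k x k region; the function names `softpool_region` and `softpool2d` and the plain nested-list representation are illustrative choices, not taken from the source, and this is not the authors' implementation.

```python
import math


def softpool_region(values):
    """Softmax-weighted sum of one pooling region (illustrative helper)."""
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(values)
    weights = [math.exp(v - m) for v in values]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total


def softpool2d(fmap, k=2):
    """Downsample a 2D activation map over non-overlapping k x k regions.

    Assumption: stride equals kernel size; trailing rows and columns that
    do not fill a complete region are dropped.
    """
    rows = len(fmap) // k
    cols = len(fmap[0]) // k
    out = []
    for i in range(rows):
        row = []
        for j in range(cols):
            region = [fmap[i * k + di][j * k + dj]
                      for di in range(k) for dj in range(k)]
            row.append(softpool_region(region))
        out.append(row)
    return out
```

With this weighting, larger activations dominate each region while smaller ones still contribute, so every output lies between the region's average and its maximum.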
Light field imaging presents an attractive alternative to RGB imaging because of the recording of the direction of the incoming light. The detection of salient regions in a light field image benefits from the additional modeling of angular patterns. For RGB imaging, methods using CNNs have achieved excellent results on a range of tasks, including saliency detection. However, it is not trivial to use CNN-based methods for saliency detection on light field images because these methods are not specifically designed for processing light field inputs. In addition, current light field datasets are not sufficiently large to train CNNs. To...
Greenness in the urban living environment is inconsistently associated with mental health. Satellite-derived measures of greenness may inadequately characterize how people encounter greenness visually on site, but systematic comparisons are lacking. We aimed 1) to compare associations between remotely sensed and street view (SV) greenness, and 2) to examine whether these greenness metrics are differently associated with mental health outcomes. We used cross-sectional depressive and anxiety symptoms data of adults in Amsterdam, the Netherlands. We employed a...
Pooling layers are essential building blocks of convolutional neural networks (CNNs), to reduce computational overhead and increase the receptive fields of proceeding convolutional operations. Their goal is to produce downsampled volumes that closely resemble the input volume while, ideally, also being computationally and memory efficient. Meeting both these requirements remains a challenge. To this end, we propose an adaptive and exponentially weighted pooling method: adaPool. Our method learns a regional-specific fusion...
Few-shot instance segmentation methods are promising when labeled training data for novel classes is scarce. However, current approaches do not facilitate flexible addition of novel classes. They also require that examples of each class are provided at train and test time, which is memory intensive. In this paper, we address these limitations by presenting the first incremental approach to few-shot instance segmentation: iMTFA. We learn discriminative embeddings for object instances that are merged into class representatives. Storing...
In our daily life everything and everyone occupies an amount of space, simply by "being there". Edward Hall coined the term proxemics for the studies of man's use of this space. This paper presents a study on proxemics in Human-Robot Interaction, and particularly on a robot's approaching of groups of people. As social psychology research found proxemics to be culturally dependent, we focus on the question of the appropriateness of the robot's approach behavior in different cultures. We present an online survey (N=181) that was distributed in three countries: China, the U.S....
Matching objects across partially overlapping camera views is crucial in multi-camera systems and requires a view-invariant feature extraction network. Training such a network with cycle-consistency circumvents the need for labor-intensive labeling. In this paper, we extend the mathematical formulation of cycle-consistency to handle partial overlap. We then introduce a pseudo-mask which directs the training loss to take partial overlap into account. We additionally present several new cycle variants that complement each other...
For an artifact such as a robot or a virtual agent to respond appropriately to human social touch behavior, it should be able to automatically detect and recognize touch. This paper describes the data collection of CoST: Corpus of Social Touch, a data set containing 7805 captures of 14 different gestures. All gestures were performed in three variants: gentle, normal and rough, on a pressure sensor grid wrapped around a mannequin arm. Recognition of these gesture classes using various classifiers yielded accuracies up to 60 %;...
Touch behavior is of great importance during social interaction. To transfer the tactile modality from interpersonal interaction to other areas such as Human-Robot Interaction (HRI) and remote communication, automatic recognition of touch is necessary. This paper introduces CoST: Corpus of Social Touch, a collection containing 7805 instances of 14 different gestures. The gestures were performed in three variations: gentle, normal and rough, on a sensor grid wrapped around a mannequin arm. Recognition of the rough...
In this work we employ multitask learning to capitalize on the structure that exists in related supervised tasks to train complex neural networks. It allows training a network for multiple objectives in parallel, in order to improve performance on at least one of them by capitalizing on a shared representation that is developed to accommodate more information than it otherwise would for a single task. We employ this idea to tackle action recognition in egocentric videos by introducing additional supervised tasks. We consider the verbs and nouns from which the action labels...
Egocentric vision is an emerging field of computer vision that is characterized by the acquisition of images and video from the first person perspective. In this paper we address the challenge of egocentric human action recognition by utilizing the presence and position of detected regions of interest in the scene explicitly, without further use of visual features. Initially, we recognize that hands are essential in the execution of actions and focus on obtaining their movements as the principal cues that define actions. We employ object detection and region tracking...
We manually designed rules for a backchannel (BC) prediction model based on pitch and pause information. In short, the model predicts a BC when there is a pause of a certain length that is preceded by a falling or rising pitch. This model was validated against the Dutch IFADV Corpus in a corpus-based evaluation method. The results showed that our model performs slightly better than another well-known rule-based BC prediction model that uses only pitch information. We observed that the length of the pause preceding a BC is one of the important features in this model, next to the duration and slope of the pitch at the end of an utterance. Further,...
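The rule summarized in this abstract (a pause of a certain length preceded by a falling or rising pitch) can be sketched as a simple predicate. The abstract does not give the actual thresholds or feature definitions, so the function name `predict_backchannel`, the parameters `min_pause_ms` and `min_abs_slope`, and their default values are hypothetical placeholders chosen only for illustration.

```python
def predict_backchannel(pause_ms, pitch_slope,
                        min_pause_ms=400.0, min_abs_slope=0.5):
    """Return True if a backchannel (BC) would be predicted at this point.

    pause_ms: duration of the current pause in milliseconds.
    pitch_slope: slope of the pitch contour at the end of the preceding
        utterance; negative means falling, positive means rising.
    min_pause_ms, min_abs_slope: hypothetical thresholds, not values
        reported in the source.
    """
    long_enough_pause = pause_ms >= min_pause_ms
    # Either a clearly falling or a clearly rising pitch qualifies.
    pitch_moves = abs(pitch_slope) >= min_abs_slope
    return long_enough_pause and pitch_moves
```

For example, `predict_backchannel(600, -1.2)` combines a long pause with a falling pitch and returns `True`, whereas a short pause or a flat pitch contour returns `False` under these assumed thresholds.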