- Hand Gesture Recognition Systems
- Human Pose and Action Recognition
- Hearing Impairment and Communication
- Gait Recognition and Analysis
- Multimodal Machine Learning Applications
- Generative Adversarial Networks and Image Synthesis
- Speech and Dialogue Systems
- Face Recognition and Analysis
- Bayesian Modeling and Causal Inference
- Natural Language Processing Techniques
- Neural Networks and Applications
- Statistical Methods in Clinical Trials
- Advanced Causal Inference Techniques
- Human Motion and Animation
- Anomaly Detection Techniques and Applications
- Digital Media Forensic Detection
- Explainable Artificial Intelligence (XAI)
- Adversarial Robustness in Machine Learning
- Interpreting and Communication in Healthcare
- Domain Adaptation and Few-Shot Learning
- Tactile and Sensory Interactions
- Urban Transport and Accessibility
- Computational Drug Discovery Methods
- Impact of Light on Environment and Health
- IoT-based Smart Home Systems
META Health (2023-2024)
University of Surrey (2016-2023)
Boğaziçi University (2014-2016)
Yıldız Technical University (2012-2013)
Sign Language Recognition (SLR) has been an active research field for the last two decades. However, most research to date has considered SLR as a naive gesture recognition problem. SLR seeks to recognize a sequence of continuous signs but neglects the underlying rich grammatical and linguistic structures of sign language that differ from spoken language. In contrast, we introduce the Sign Language Translation (SLT) problem. Here, the objective is to generate spoken language translations from sign language videos, taking into account the different word orders and grammar. We formalize SLT in...
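A minimal sketch of the SLT framing as neural machine translation: per-frame visual features are encoded and a decoder emits spoken-language tokens. The module choices, dimensions, and class names below are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch: sign-video-to-text translation as sequence-to-sequence learning.
# All dimensions and module choices are illustrative assumptions.
import torch
import torch.nn as nn

class Sign2TextSeq2Seq(nn.Module):
    def __init__(self, feat_dim=1024, hidden=512, vocab_size=3000):
        super().__init__()
        # Encoder consumes per-frame visual features (e.g. from a CNN backbone).
        self.encoder = nn.GRU(feat_dim, hidden, batch_first=True)
        # Decoder autoregressively emits spoken-language word tokens.
        self.embed = nn.Embedding(vocab_size, hidden)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, frame_feats, target_tokens):
        # frame_feats: (B, T_video, feat_dim); target_tokens: (B, T_text)
        _, h = self.encoder(frame_feats)          # final hidden state summarises the video
        dec_in = self.embed(target_tokens)        # teacher forcing on ground-truth words
        dec_out, _ = self.decoder(dec_in, h)
        return self.out(dec_out)                  # (B, T_text, vocab_size) logits

model = Sign2TextSeq2Seq()
logits = model(torch.randn(2, 100, 1024), torch.randint(0, 3000, (2, 12)))
print(logits.shape)  # torch.Size([2, 12, 3000])
```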
Prior work on Sign Language Translation has shown that having a mid-level sign gloss representation (effectively recognizing the individual signs) improves translation performance drastically. In fact, the current state-of-the-art in translation requires gloss level tokenization in order to work. We introduce a novel transformer based architecture that jointly learns Continuous Sign Language Recognition and Translation while being trainable in an end-to-end manner. This is achieved by using a Connectionist Temporal Classification (CTC) loss to bind...
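A minimal sketch of the joint objective described above: a CTC loss over gloss predictions plus a cross-entropy translation loss on the spoken-language output. Tensor shapes and the weighting factor are assumptions for illustration.

```python
# Sketch of a joint CTC (recognition) + cross-entropy (translation) objective.
import torch
import torch.nn as nn

ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True)
xent_loss = nn.CrossEntropyLoss(ignore_index=0)

def joint_loss(gloss_logits, gloss_targets, gloss_lens, input_lens,
               word_logits, word_targets, ctc_weight=1.0):
    # gloss_logits: (T, B, n_gloss) encoder outputs; word_logits: (B, T_txt, vocab)
    log_probs = gloss_logits.log_softmax(dim=-1)
    recognition = ctc_loss(log_probs, gloss_targets, input_lens, gloss_lens)
    translation = xent_loss(word_logits.reshape(-1, word_logits.size(-1)),
                            word_targets.reshape(-1))
    return ctc_weight * recognition + translation

T, B, n_gloss, vocab, T_txt = 50, 2, 120, 3000, 10
loss = joint_loss(torch.randn(T, B, n_gloss),
                  torch.randint(1, n_gloss, (B, 15)),
                  torch.full((B,), 15), torch.full((B,), T),
                  torch.randn(B, T_txt, vocab),
                  torch.randint(1, vocab, (B, T_txt)))
print(loss.item())
```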
We propose a novel deep learning approach to solve simultaneous alignment and recognition problems (referred to as "Sequence-to-sequence" learning). We decompose the problem into a series of specialised expert systems referred to as SubUNets. The spatio-temporal relationships between these SubUNets are then modelled to solve the task, while remaining trainable end-to-end. The approach mimics human learning and educational techniques, and has a number of significant advantages. SubUNets allow us to inject domain-specific knowledge into the system regarding suitable...
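A simplified sketch of the decomposition idea: two specialised sub-networks (here, a hand-crop stream and a full-frame stream) each produce per-frame features, and a recurrent layer models the temporal relationship between them end-to-end. The backbones, dimensions, and stream choice are placeholders, not the published configuration.

```python
# Sketch: specialised per-frame expert sub-networks combined by a temporal model.
import torch
import torch.nn as nn

class SubUNetsSketch(nn.Module):
    def __init__(self, n_classes=100):
        super().__init__()
        def frame_cnn():          # tiny per-frame feature extractor (expert sub-network)
            return nn.Sequential(nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                                 nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.hand_net = frame_cnn()       # expert for cropped hand patches
        self.frame_net = frame_cnn()      # expert for the full frame
        self.temporal = nn.LSTM(32, 64, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(128, n_classes)

    def forward(self, hands, frames):
        # hands, frames: (B, T, 3, H, W) image sequences
        B, T = hands.shape[:2]
        h = self.hand_net(hands.flatten(0, 1)).view(B, T, -1)
        f = self.frame_net(frames.flatten(0, 1)).view(B, T, -1)
        out, _ = self.temporal(torch.cat([h, f], dim=-1))
        return self.classifier(out)        # per-frame class scores for a CTC-style loss

m = SubUNetsSketch()
print(m(torch.randn(1, 8, 3, 64, 64), torch.randn(1, 8, 3, 64, 64)).shape)
```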
In this work we present a new approach to the field of weakly supervised learning in the video domain. Our method is relevant to sequence learning problems which can be split up into sub-problems that occur in parallel. Here, we experiment with sign language data. The approach exploits sequence constraints within each independent stream and combines them by explicitly imposing synchronisation points to make use of the parallelism that all streams share. We do this with multi-stream HMMs while adding intermediate synchronisation constraints among the streams. We embed powerful CNN-LSTM models in the HMM...
We present a novel approach to automatic Sign Language Production using recent developments in Neural Machine Translation (NMT), Generative Adversarial Networks, and motion generation. Our system is capable of producing sign videos from spoken language sentences. Contrary to current approaches that are dependent on heavily annotated data, our approach requires minimal gloss and skeletal level annotations for training. We achieve this by breaking down the task into dedicated sub-processes. We first...
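A sketch of the staged pipeline idea: spoken-language text is first translated to a gloss sequence, glosses are mapped to skeletal motion, and a generative model renders video conditioned on that motion. The three stage functions below are hypothetical placeholders, not the released system.

```python
# Sketch of a staged text -> gloss -> pose -> video production pipeline.
from typing import List
import numpy as np

def text_to_gloss(sentence: str) -> List[str]:
    # Placeholder for an NMT model; here a trivial word-level stand-in.
    return sentence.upper().split()

def gloss_to_pose(glosses: List[str], frames_per_gloss: int = 16) -> np.ndarray:
    # Placeholder for a motion-generation model producing (T, joints, 2) keypoints.
    return np.zeros((len(glosses) * frames_per_gloss, 50, 2), dtype=np.float32)

def pose_to_video(pose_seq: np.ndarray) -> np.ndarray:
    # Placeholder for a GAN-based renderer producing (T, H, W, 3) frames.
    T = pose_seq.shape[0]
    return np.zeros((T, 256, 256, 3), dtype=np.uint8)

video = pose_to_video(gloss_to_pose(text_to_gloss("it will rain tomorrow")))
print(video.shape)  # (64, 256, 256, 3)
```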
In this paper, we propose using 3D Convolutional Neural Networks for large scale user-independent continuous gesture recognition. We have trained an end-to-end deep network for continuous gesture recognition (jointly learning both the feature representation and the classifier). The network performs three-dimensional (i.e. space-time) convolutions to extract features related to both appearance and motion from volumes of color frames. Space-time invariance of the extracted features is encoded via pooling layers. The earlier stages of the network are partially initialized...
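A minimal sketch of the space-time convolution idea: 3D convolutions over stacked color frames followed by pooling and a classifier. Layer sizes and the class count are illustrative, not the trained network's configuration.

```python
# Sketch: 3D (space-time) CNN over a volume of color frames.
import torch
import torch.nn as nn

class Gesture3DCNN(nn.Module):
    def __init__(self, n_classes=249):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, padding=1),    # space-time convolution
            nn.ReLU(inplace=True),
            nn.MaxPool3d(kernel_size=2),                    # pools over time and space
            nn.Conv3d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d(1),                         # global space-time pooling
        )
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, clip):
        # clip: (B, 3, T, H, W) volume of color frames
        x = self.features(clip).flatten(1)
        return self.classifier(x)

logits = Gesture3DCNN()(torch.randn(2, 3, 16, 112, 112))
print(logits.shape)  # torch.Size([2, 249])
```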
Sign languages are multi-channel visual languages, where signers use a continuous 3D space to communicate. Sign language production (SLP), the automatic translation from spoken to sign languages, must embody both the continuous articulation and full morphology of sign language to be truly understandable by the Deaf community. Previous deep learning-based SLP works have produced only a concatenation of isolated signs, focusing primarily on manual features, leading to a robotic and non-expressive production. In this work, we propose a novel Progressive...
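A simplified sketch of progressive, counter-driven continuous decoding: the model regresses a pose frame plus a normalised progress counter at every step and stops when the counter reaches 1.0. The single recurrent cell here is a stand-in for the full transformer architecture, and all dimensions are assumptions.

```python
# Sketch: autoregressive continuous pose decoding with a progress counter.
import torch
import torch.nn as nn

class ProgressivePoseDecoder(nn.Module):
    def __init__(self, pose_dim=150, hidden=256):
        super().__init__()
        self.rnn = nn.GRUCell(pose_dim + 1, hidden)   # previous pose + counter as input
        self.out = nn.Linear(hidden, pose_dim + 1)    # predicts next pose + counter

    @torch.no_grad()
    def generate(self, context, max_len=100):
        # context: (B, hidden) encoded spoken-language sentence (assumed given)
        B = context.size(0)
        frame = torch.zeros(B, self.out.out_features)  # start from zero pose, counter 0
        poses = []
        h = context
        for _ in range(max_len):
            h = self.rnn(frame, h)
            frame = self.out(h)
            poses.append(frame[:, :-1])
            if (frame[:, -1] >= 1.0).all():            # counter signals end of sequence
                break
        return torch.stack(poses, dim=1)               # (B, T, pose_dim)

dec = ProgressivePoseDecoder()
print(dec.generate(torch.randn(2, 256)).shape)
```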
Sign languages are visual languages, with vocabularies as rich as their spoken language counterparts. However, current deep-learning based Sign Language Production (SLP) models produce under-articulated skeleton pose sequences from constrained vocabularies, and this limits their applicability. To be understandable and accepted by the deaf, an automatic SLP system must be able to generate co-articulated photo-realistic signing sequences for large domains of discourse. In this work, we tackle large-scale SLP by learning to co-articulate between...
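A toy sketch of co-articulation between two dictionary pose sequences: the tail of one sign is blended into the head of the next so the transition is smooth rather than an abrupt concatenation. The learned frame selection of the actual approach is replaced here by fixed-length linear interpolation purely for illustration.

```python
# Toy sketch: linear blending between consecutive dictionary sign pose sequences.
import numpy as np

def coarticulate(sign_a: np.ndarray, sign_b: np.ndarray, overlap: int = 8) -> np.ndarray:
    # sign_a, sign_b: (T, J, 3) skeleton pose sequences for two dictionary signs
    w = np.linspace(0.0, 1.0, overlap)[:, None, None]        # blending weights
    blended = (1.0 - w) * sign_a[-overlap:] + w * sign_b[:overlap]
    return np.concatenate([sign_a[:-overlap], blended, sign_b[overlap:]], axis=0)

a = np.random.rand(40, 50, 3)
b = np.random.rand(35, 50, 3)
print(coarticulate(a, b).shape)  # (67, 50, 3): 40 + 35 - 8 overlapping frames
```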
Sign languages are visual languages produced by the movement of the hands, face, and body. In this paper, we evaluate representations based on skeleton poses, as these are explainable, person-independent, privacy-preserving, low-dimensional representations. Basically, skeletal representations generalize over an individual's appearance and background, allowing us to focus on the recognition of motion. But how much information is lost by the skeletal representation? We perform two independent studies using two state-of-the-art pose estimation systems. We analyze...
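A small sketch of why skeletal representations are person-independent: poses are translated to a body-centred origin and scaled by shoulder width, so appearance, position, and subject size are factored out. The joint indices are assumptions and depend on the pose estimator's keypoint layout.

```python
# Sketch: person-independent normalisation of skeleton keypoints.
import numpy as np

NECK, L_SHOULDER, R_SHOULDER = 1, 5, 2   # assumed indices for a generic upper-body layout

def normalize_skeleton(pose: np.ndarray) -> np.ndarray:
    # pose: (T, J, 2) 2D keypoints for one video
    centred = pose - pose[:, NECK:NECK + 1, :]                        # neck at the origin
    scale = np.linalg.norm(pose[:, L_SHOULDER] - pose[:, R_SHOULDER],
                           axis=-1, keepdims=True)[:, :, None]        # per-frame shoulder width
    return centred / np.maximum(scale, 1e-6)

pose = np.random.rand(120, 18, 2) * 640.0    # e.g. raw pixel coordinates
print(normalize_skeleton(pose).std())
```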
To be truly understandable and accepted by Deaf communities, an automatic Sign Language Production (SLP) system must generate a photo-realistic signer. Prior approaches based on graphical avatars have proven unpopular, whereas recent neural SLP works that produce skeleton pose sequences have been shown to be not understandable to Deaf viewers. In this paper, we propose SignGAN, the first SLP model to produce photo-realistic continuous sign language videos directly from spoken language. We employ a transformer architecture with a Mixture Density Network...
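A minimal sketch of a Mixture Density Network head: instead of regressing a single pose vector, the network outputs mixture weights, means, and variances and is trained with the mixture negative log-likelihood. Component count and dimensions are illustrative.

```python
# Sketch: Mixture Density Network output head and its negative log-likelihood.
import torch
import torch.nn as nn
from torch.distributions import Normal

class MDNHead(nn.Module):
    def __init__(self, hidden=512, pose_dim=150, n_components=5):
        super().__init__()
        self.k, self.d = n_components, pose_dim
        self.pi = nn.Linear(hidden, n_components)                 # mixture weights
        self.mu = nn.Linear(hidden, n_components * pose_dim)      # component means
        self.log_sigma = nn.Linear(hidden, n_components * pose_dim)

    def forward(self, h):
        B = h.size(0)
        return (torch.log_softmax(self.pi(h), dim=-1),
                self.mu(h).view(B, self.k, self.d),
                self.log_sigma(h).view(B, self.k, self.d))

def mdn_nll(log_pi, mu, log_sigma, target):
    # target: (B, pose_dim); diagonal Gaussian per component
    log_prob = Normal(mu, log_sigma.exp()).log_prob(target.unsqueeze(1)).sum(-1)  # (B, K)
    return -torch.logsumexp(log_pi + log_prob, dim=-1).mean()

head = MDNHead()
log_pi, mu, log_sigma = head(torch.randn(4, 512))
print(mdn_nll(log_pi, mu, log_sigma, torch.randn(4, 150)).item())
```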
Computational sign language research lacks the large-scale datasets that enable the creation of useful real-life applications. To date, most research has been limited to prototype systems on small domains of discourse, e.g. weather forecasts. To address this issue and push the field forward, we release six datasets comprised of 190 hours of footage on the larger domain of news. From this, 20 hours have been annotated by Deaf experts and interpreters and are made publicly available for research purposes. In this paper, we share the dataset collection process and the tools developed...
Predicting 3D human pose from a single monoscopic video can be highly challenging due to factors such as low resolution, motion blur and occlusion, in addition to the fundamental ambiguity of estimating 3D from 2D. Approaches that directly regress the pose from independent images are particularly susceptible to these factors and result in jitter, noise and/or inconsistencies in skeletal estimation. Much of this can be overcome if the temporal evolution of the scene and skeleton are taken into account. However, rather than tracking body parts and trying to temporally...
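A sketch of exploiting temporal context: rather than lifting each frame's 2D pose to 3D independently, a 1D temporal convolution over a window of frames regresses the centre frame's 3D pose, which suppresses jitter. This is a generic temporal-lifting stand-in with illustrative sizes, not the paper's specific model.

```python
# Sketch: temporal-window 2D-to-3D pose lifting with a 1D convolution.
import torch
import torch.nn as nn

class TemporalLifter(nn.Module):
    def __init__(self, n_joints=17, window=9):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_joints * 2, 256, kernel_size=window),   # mixes the whole window
            nn.ReLU(inplace=True),
            nn.Conv1d(256, n_joints * 3, kernel_size=1),
        )

    def forward(self, pose2d_window):
        # pose2d_window: (B, window, n_joints, 2) consecutive 2D poses
        B, T, J, _ = pose2d_window.shape
        x = pose2d_window.reshape(B, T, J * 2).transpose(1, 2)   # (B, J*2, T)
        return self.net(x).squeeze(-1).view(B, J, 3)             # centre-frame 3D pose

print(TemporalLifter()(torch.randn(2, 9, 17, 2)).shape)  # torch.Size([2, 17, 3])
```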
It is common practice to represent spoken languages at their phonetic level. However, for sign languages, this implies breaking the motion into its constituent primitives. Avatar based Sign Language Production (SLP) has traditionally done just this, building up animation from sequences of hand motions, shapes and facial expressions. However, more recent deep learning solutions to SLP have tackled the problem using a single network that estimates the full skeletal structure. We propose splitting the task into two distinct...
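A sketch of the two-step idea: a translation step produces hidden states, and an animation step combines a learned bank of motion primitives into pose frames via predicted mixture weights. The primitive bank, its size, and the blending scheme are illustrative assumptions.

```python
# Sketch: blending a learned bank of motion primitives with predicted weights.
import torch
import torch.nn as nn

class MotionPrimitiveAnimator(nn.Module):
    def __init__(self, n_primitives=64, pose_dim=150, hidden=256):
        super().__init__()
        # Bank of learnable motion primitives, each a full pose vector.
        self.primitives = nn.Parameter(torch.randn(n_primitives, pose_dim) * 0.01)
        self.weights = nn.Linear(hidden, n_primitives)

    def forward(self, h):
        # h: (B, T, hidden) hidden states from the translation step
        w = torch.softmax(self.weights(h), dim=-1)      # (B, T, n_primitives)
        return w @ self.primitives                       # (B, T, pose_dim) blended poses

animator = MotionPrimitiveAnimator()
print(animator(torch.randn(2, 30, 256)).shape)  # torch.Size([2, 30, 150])
```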
Sign Languages are rich multi-channel languages, requiring articulation of both manual (hands) and non-manual (face and body) features in a precise, intricate manner. Sign Language Production (SLP), the automatic translation from spoken to sign languages, must embody this full morphology to be truly understandable by the Deaf community. Previous work has mainly focused on manual feature production, with an under-articulated output caused by regression to the mean. In this paper, we propose an Adversarial Multi-Channel approach to SLP. We frame...
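A sketch of the adversarial idea: a discriminator scores whether a produced multi-channel pose sequence looks natural, and its loss is added to the usual regression loss to counter the over-smoothed "regression to the mean" output. Modules, dimensions, and the loss weighting are illustrative stand-ins.

```python
# Sketch: adversarial loss added to pose regression for multi-channel production.
import torch
import torch.nn as nn

pose_dim = 150 + 60          # assumed manual + non-manual channel dimensions

discriminator = nn.Sequential(      # scores each frame of a pose sequence
    nn.Linear(pose_dim, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))
bce = nn.BCEWithLogitsLoss()

def generator_loss(pred_poses, gt_poses, adv_weight=0.1):
    # pred_poses, gt_poses: (B, T, pose_dim)
    regression = nn.functional.mse_loss(pred_poses, gt_poses)
    scores = discriminator(pred_poses).mean(dim=1)                # per-sequence realism score
    adversarial = bce(scores, torch.ones_like(scores))            # generator tries to fool D
    return regression + adv_weight * adversarial

loss = generator_loss(torch.randn(2, 80, pose_dim), torch.randn(2, 80, pose_dim))
print(loss.item())
```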
Mathias Müller, Malihe Alikhani, Eleftherios Avramidis, Richard Bowden, Annelies Braffort, Necati Cihan Camgöz, Sarah Ebling, Cristina España-Bonet, Anne Göhring, Roman Grundkiewicz, Mert Inan, Zifan Jiang, Oscar Koller, Amit Moryossef, Annette Rios, Dimitar Shterionov, Sandra Sidler-Miserez, Katja Tissi, Davy Van Landuyt. Proceedings of the Eighth Conference on Machine Translation. 2023.
AR/VR devices have started to adopt hand tracking, in lieu of controllers, to support user interaction. However, today's hand-tracked input relies primarily on one gesture: pinch. Moreover, current mappings of motion to use cases like VR locomotion and content scrolling involve more complex and larger arm motions than joystick or trackpad usage. STMG increases the gesture space by recognizing additional small thumb-based microgestures from the skeletal tracking running on a headset. We take a machine learning approach to achieve...
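A toy sketch of classifying thumb microgestures from tracked hand keypoints: a short window of thumb-tip positions relative to the index finger is turned into a feature vector and scored by a small classifier. The feature choice, gesture set, and classifier are assumptions, not the system's actual design.

```python
# Toy sketch: thumb microgesture classification from skeletal hand tracking.
import numpy as np
from sklearn.linear_model import LogisticRegression

GESTURES = ["none", "thumb_tap", "thumb_swipe_left", "thumb_swipe_right"]

def microgesture_features(thumb_tip: np.ndarray, index_base: np.ndarray) -> np.ndarray:
    # thumb_tip, index_base: (T, 3) tracked 3D keypoints over a short window
    rel = thumb_tip - index_base            # thumb position relative to the hand
    vel = np.diff(rel, axis=0)              # frame-to-frame thumb motion
    return np.concatenate([rel[1:], vel], axis=1).reshape(-1)

rng = np.random.default_rng(0)
X = np.stack([microgesture_features(rng.normal(size=(16, 3)),
                                     rng.normal(size=(16, 3))) for _ in range(200)])
y = rng.integers(0, len(GESTURES), size=200)      # placeholder labels for illustration
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(GESTURES[int(clf.predict(X[:1])[0])])
```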