- Advanced Neural Network Applications
- Domain Adaptation and Few-Shot Learning
- Advanced Image and Video Retrieval Techniques
- Advanced Vision and Imaging
- Multimodal Machine Learning Applications
- Human Pose and Action Recognition
- Face and Expression Recognition
- Face recognition and analysis
- Advanced Image Processing Techniques
- Natural Language Processing Techniques
- Hand Gesture Recognition Systems
- Topic Modeling
- Emotion and Mood Recognition
- Image Retrieval and Classification Techniques
- Gaze Tracking and Assistive Technology
- Image and Signal Denoising Methods
- Gait Recognition and Analysis
- Image Processing Techniques and Applications
- Advanced Computing and Algorithms
- Advanced Image Fusion Techniques
- Facial Nerve Paralysis Treatment and Research
- Medical Image Segmentation Techniques
- Speech and Audio Processing
- Visual Attention and Saliency Detection
- Digital Rights Management and Security
Apple (United Kingdom)
2024
Google (United States)
2018-2021
University of Maryland, College Park
2013-2017
Indian Institute of Technology Madras
2009
Recently introduced cost-effective depth sensors coupled with the real-time skeleton estimation algorithm of Shotton et al. [16] have generated a renewed interest in skeleton-based human action recognition. Most existing approaches use either joint locations or angles to represent skeleton. In this paper, we propose new skeletal representation that explicitly models 3D geometric relationships between various body parts using rotations and translations space. Since rigid motions are members...
Recent advances in video super-resolution have shown that convolutional neural networks combined with motion compensation are able to merge information from multiple low-resolution (LR) frames generate high-quality images. Current state-of-the-art methods process a batch of LR single high-resolution (HR) frame and run this scheme sliding window fashion over the entire video, effectively treating problem as large number separate multi-frame tasks. This approach has two main weaknesses: 1)...
Recently, skeleton-based human action recognition has been receiving significant attention from various research communities due to the availability of depth sensors and real-time depth-based 3D skeleton estimation algorithms. In this work, we use rolling maps for recognizing actions skeletal data. The map is a well-defined mathematical concept that not explored much by vision community. First, represent each using relative rotations between body parts. Since are members special orthogonal...
Collecting labeled data for the task of semantic segmentation is expensive and time-consuming, as it requires dense pixel-level annotations. While recent Convolutional Neural Network (CNN) based approaches have achieved impressive results by using large amounts training data, their performance drops significantly amount decreases. This happens because deep CNNs trained with de facto cross-entropy loss can easily overfit to small data. To address this issue, we propose a simple effective...
In contrast to the existing approaches that use discrete Conditional Random Field (CRF) models, we propose a Gaussian CRF model for task of semantic segmentation. We novel deep network, which refer as Mean (GMF) whose layers perform mean field inference over CRF. The proposed GMF network has desired property each its produces an output is closer maximum posteriori solution compared input. By combining with Convolutional Neural Networks (CNNs), new end-to-end trainable conditional random...
Most of the existing work on automatic facial expression analysis focuses discrete emotion recognition, or action unit detection. However, expressions do not always fall neatly into pre-defined semantic categories. Also, similarity between measured in space need correspond to how humans perceive similarity. Different from previous work, our goal is describe a continuous fashion using compact embedding that mimics human visual preferences. To achieve this goal, we collect large-scale...
In computer vision applications, features often lie on Riemannian manifolds with known geometry. Popular learning algorithms such as discriminant analysis, partial least squares, support vector machines, etc., are not directly applicable to due the non-Euclidean nature of underlying spaces. Hence, classification is performed in an extrinsic manner by mapping Euclidean spaces using kernels. However, for kernel based approaches, poor choice results reduced performance. this paper, we address...
Deep Convolutional Neural Networks (DCNN) have been proven to be effective for various computer vision problems. In this work, we demonstrate its effectiveness on a continuous object orientation estimation task, which requires prediction of 0 360 degrees the objects. We do so by proposing and comparing three approaches designed DCNNs. The first two work representing an as point unit circle minimizing either L2 loss or angular difference loss. third method works converting task into set...
We propose a novel end-to-end trainable deep network architecture for image denoising based on Gaussian Conditional Random Field (GCRF) model. In contrast to the existing discriminative methods that train separate model each individual noise level, proposed explicitly models input variance and hence is capable of handling range levels. Our network, which we refer as GCRF consists two sub-networks: (i) parameter generation generates pairwise potential parameters noisy image, (ii) an inference...
Standard Knowledge Distillation (KD) approaches distill the knowledge of a cumbersome teacher model into parameters student with pre-defined architecture. However, neural network, which is represented by network's output distribution conditioned on its input, depends not only but also Hence, more generalized approach for KD to teacher's both and architecture student. To achieve this, we present new \textit{Architecture-aware (AKD)} that finds models (pearls teacher) are best distilling given...
Recently, cross-modal synthesis of subject-specific scans has been receiving significant attention from the medical imaging community. Though various approaches have introduced in recent past, most them are either tailored to a specific application or proposed for supervised setting, i.e., they assume availability training data same set subjects both source and target modalities. But, collecting multiple each subject is undesirable. Hence, address this issue, we propose general unsupervised...
Over the past few years, symmetric positive definite (SPD) matrices have been receiving considerable attention from computer vision community. Though various distance measures proposed in for comparing SPD matrices, two most widely-used are affine-invariant and log-Euclidean distance. This is because these true geodesic distances induced by Riemannian geometry. In this work, we focus on geometry propose a data-driven approach learning metrics/geodesic matrices. We show that learned using...
Recently, a series of works in computer vision have shown promising results on various image and video understanding tasks using self-attention. However, due to the quadratic computational memory complexities self-attention, these either apply attention only low-resolution feature maps later stages deep network or restrict receptive field each layer small local region. To overcome limitations, this work introduces new global self-attention module, referred as GSA which is efficient enough...
The landscape of publicly available vision foundation models (VFMs), such as CLIP and Segment Anything Model (SAM), is expanding rapidly. VFMs are endowed with distinct capabilities stemming from their pre-training objectives. For instance, excels in semantic understanding, while SAM specializes spatial understanding for segmentation. In this work, we introduce a simple recipe to efficiently merge into unified model that absorbs expertise. Our method integrates techniques multi-task...
Background modeling and subtraction is a core component of many vision based systems. By far the most popular background models are per-pixel models, in which each pixel considered independently. Such fail to handle dynamic backgrounds noise. In this paper, we present solution problem by proposing novel computationally simple spatio-temporal model. We extend nonparametric model, one widely used from temporal domain domain. Instead individual pixels, consider 3 × blocks centered on use kernel...
Mutual gaze detection, i.e., predicting whether or not two people are looking at each other, plays an important role in understanding human interactions. In this work, we focus on the task of image-based mutual and propose a simple effective approach to boost performance by using auxiliary 3D estimation during training phase. We achieve without additional labeling cost branch pseudo labels deduced from labels. By sharing head image encoder between detection branches, better features than...
We propose a novel deep network architecture for image\\ denoising based on Gaussian Conditional Random Field (GCRF) model. In contrast to the existing discriminative methods that train separate model each noise level, proposed explicitly models input variance and hence is capable of handling range levels. Our network, which we refer as GCRF consists two sub-networks: (i) parameter generation generates pairwise potential parameters noisy image, (ii) an inference whose layers perform...