- Advanced Image and Video Retrieval Techniques
- Advanced Vision and Imaging
- Face Recognition and Analysis
- Video Surveillance and Tracking Methods
- Domain Adaptation and Few-Shot Learning
- Face and Expression Recognition
- Image Retrieval and Classification Techniques
- Handwritten Text Recognition Techniques
- Robotics and Sensor-Based Localization
- Image Enhancement Techniques
- Multimodal Machine Learning Applications
- Robot Manipulation and Learning
- Statistical Methods and Inference
- Advanced Neural Network Applications
- Human Pose and Action Recognition
- Biometric Identification and Security
- Machine Learning and Algorithms
- 3D Surveying and Cultural Heritage
- Advanced Image Processing Techniques
- Generative Adversarial Networks and Image Synthesis
- Visual Attention and Saliency Detection
- Anomaly Detection Techniques and Applications
- Image Processing and 3D Reconstruction
- Medical Image Segmentation Techniques
- COVID-19 Diagnosis Using AI
University of Massachusetts Amherst, 2015-2024
Amherst College, 2007-2024
Mitsubishi Electric (Japan), 2023
Meta (Israel), 2020
University of Michigan, 2017
University of California, Berkeley, 2004
A longstanding question in computer vision concerns the representation of 3D shapes for recognition: should 3D shapes be represented with descriptors operating on their native formats, such as voxel grids or polygon meshes, or can they be effectively represented with view-based descriptors? We address this question in the context of learning to recognize 3D shapes from a collection of their rendered views on 2D images. We first present a standard CNN architecture trained to recognize the shapes' rendered views independently of each other, and show that a 3D shape can be recognized even from a single view at an accuracy far...
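The view-based aggregation described above can be sketched with a minimal numpy example: each rendered view is assumed to have already been mapped to a feature vector by a CNN, and the per-view descriptors are combined by element-wise max pooling into a single shape descriptor. The function name and toy descriptors below are illustrative, not the paper's code.

```python
import numpy as np

def view_pool(view_descriptors):
    """Aggregate per-view CNN descriptors into one shape descriptor
    by element-wise max pooling across views (MVCNN-style aggregation)."""
    stacked = np.stack(view_descriptors, axis=0)  # (num_views, feat_dim)
    return stacked.max(axis=0)                    # (feat_dim,)

# Toy example: three "views" of a shape, each a 4-D descriptor.
views = [np.array([0.1, 0.9, 0.0, 0.2]),
         np.array([0.5, 0.1, 0.3, 0.2]),
         np.array([0.2, 0.4, 0.8, 0.1])]
shape_desc = view_pool(views)  # -> [0.5, 0.9, 0.8, 0.2]
```

Max pooling keeps, per feature, the strongest response over all views, which makes the pooled descriptor invariant to the order and number of views.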
Given two consecutive frames, video interpolation aims at generating intermediate frame(s) to form both spatially and temporally coherent video sequences. While most existing methods focus on single-frame interpolation, we propose an end-to-end convolutional neural network for variable-length multi-frame video interpolation, where the motion interpretation and occlusion reasoning are jointly modeled. We start by computing bi-directional optical flow between the input images using a U-Net architecture. These flows are then...
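A common step in this kind of multi-frame interpolation is to linearly approximate the flows from an intermediate time t back to the two input frames, given the bidirectional flows between them. The sketch below shows that linear motion model in numpy; the function name and toy flow fields are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def approx_intermediate_flows(f01, f10, t):
    """Linearly approximate the flows from intermediate time t to frames 0
    and 1, given bidirectional flows f01 (frame0->frame1) and f10
    (frame1->frame0), assuming locally linear motion."""
    ft0 = -(1.0 - t) * t * f01 + t * t * f10
    ft1 = (1.0 - t) ** 2 * f01 - t * (1.0 - t) * f10
    return ft0, ft1

# Toy 1x1-pixel flow fields with (dx, dy) components.
f01 = np.array([[[2.0, 0.0]]])   # pixel moves 2px right from frame 0 to 1
f10 = np.array([[[-2.0, 0.0]]])  # and back from frame 1 to 0
ft0, ft1 = approx_intermediate_flows(f01, f10, t=0.5)
```

At t = 0.5 the approximated flows point 1px back toward frame 0 and 1px forward toward frame 1, consistent with constant-velocity motion.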
While deep learning based methods for generic object detection have improved rapidly in the last two years, most approaches to face detection are still based on the R-CNN framework [11], leading to limited accuracy and processing speed. In this paper, we investigate applying the Faster R-CNN [26], which has recently demonstrated impressive results on various object detection benchmarks, to face detection. By training a Faster R-CNN model on the large-scale WIDER face dataset [34], we report state-of-the-art results on the WIDER test set as well as on two other widely used face detection benchmarks, FDDB and the recently released IJB-A.
Visual tracking of general objects often relies on the assumption that gradient descent of the alignment function will reach the global optimum. A common technique to smooth the objective function is to blur the image. However, blurring the image destroys image information, which can cause the target to be lost. To address this problem we introduce a method for building an image descriptor using distribution fields (DFs), a representation that allows smoothing of the objective function without destroying information about pixel values. We present experimental evidence...
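The core construction can be sketched as follows: the image is "exploded" into one indicator layer per pixel-value bin, and each layer is then smoothed spatially, so probability mass spreads over positions without mixing distinct pixel values. This is a minimal numpy sketch under simplifying assumptions (a box filter instead of a Gaussian, and grayscale values in [0, 1)); the function name is illustrative.

```python
import numpy as np

def distribution_field(image, num_bins=8, smooth_radius=1):
    """Build a distribution field: one indicator layer per value bin,
    each layer smoothed spatially with a simple box filter."""
    h, w = image.shape
    bins = np.minimum((image * num_bins).astype(int), num_bins - 1)
    df = np.zeros((num_bins, h, w))
    # Place each pixel's mass in its value bin's layer.
    df[bins, np.arange(h)[:, None], np.arange(w)[None, :]] = 1.0
    if smooth_radius > 0:
        pad = smooth_radius
        padded = np.pad(df, ((0, 0), (pad, pad), (pad, pad)), mode="edge")
        smoothed = np.zeros_like(df)
        for dy in range(-pad, pad + 1):
            for dx in range(-pad, pad + 1):
                smoothed += padded[:, pad + dy:pad + dy + h,
                                   pad + dx:pad + dx + w]
        df = smoothed / (2 * pad + 1) ** 2
    return df

img = np.array([[0.1, 0.9], [0.1, 0.9]])
df = distribution_field(img, num_bins=2, smooth_radius=1)
```

Note that each pixel's column still sums to 1 after smoothing: position uncertainty grows, but the distinction between the two value bins is preserved rather than averaged away as plain blurring would do.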
Most modern face recognition systems rely on a feature representation given by a hand-crafted image descriptor, such as Local Binary Patterns (LBP), and achieve improved performance by combining several such representations. In this paper, we propose deep learning as a natural source for obtaining additional, complementary representations. To learn features in high-resolution images, we make use of convolutional deep belief networks. Moreover, to take advantage of global structure in an object class, we develop a local convolutional restricted Boltzmann...
Many recognition algorithms depend on careful positioning of an object into a canonical pose, so that the position of features relative to a fixed coordinate system can be examined. Currently, this positioning is done either manually or by training a class-specialized learning algorithm with samples of the class that have been hand-labeled with parts or poses. In this paper, we describe a novel method to achieve this positioning using poorly aligned examples of a class with no additional labeling. Given a set of unaligned examplars of a class, such as faces, we automatically build...
Popularized as 'bottom-up' attention, bounding box (or region) based visual features have recently surpassed vanilla grid-based convolutional features as the de facto standard for vision and language tasks like visual question answering (VQA). However, it is not clear whether the advantages of regions (e.g. better localization) are the key reasons for the success of bottom-up attention. In this paper, we revisit grid features for VQA, and find they can work surprisingly well -- running more than an order of magnitude faster with the same accuracy if...
Convolutions are the fundamental building blocks of CNNs. The fact that their weights are spatially shared is one of the main reasons for their widespread use, but it is also a major limitation, as it makes convolutions content-agnostic. We propose a pixel-adaptive convolution (PAC) operation, a simple yet effective modification of standard convolutions, in which the filter weights are multiplied with a spatially varying kernel that depends on learnable, local pixel features. PAC is a generalization of several popular filtering techniques and thus can be used...
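The modification described above can be illustrated in one dimension: a shared filter is gated, position by position, with a Gaussian kernel on per-pixel features, so the effective filter adapts to content. This is a minimal sketch assuming a 1-D signal and a fixed Gaussian gate; the function name and toy data are illustrative, not the paper's implementation.

```python
import numpy as np

def pac_1d(signal, features, weights, sigma=1.0):
    """1-D pixel-adaptive convolution: the shared filter `weights`
    (odd length) is modulated at each position by a Gaussian kernel
    on the per-pixel `features`, making it content-aware."""
    r = len(weights) // 2
    n = len(signal)
    out = np.zeros(n)
    for i in range(n):
        for k in range(-r, r + 1):
            j = i + k
            if 0 <= j < n:
                gate = np.exp(-0.5 * ((features[i] - features[j]) / sigma) ** 2)
                out[i] += gate * weights[k + r] * signal[j]
    return out

# Averaging filter on a step edge, with the signal itself as the feature:
# the gate suppresses contributions from across the edge.
sig = np.array([0.0, 0.0, 1.0, 1.0])
smoothed = pac_1d(sig, features=sig,
                  weights=np.array([1 / 3, 1 / 3, 1 / 3]), sigma=0.1)
```

With uniform features the gate is 1 everywhere and PAC reduces to an ordinary convolution, which is the sense in which it generalizes standard filtering.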
This work addresses the unsupervised adaptation of an existing object detector to a new target domain. We assume that a large number of unlabeled videos from this domain are readily available. We automatically obtain labels on these data by using high-confidence detections from the existing detector, augmented with hard (misclassified) examples acquired by exploiting temporal cues from a tracker. These automatically-obtained labels are then used for re-training the original model. A modified knowledge distillation loss is also proposed, and we...
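A generic sketch of a distillation-style self-training objective (not the paper's exact loss, whose details are truncated above) blends cross-entropy against the teacher's hard pseudo-labels with cross-entropy against its full soft output distribution. The function names and the blending weight `lam` are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def self_training_distillation_loss(student_logits, teacher_probs, lam=0.5):
    """Blend cross-entropy against the teacher's hard pseudo-labels with
    cross-entropy against its soft output distribution (illustrative)."""
    p = softmax(student_logits)
    hard = np.argmax(teacher_probs, axis=-1)
    ce_hard = -np.log(p[np.arange(len(hard)), hard] + 1e-12).mean()
    ce_soft = -(teacher_probs * np.log(p + 1e-12)).sum(axis=-1).mean()
    return (1 - lam) * ce_hard + lam * ce_soft

# A student that strongly agrees with a confident teacher incurs tiny loss.
loss = self_training_distillation_loss(np.array([[10.0, 0.0]]),
                                       np.array([[1.0, 0.0]]))
```

The soft term carries the teacher's relative confidence across classes, which the hard pseudo-label alone discards.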
This paper presents a family of techniques that we call congealing for modeling image classes from data. The idea is to start with a set of images and make them appear as similar as possible by removing variability along the known axes of variation. This technique can be used to eliminate "nuisance" variables such as affine deformations in handwritten digits or unwanted bias fields in magnetic resonance images. In addition to separating and modeling the latent images, i.e., the images without the nuisance variables, we can model the nuisance variables themselves, leading to a factorized...
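A toy version of the idea can be sketched in numpy: given binary 1-D "images" that differ only by a shift, greedily shift each one to reduce the joint entropy of the pixel stacks until the set converges. Real congealing uses richer transforms (e.g. affine) and continuous optimization; everything below is an illustrative simplification.

```python
import numpy as np

def pixel_stack_entropy(images):
    """Sum of per-column binary entropies across the image stack."""
    p = images.mean(axis=0)
    eps = 1e-12
    return -np.sum(p * np.log(p + eps) + (1 - p) * np.log(1 - p + eps))

def congeal_1d(images, max_iters=10):
    """Toy congealing: greedily shift each binary 1-D image left/right
    whenever that lowers the joint pixel-stack entropy."""
    images = images.copy()
    for _ in range(max_iters):
        changed = False
        for i in range(len(images)):
            best, best_e = images[i], pixel_stack_entropy(images)
            for shift in (-1, 1):
                trial = images.copy()
                trial[i] = np.roll(images[i], shift)
                e = pixel_stack_entropy(trial)
                if e < best_e - 1e-9:
                    best, best_e = trial[i], e
            if not np.array_equal(best, images[i]):
                images[i] = best
                changed = True
        if not changed:
            break
    return images

# Three copies of the same bar pattern, two of them shifted.
base = np.array([0, 1, 1, 0, 0], dtype=float)
imgs = np.stack([base, np.roll(base, 1), np.roll(base, -1)])
aligned = congeal_1d(imgs)
```

When the images align, every pixel stack becomes constant and the entropy drops to (numerically) zero, which is the objective congealing minimizes.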
Conditional random fields (CRFs) provide powerful tools for building models to label image segments. They are particularly well-suited to modeling local interactions among adjacent regions (e.g., superpixels). However, CRFs are limited in dealing with complex, global (long-range) interactions between regions. Complementary to this, restricted Boltzmann machines (RBMs) can be used to model global shapes produced by segmentation models. In this work, we present a new model that uses the combined power of these two network types...
Scene text recognition (STR) is the recognition of text anywhere in the environment, such as on signs and storefronts. Relative to document recognition, it is challenging because of font variability, minimal language context, and uncontrolled conditions. Much of the information available to solve this problem is frequently ignored or used only sequentially. Similarity between character images is often overlooked as a source of useful information. Because of language priors, a recognizer may assign different labels to identical characters. Directly comparing characters to each...
Object recognition is a central problem in computer vision research. Most object recognition systems have taken one of two approaches, using either global or local features exclusively. This may be in part due to the difficulty of combining a single global feature vector with a set of local features in a suitable manner. In this paper, we show that combining the two is beneficial in an application where rough segmentations of objects are available. We present a method for classification using non-parametric density estimation. Subsequently, we present methods for combining the two types of features. The first uses...
Many classifiers are trained with massive training sets only to be applied at test time on data from a different distribution. How can we rapidly and simply adapt a classifier to a new distribution, even when we do not have access to the original training data? We present an on-line approach for adapting a "black box" classifier to a new test set without retraining the classifier or examining the original optimization criterion. Assuming the classifier outputs a continuous number for which a threshold gives the class, we reclassify points near the decision boundary using a Gaussian process regression scheme....
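The reclassification step can be sketched with standard GP regression: fit a Gaussian process to the black-box classifier's continuous scores, then threshold the GP posterior mean at the query points instead of the raw scores. The kernel choice, noise level, and toy 1-D data below are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_reclassify(x_train, scores, x_query, threshold=0.0,
                  length_scale=1.0, noise=1e-2):
    """Fit GP regression to continuous classifier scores and threshold
    the posterior mean at the query points (illustrative sketch)."""
    K = rbf_kernel(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    k_star = rbf_kernel(x_query, x_train, length_scale)
    mean = k_star @ np.linalg.solve(K, scores)
    return mean, (mean > threshold).astype(int)

# Toy 1-D inputs whose scores grow smoothly through the boundary at 0.
x = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
s = np.array([-1.9, -1.1, -0.4, 0.6, 1.1, 2.1])
mean, labels = gp_reclassify(x, s, x_query=np.array([-1.5, 1.5]))
```

Because the GP mean pools evidence from nearby points, a single noisy score near the boundary gets smoothed out rather than flipping the predicted class.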
The availability of massive data and computing power allowing for effective data-driven neural approaches is having a major impact on machine learning and information retrieval research, but these models have a basic problem with efficiency. Current neural ranking models are implemented as multistage rankers: for efficiency reasons, the neural model only re-ranks the top ranked documents retrieved by a first-stage efficient ranker in response to a given query. Neural ranking models learn dense representations, causing essentially every query term...
The recent explosive growth in convolutional neural network (CNN) research has produced a variety of new architectures for deep learning. One intriguing new architecture is the bilinear CNN (B-CNN), which has shown dramatic performance gains on certain fine-grained recognition problems [15]. We apply this new CNN to the challenging new face recognition benchmark, the IARPA Janus Benchmark A (IJB-A) [12]. It features faces from a large number of identities in challenging real-world conditions. Because the face images were not identified automatically using...
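The bilinear pooling at the heart of a B-CNN can be sketched directly: two feature maps are combined by summing outer products over spatial locations, followed by the signed square-root and L2 normalization commonly used with these descriptors. The toy feature maps below are illustrative.

```python
import numpy as np

def bilinear_pool(feat_a, feat_b):
    """Bilinear pooling of two feature maps of shape (locations, channels):
    sum of per-location outer products, then signed sqrt and L2 norm."""
    b = (feat_a.T @ feat_b).reshape(-1)   # pooled outer products, flattened
    b = np.sign(b) * np.sqrt(np.abs(b))   # signed square-root
    norm = np.linalg.norm(b)
    return b / norm if norm > 0 else b

fa = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # 3 locations, 2 channels
fb = np.array([[1.0], [2.0], [3.0]])                 # 3 locations, 1 channel
desc = bilinear_pool(fa, fb)
```

Summing outer products over locations is orderless pooling, which is what lets the descriptor capture pairwise feature interactions without spatial alignment.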
Self-paced learning and hard example mining re-weight training instances to improve learning accuracy. This paper presents two improved alternatives based on lightweight estimates of sample uncertainty in stochastic gradient descent (SGD): the variance in the predicted probability of the correct class across iterations of mini-batch SGD, and the proximity of the correct class probability to the decision threshold. Extensive experimental results on six datasets show that our methods reliably improve accuracy in various network architectures, including additional gains on top of other...
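The two uncertainty cues named above can be sketched per training sample in numpy: the variance of the correct-class probability across recent SGD iterations, and the closeness of that probability to the decision threshold. How the cues are combined (the sum with weight `alpha` below) is an illustrative assumption, not the paper's exact weighting.

```python
import numpy as np

def uncertainty_weights(prob_history, threshold=0.5, alpha=1.0):
    """Per-sample weights from two uncertainty cues: variance of the
    correct-class probability over iterations, and closeness of the
    latest probability to the decision threshold (illustrative blend)."""
    probs = np.asarray(prob_history)                # (iterations, num_samples)
    variance = probs.var(axis=0)
    closeness = 1.0 - np.abs(probs[-1] - threshold)  # near 0.5 => uncertain
    raw = variance + alpha * closeness
    return raw / raw.sum()                           # normalize to sum to 1

# Three samples tracked over four SGD iterations: the first is confidently
# and stably correct, the second hovers at the threshold, the third oscillates.
history = [[0.90, 0.5, 0.2],
           [0.95, 0.4, 0.8],
           [0.90, 0.6, 0.3],
           [0.92, 0.5, 0.7]]
w = uncertainty_weights(history)
```

The confidently learned sample receives the smallest weight, so SGD effort shifts toward samples the model is still uncertain about.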
In moving camera videos, motion segmentation is commonly performed using the image plane motion of pixels, or optical flow. However, objects that are at different depths from the camera can exhibit different optical flows even if they share the same real-world motion. This can cause a depth-dependent segmentation of the scene. Our goal is to develop a segmentation algorithm that clusters pixels which have similar real-world motion irrespective of their depth in the scene. Our solution uses optical flow orientations instead of the complete vectors and exploits the well-known property that, under camera translation, flow orientations are independent of object depth. We...
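The depth-independence of orientations can be illustrated with a minimal sketch: compute the flow angle per pixel with atan2 and bin the angles, ignoring magnitude. Pixels that share a real-world motion direction then land in the same bin even when their flow magnitudes differ with depth. The binning scheme below is an illustrative stand-in for the paper's clustering.

```python
import numpy as np

def orientation_clusters(flow, num_bins=8):
    """Group pixels by optical-flow orientation only: per-pixel atan2,
    quantized into angular bins. Magnitude (which varies with depth
    under camera translation) is deliberately ignored."""
    angles = np.arctan2(flow[..., 1], flow[..., 0])          # (-pi, pi]
    bins = ((angles + np.pi) / (2 * np.pi) * num_bins).astype(int) % num_bins
    return bins

# Two objects moving in the same real-world directions at different depths:
# equal flow orientations, unequal magnitudes.
flow = np.array([[[1.0, 0.0], [3.0, 0.0]],
                 [[0.0, 2.0], [0.0, 0.5]]])
labels = orientation_clusters(flow)
```

Both rightward-moving pixels share one label and both upward-moving pixels share another, despite their 3x and 4x magnitude differences.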