- Advanced Image and Video Retrieval Techniques
- Advanced Vision and Imaging
- Image Retrieval and Classification Techniques
- Domain Adaptation and Few-Shot Learning
- Human Pose and Action Recognition
- Multimodal Machine Learning Applications
- Video Surveillance and Tracking Methods
- Image Processing Techniques and Applications
- Visual Perception and Processing Mechanisms
- Advanced Neural Network Applications
- Neural Dynamics and Brain Function
- Optical Measurement and Interference Techniques
- Robotics and Sensor-Based Localization
- Face and Expression Recognition
- Neurobiology and Insect Physiology Research
- Machine Learning and Data Classification
- Anomaly Detection Techniques and Applications
- Face Recognition and Analysis
- Machine Learning and Algorithms
- Medical Image Segmentation Techniques
- Computer Graphics and Visualization Techniques
- Remote-Sensing Image Classification
- Visual Attention and Saliency Detection
- Face Recognition and Perception
- Species Distribution and Climate Change
California Institute of Technology
2015-2024
Amazon (United States)
2020-2024
Seattle University
2022
Amazon (Germany)
2020-2021
Stryker (United States)
2018
University of Edinburgh
2016
University of California, San Diego
2010
Pasadena City College
2010
Howard Hughes Medical Institute
2009
Institute of Electrical and Electronics Engineers
2006
A new definition of scale-space is suggested, and a class of algorithms used to realize a diffusion process is introduced. The diffusion coefficient is chosen to vary spatially in such a way as to encourage intraregion smoothing rather than interregion smoothing. It is shown that the 'no new maxima should be generated at coarse scales' property of conventional scale space is preserved. As the region boundaries in the approach remain sharp, a high-quality edge detector which successfully exploits global information is obtained. Experimental results...
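The spatially varying diffusion described in this abstract can be illustrated with a minimal one-dimensional sketch. The abstract does not give a formula, so the exponential conductance function, the parameter names `kappa` and `step`, and the zero-flux boundary handling below are all illustrative assumptions rather than the authors' implementation.

```python
import math


def conductance(gradient, kappa):
    """Assumed conductance: near 1 for small gradients, near 0 across strong edges."""
    return math.exp(-(gradient / kappa) ** 2)


def diffuse_1d(signal, kappa=0.5, step=0.25, iterations=50):
    """Explicit edge-preserving diffusion on a 1D signal with zero-flux borders."""
    u = [float(v) for v in signal]
    n = len(u)
    for _ in range(iterations):
        # Flux between each pair of neighbours (i, i + 1).
        flux = []
        for i in range(n - 1):
            d = u[i + 1] - u[i]
            flux.append(conductance(d, kappa) * d)
        # Each sample gains the flux from its right side and loses it to its left.
        new_u = u[:]
        for i in range(n):
            from_right = flux[i] if i < n - 1 else 0.0
            to_left = flux[i - 1] if i > 0 else 0.0
            new_u[i] = u[i] + step * (from_right - to_left)
        u = new_u
    return u


# A noisy step edge: two nearly flat regions separated by a large jump.
noisy_step = [0.0, 0.1, -0.1, 0.05, 0.0, 5.0, 5.1, 4.9, 5.05, 5.0]
smoothed = diffuse_1d(noisy_step)
```

In this sketch the small fluctuations inside each region are averaged out, while the jump between the two regions receives almost no smoothing because the conductance across it is close to zero.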
We propose a novel approach to learn and recognize natural scene categories. Unlike previous work, it does not require experts to annotate the training set. We represent the image of a scene by a collection of local regions, denoted as codewords obtained by unsupervised learning. Each region is represented as part of a "theme". In previous work, such themes were learnt from hand-annotations of experts, while our method learns the theme distributions as well as the codewords distribution over the themes without supervision. We report satisfactory categorization performances on...
Pedestrian detection is a key problem in computer vision, with several applications that have the potential to positively impact quality of life. In recent years, the number of approaches to detecting pedestrians in monocular images has grown steadily. However, multiple data sets and widely varying evaluation protocols are used, making direct comparisons difficult. To address these shortcomings, we perform an extensive evaluation of the state of the art in a unified framework. We make three primary contributions: 1) We put together...
Learning visual models of object categories notoriously requires hundreds or thousands of training examples. We show that it is possible to learn much information about a category from just one, or a handful, of images. The key insight is that, rather than learning from scratch, one can take advantage of knowledge coming from previously learned categories, no matter how different these categories might be. We explore a Bayesian implementation of this idea. Object categories are represented by probabilistic models. Prior knowledge is represented as a probability density...
Current computational approaches to learning visual object categories require thousands of training images, are slow, cannot learn in an incremental manner and cannot incorporate prior information into the learning process. In addition, no algorithm presented in the literature has been tested on more than a handful of object categories. We present a method for learning object categories from just a few training images. It is quick and it uses prior information in a principled way. We test it on a dataset composed of images of objects belonging to 101 widely varied categories. Our proposed method is based on making use of prior information,...
We present a method to learn and recognize object class models from unlabeled and unsegmented cluttered scenes in a scale invariant manner. Objects are modeled as flexible constellations of parts. A probabilistic representation is used for all aspects of the object: shape, appearance, occlusion and relative scale. An entropy-based feature detector is used to select regions and their scale within the image. In learning, the parameters of the scale-invariant object model are estimated. This is done using expectation-maximization in a maximum-likelihood setting....
Multi-resolution image features may be approximated via extrapolation from nearby scales, rather than being computed explicitly. This fundamental insight allows us to design object detection algorithms that are as accurate as, and considerably faster than, the state-of-the-art. The computational bottleneck of many modern detectors is the computation of features at every scale of a finely-sampled image pyramid. Our key insight is that one may compute finely sampled feature pyramids at a fraction of the cost, without sacrificing performance: for a broad...
We present a new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding. This is achieved by gathering images of complex everyday scenes containing common objects in their natural context. Objects are labeled using per-instance segmentations to aid in precise object localization. Our dataset contains photos of 91 object types that would be easily recognizable by a 4 year old. With a total of 2.5 million labeled instances in 328k images, the creation of our dataset drew upon extensive crowd...
Pedestrian detection is a key problem in computer vision, with several applications including robotics, surveillance and automotive safety. Much of the progress of the past few years has been driven by the availability of challenging public datasets. To continue the rapid rate of innovation, we introduce the Caltech Pedestrian Dataset, which is two orders of magnitude larger than existing datasets. The dataset contains richly annotated video, recorded from a moving vehicle, with challenging images of low resolution and frequently occluded people. We propose improved...
We study the performance of 'integral channel features' for image classification tasks, focusing in particular on pedestrian detection. The general idea behind integral channel features is that multiple registered image channels are computed using linear and non-linear transformations of the input image, and then features such as local sums, histograms, and Haar features and their various generalizations are efficiently computed using integral images. Such features have been used in recent literature for a variety of tasks; indeed, variations appear to have been invented independently multiple times. Although...
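The role of integral images in computing local sums and Haar-like features efficiently can be sketched as follows. This is a generic illustration of the integral image technique on a single toy channel, not the authors' code; the function names `integral_image` and `box_sum` and the example data are hypothetical.

```python
def integral_image(channel):
    """Return a (rows + 1) x (cols + 1) table of cumulative sums, zero-padded on top and left."""
    rows, cols = len(channel), len(channel[0])
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0
        for c in range(cols):
            row_sum += channel[r][c]
            table[r + 1][c + 1] = table[r][c + 1] + row_sum
    return table


def box_sum(table, top, left, bottom, right):
    """Sum of channel[top:bottom][left:right] using four table lookups."""
    return (table[bottom][right] - table[top][right]
            - table[bottom][left] + table[top][left])


# A toy 3x3 channel (for example, one gradient-magnitude channel of an image).
channel = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
table = integral_image(channel)

total = box_sum(table, 0, 0, 3, 3)          # whole channel
lower_right = box_sum(table, 1, 1, 3, 3)    # 2x2 block: 5 + 6 + 8 + 9

# A Haar-like feature: left column sum minus right column sum.
haar_like = box_sum(table, 0, 0, 3, 1) - box_sum(table, 0, 2, 3, 3)
```

Once the table is built, every rectangular sum costs the same four lookups regardless of rectangle size, which is what makes large pools of such features cheap to evaluate.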
Existing image classification datasets used in computer vision tend to have a uniform distribution of images across object categories. In contrast, the natural world is heavily imbalanced, as some species are more abundant and easier to photograph than others. To encourage further progress in challenging real world conditions we present the iNaturalist species classification and detection dataset, consisting of 859,000 images from over 5,000 different species of plants and animals. It features visually similar species, captured in a wide variety of situations, from all...
We present a model of human preattentive texture perception. This model consists of three stages: (1) convolution of the image with a bank of even-symmetric linear filters followed by half-wave rectification to give a set of responses modeling outputs of V1 simple cells, (2) inhibition, localized in space, within and among the neural-response profiles that results in the suppression of weak responses when there are strong responses at the same or nearby locations, and (3) texture-boundary detection using wide odd-symmetric mechanisms. Our model can predict...
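Stage (1) of this model, linear filtering followed by half-wave rectification, can be sketched in one dimension. The kernel `[-1, 2, -1]`, the zero padding, and the function names are illustrative assumptions; the abstract does not specify the filter bank, and stages (2) and (3) are not implemented here.

```python
def filter_same(signal, kernel):
    """Apply an odd-length kernel with zero padding; output has the input's length.

    For an even-symmetric kernel, correlation and convolution coincide.
    """
    half = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = i + k - half
            if 0 <= j < n:
                acc += weight * signal[j]
        out.append(acc)
    return out


def half_wave_rectify(response):
    """Split a signed response into two non-negative channels (positive and negative parts)."""
    positive = [max(r, 0.0) for r in response]
    negative = [max(-r, 0.0) for r in response]
    return positive, negative


# Assumed even-symmetric kernel (symmetric about its centre).
even_kernel = [-1.0, 2.0, -1.0]
texture = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0]

response = filter_same(texture, even_kernel)
positive, negative = half_wave_rectify(response)
```

Splitting the signed response into two non-negative channels keeps both polarities available as separate responses, so neither is lost when later stages operate on non-negative values.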
What can we see when we do not pay attention? It is well known that we can be “blind” even to major aspects of natural scenes when we attend elsewhere. The only tasks that do not need attention appear to be carried out in the early stages of the visual system. Contrary to this common belief, we report that subjects can rapidly detect animals or vehicles in briefly presented novel natural scenes while simultaneously performing another attentionally demanding task. By comparison, they are unable to discriminate large T's from L's, or bisected two-color disks from their mirror...
Human faces captured in real-world conditions present large variations in shape and occlusions due to differences in pose, expression, use of accessories such as sunglasses and hats, and interactions with objects (e.g. food). Current face landmark estimation approaches struggle under such conditions since they fail to provide a principled way of handling outliers. We propose a novel method, called Robust Cascaded Pose Regression (RCPR), which reduces exposure to outliers by detecting occlusions explicitly and using robust shape-indexed features....
Current approaches to object category recognition require datasets of training images to be manually prepared, with varying degrees of supervision. We present an approach that can learn an object category from just its name, by utilizing the raw output of image search engines available on the Internet. We develop a new model, TSI-pLSA, which extends pLSA (as applied to visual words) to include spatial information in a translation and scale invariant manner. Our approach can handle the high intra-class variability and large proportion of unrelated images returned...
A key problem in learning multiple objects from unlabeled images is that it is a priori impossible to tell which part of the image corresponds to each individual object, and which part is irrelevant clutter not associated with the objects. We investigate empirically to what extent pure bottom-up attention can extract useful information about the location, size and shape of objects from images and demonstrate how this information can be utilized to enable unsupervised learning of objects from unlabeled images. Our experiments demonstrate that the proposed approach to using bottom-up attention is indeed useful for a variety of applications.