- Advanced Image and Video Retrieval Techniques
- Domain Adaptation and Few-Shot Learning
- Multimodal Machine Learning Applications
- Generative Adversarial Networks and Image Synthesis
- Visual Attention and Saliency Detection
- Advanced Neural Network Applications
- Machine Learning and Data Classification
- Mobile Crowdsensing and Crowdsourcing
- Face recognition and analysis
- Face and Expression Recognition
- Aesthetic Perception and Analysis
- Human Pose and Action Recognition
- Biometric Identification and Security
- Data Stream Mining Techniques
- Computer Graphics and Visualization Techniques
- Image Retrieval and Classification Techniques
- Video Analysis and Summarization
- Anomaly Detection Techniques and Applications
- Advanced Image Processing Techniques
- Digital Media Forensic Detection
- Image and Video Quality Assessment
- Open Source Software Innovations
- Digital Imaging for Blood Diseases
- Neural Networks and Applications
- Speech and Audio Processing
Google (United States)
2019-2024
Menlo School
2022
Cornell University
2013-2021
Duquesne University
2020
Adobe Systems (United States)
2017
University of Colorado Colorado Springs
2011-2013
In this work we propose a novel interpretation of residual networks showing that they can be seen as a collection of many paths of differing length. Moreover, residual networks seem to enable very deep networks by leveraging only the short paths during training. To support this observation, we rewrite residual networks as an explicit collection of paths. Unlike traditional models, paths through residual networks vary in length. Further, a lesion study reveals that these paths show ensemble-like behavior in the sense that they do not strongly depend on each other. Finally, and most surprising, most paths are shorter than one might expect, and only the short paths are needed...
Recent self-supervised representation learning techniques have largely closed the gap between supervised and unsupervised learning on ImageNet classification. While the particulars of pretraining on ImageNet are now relatively well understood, the field still lacks widely accepted best practices for replicating this success on other datasets. As a first step in this direction, we study contrastive self-supervised learning on four diverse large-scale datasets. By looking through the lenses of data quantity, data domain, data quality, and task granularity, we provide new insights into...
Computer vision systems are designed to work well within the context of everyday photography. However, artists often render the world around them in ways that do not resemble photographs. Artwork produced by people is not constrained to mimic the physical world, making it more challenging for machines to recognize. This work is a step toward teaching machines how to categorize images in ways that are valuable to humans. First, we collect a large-scale dataset of contemporary artwork from Behance, a website containing millions of portfolios from professional and...
Recent progress in self-supervised learning has resulted in models that are capable of extracting rich representations from image collections without requiring any explicit label supervision. However, to date the vast majority of these approaches have restricted themselves to training on standard benchmark datasets such as ImageNet. We argue that fine-grained visual categorization problems, such as plant and animal species classification, provide an informative testbed for self-supervised learning. In order to facilitate progress in this area...
We propose a novel measure of visual similarity for image retrieval that incorporates both structural and aesthetic (style) constraints. Our algorithm accepts a query as a sketched shape, and a set of one or more contextual images specifying the desired visual aesthetic. A triplet network is used to learn a feature embedding capable of measuring style similarity independent of structure, delivering significant gains over previous networks for style discrimination. We incorporate this model within a hierarchical triplet network to unify and learn a joint space from two...
Similarity comparisons of the form "Is object a more similar to b than to c?" form a useful foundation in several computer vision and machine learning applications. Unfortunately, an embedding of n points is only uniquely specified by n^3 triplets, making collecting every triplet an expensive task. In noticing this difficulty, other researchers have investigated more intelligent triplet sampling techniques, but they do not study their effectiveness or their potential drawbacks. Although it is important to reduce the number of collected triplets...
Dense prediction tasks, such as semantic segmentation, depth estimation, and surface normal prediction, can be easily formulated as per-pixel classification (discrete outputs) or regression (continuous outputs). This per-pixel prediction paradigm has remained popular due to the prevalence of fully convolutional networks. However, on the recent frontier of the segmentation task, the community has been witnessing a shift of paradigm from per-pixel prediction to cluster-prediction with the emergence of transformer architectures, particularly the mask transformers, which directly...
Research experiences today are limited to a privileged few at select universities. Providing open access to research experiences would enable global upward mobility and increased diversity in the scientific workforce. How can we coordinate a crowd of diverse volunteers on open-ended research? How could a PI have enough visibility into each person's contributions to recommend them for further study? We present Crowd Research, a crowdsourcing technique that coordinates open-ended research through an iterative cycle of open contribution, synchronous...
After decades of study, automatic face detection and recognition systems are now accurate and widespread. Naturally, this means users who wish to avoid automatic recognition are becoming less able to do so. Where do we stand in this cat-and-mouse race? We currently live in a society where everyone carries a camera in their pocket. Many people willfully upload most or all of the pictures they take to social networks which invest heavily in automatic face recognition systems. In this setting, is it still possible for privacy-conscientious users to avoid automatic face detection and recognition? If so, how? Must evasion...
The outreach of computer vision to non-traditional areas has enormous potential to enable new ways of solving real world problems. One such problem is how to incorporate technology in the effort to protect endangered and threatened species in the wild. This paper presents a snapshot of our interdisciplinary team's ongoing work in the Mojave Desert to build vision tools for field biologists to study the currently threatened Desert Tortoise and Mohave Ground Squirrel. Animal population studies in natural habitats present new recognition challenges for computer vision, where open...
This paper presents our work on "SNaCK," a low-dimensional concept embedding algorithm that combines human expertise with automatic machine similarity kernels. Both parts are complementary: human insight can capture relationships that are not apparent from the object's visual similarity, and the machine can help relieve the human from having to exhaustively specify many constraints. We show that our SNaCK embeddings are useful in several tasks: distinguishing prime and nonprime numbers on MNIST, discovering labeling mistakes in the Caltech UCSD Birds (CUB) dataset with the help of...
Face and eye detection algorithms are deployed in a wide variety of applications. Unfortunately, there has been no quantitative comparison of how these detectors perform under difficult circumstances. We created a dataset of low light and long distance images which possess some of the problems encountered by face and eye detectors solving real world problems. The dataset we created is composed of reimaged (photohead) and semi-synthetic heads imaged under varying conditions of low light, atmospheric blur, and distances of 3m, 50m, 80m, and 200m. This paper analyzes...
Standard training techniques for neural networks involve multiple sources of randomness, e.g., initialization, mini-batch ordering and in some cases data augmentation. Given that neural networks are heavily over-parameterized in practice, such randomness can cause churn: for the same input, disagreements between the predictions of two models independently trained by the same algorithm, contributing to the "reproducibility challenges" in modern machine learning. In this paper, we study this problem of churn, identify factors that cause it, and propose...
When implementing real-world computer vision systems, researchers can use mid-level representations as a tool to adjust the trade-off between accuracy and efficiency. Unfortunately, existing mid-level representations that improve accuracy tend to decrease efficiency, or are specifically tailored to work well within one pipeline or vision problem at the exclusion of others. We introduce a novel, efficient mid-level representation that improves classification efficiency without sacrificing accuracy. Our Exemplar Codes are based on linear classifiers and probability...
We examine the possibility that recent promising results in automatic caption generation are due primarily to language models. By varying image representation quality produced by a convolutional neural network, we find that a state-of-the-art neural captioning algorithm is able to produce quality captions even when provided with surprisingly poor image representations. We replicate this result in a new, fine-grained, transfer learned captioning domain, consisting of 66K recipe image/title pairs. We also provide some experiments regarding...
As biometric authentication systems become common in everyday use, researchers are beginning to address privacy issues in biometric recognition. With the growing use of mobile devices, it is important to develop approaches that support remote verification. This paper outlines the need for a mobile/remote SVM-based system that does not compromise the privacy of the subject being recognized. We discuss the limitations of earlier privacy-preserving approaches and present the necessary security requirements that make the system attractive from both the server's point of view...
Blockchain, cryptographically linked blocks of data, is the key technology behind the infamous cryptocurrency Bitcoin; however, blockchain can serve more use cases than just cryptocurrency. The blockchain has use cases in any industry that generates and transfers data. In a use case such as cryptocurrency, blockchain is used to manage transactions from peer to peer in a way that does not allow for the transaction's data to be manipulated by one of the involved parties or a third party. This is a valuable process that other industries could leverage. One benefit of applying blockchain to a multitude...