- Domain Adaptation and Few-Shot Learning
- Multimodal Machine Learning Applications
- Generative Adversarial Networks and Image Synthesis
- Face recognition and analysis
- Advanced Neural Network Applications
- Advanced Image and Video Retrieval Techniques
- Face and Expression Recognition
- Machine Learning and Data Classification
- Biometric Identification and Security
- Adversarial Robustness in Machine Learning
- COVID-19 diagnosis using AI
- Anomaly Detection Techniques and Applications
- Advanced Vision and Imaging
- Image Retrieval and Classification Techniques
- Computer Graphics and Visualization Techniques
- Digital Media Forensic Detection
- Video Surveillance and Tracking Methods
- Topic Modeling
- Speech Recognition and Synthesis
- Data-Driven Disease Surveillance
- Human Pose and Action Recognition
- Imbalanced Data Classification Techniques
- Privacy-Preserving Technologies in Data
- Human Motion and Animation
- Video Analysis and Summarization
Google (United States)
2019-2024
Korea Advanced Institute of Science and Technology
2021
NEC (United States)
2017-2020
NEC (Japan)
2015-2020
University of Michigan
2011-2015
Semi-supervised learning (SSL) provides an effective means of leveraging unlabeled data to improve a model's performance. In this paper, we demonstrate the power of a simple combination of two common SSL methods: consistency regularization and pseudo-labeling. Our algorithm, FixMatch, first generates pseudo-labels using the model's predictions on weakly-augmented unlabeled images. For a given image, the pseudo-label is only retained if the model produces a high-confidence prediction. The model is then trained to predict the pseudo-label when fed...
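A minimal sketch of the confidence-thresholded pseudo-labeling plus consistency step summarized above, in PyTorch-style Python; `model`, `weak_aug`, `strong_aug`, and the 0.95 threshold are illustrative assumptions rather than the paper's exact implementation:

```python
import torch
import torch.nn.functional as F

def fixmatch_unlabeled_loss(model, unlabeled_batch, weak_aug, strong_aug,
                            threshold=0.95):
    # Pseudo-labels come from predictions on weakly-augmented images.
    with torch.no_grad():
        weak_logits = model(weak_aug(unlabeled_batch))
        probs = F.softmax(weak_logits, dim=1)
        confidence, pseudo_labels = probs.max(dim=1)
        mask = (confidence >= threshold).float()  # keep only confident ones

    # The model is trained to predict the retained pseudo-labels on a
    # strongly-augmented version of the same images.
    strong_logits = model(strong_aug(unlabeled_batch))
    per_example = F.cross_entropy(strong_logits, pseudo_labels,
                                  reduction="none")
    return (per_example * mask).mean()
```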
Convolutional neural network-based approaches for semantic segmentation rely on supervision with pixel-level ground truth, but may not generalize well to unseen image domains. As the labeling process is tedious and labor intensive, developing algorithms that can adapt source ground truth labels to a target domain is of great interest. In this paper, we propose an adversarial learning method for domain adaptation in the context of semantic segmentation. Considering semantic segmentations as structured outputs that contain spatial similarities...
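A hedged sketch of output-space adversarial adaptation in the spirit of the abstract above; `segmenter`, `discriminator`, the source/target label convention, and `lambda_adv` are illustrative placeholders, and the discriminator's own update step is omitted:

```python
import torch
import torch.nn.functional as F

def adaptation_step(segmenter, discriminator, src_img, src_label, tgt_img,
                    lambda_adv=0.001):
    # Supervised segmentation loss on the labeled source domain.
    src_pred = segmenter(src_img)                      # (N, C, H, W) logits
    seg_loss = F.cross_entropy(src_pred, src_label)

    # Adversarial loss: encourage target predictions to look like source
    # predictions in the structured output space, so the discriminator
    # (trained separately) is fooled into labeling them as "source" (0 here).
    tgt_pred = segmenter(tgt_img)
    d_out = discriminator(F.softmax(tgt_pred, dim=1))  # patch-wise logits
    adv_loss = F.binary_cross_entropy_with_logits(
        d_out, torch.zeros_like(d_out))

    return seg_loss + lambda_adv * adv_loss
```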
We aim at constructing a high-performance model for defect detection that detects unknown anomalous patterns of an image without anomalous data. To this end, we propose a two-stage framework for building anomaly detectors using normal training data only. We first learn self-supervised deep representations and then build a generative one-class classifier on the learned representations. We learn representations by classifying normal data from the CutPaste, a simple data augmentation strategy that cuts an image patch and pastes it at a random location of a large image. Our empirical study...
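A minimal sketch of a cut-and-paste augmentation consistent with the description above; the patch-area and aspect-ratio ranges are assumptions, not the paper's exact settings:

```python
import random
from PIL import Image

def cutpaste(img: Image.Image, area_ratio=(0.02, 0.15),
             aspect_ratio=(0.3, 3.3)) -> Image.Image:
    w, h = img.size
    # Sample the patch size from the (assumed) area and aspect ranges.
    patch_area = random.uniform(*area_ratio) * w * h
    aspect = random.uniform(*aspect_ratio)
    pw = min(max(int(round((patch_area * aspect) ** 0.5)), 1), w)
    ph = min(max(int(round((patch_area / aspect) ** 0.5)), 1), h)

    # Cut a patch from one random location...
    x1, y1 = random.randint(0, w - pw), random.randint(0, h - ph)
    patch = img.crop((x1, y1, x1 + pw, y1 + ph))

    # ...and paste it at another random location of the same image.
    x2, y2 = random.randint(0, w - pw), random.randint(0, h - ph)
    out = img.copy()
    out.paste(patch, (x2, y2))
    return out
```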
Despite recent advances in face recognition using deep learning, severe accuracy drops are observed for large pose variations in unconstrained environments. Learning pose-invariant features is one solution, but it needs expensively labeled large-scale data and carefully designed feature learning algorithms. In this work, we focus on frontalizing faces in the wild under various head poses, including extreme profile views. We propose a novel 3D Morphable Model (3DMM) conditioned Face Frontalization...
Predicting structured outputs such as semantic segmentation relies on expensive per-pixel annotations to learn supervised models like convolutional neural networks. However, models trained on one data domain may not generalize well to other domains without annotations for model finetuning. To avoid the labor-intensive process of annotation, we develop a domain adaptation method to adapt the source data to the unlabeled target domain. We propose to learn discriminative feature representations of patches in the source domain by discovering multiple modes of patch-wise output...
Despite the large volume of face recognition datasets, there is a significant portion of subjects whose samples are insufficient and thus under-represented. Ignoring such a significant portion results in insufficient training data. Training with under-represented data leads to biased classifiers in conventionally-trained deep networks. In this paper, we propose a center-based feature transfer framework to augment the feature space of under-represented subjects from the regular subjects that have sufficiently diverse samples. A Gaussian prior of the variance is assumed across all subjects, and the variance of regular ones...
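A hedged sketch of the center-based transfer idea: re-apply a regular subject's intra-class offsets around an under-represented subject's center to synthesize extra features, reflecting the shared-variance assumption. All names and shapes here are illustrative:

```python
import torch

def transfer_features(regular_feats, regular_center, ur_center, num_aug=32):
    # Intra-class variation of a regular subject, expressed as offsets
    # from its class center.
    offsets = regular_feats - regular_center          # (N, D)
    idx = torch.randint(0, offsets.shape[0], (num_aug,))
    # Re-center that variation on the under-represented subject's center
    # to augment its feature space.
    return ur_center + offsets[idx]                   # (num_aug, D)
```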
We improve the recently-proposed "MixMatch" semi-supervised learning algorithm by introducing two new techniques: distribution alignment and augmentation anchoring. Distribution alignment encourages the marginal distribution of predictions on unlabeled data to be close to the marginal distribution of ground-truth labels. Augmentation anchoring feeds multiple strongly augmented versions of an input into the model and encourages each output to be close to the prediction for a weakly-augmented version of the same input. To produce strong augmentations, we propose a variant of AutoAugment which learns...
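A minimal sketch of distribution alignment as summarized above: rescale unlabeled predictions by the ratio of the ground-truth label marginal to a running average of the model's predicted marginal, then renormalize each row. The running-average bookkeeping is assumed to happen elsewhere:

```python
import torch

def distribution_alignment(probs, running_pred_marginal, label_marginal,
                           eps=1e-6):
    # probs: (N, C) softmax predictions on unlabeled data.
    # running_pred_marginal, label_marginal: (C,) class marginals.
    ratio = label_marginal / (running_pred_marginal + eps)
    aligned = probs * ratio.unsqueeze(0)
    return aligned / aligned.sum(dim=1, keepdim=True)  # renormalize rows
```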
Semi-supervised learning (SSL) has a potential to improve the predictive performance of machine learning models using unlabeled data. Although there has been remarkable recent progress, the scope of demonstration in SSL has mainly been on image classification tasks. In this paper, we propose STAC, a simple yet effective SSL framework for visual object detection along with a data augmentation strategy. STAC deploys highly confident pseudo labels of localized objects from an unlabeled image and updates the model by enforcing consistency via strong...
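A hedged sketch of the pseudo-labeling step for detection implied above: keep only high-confidence detections on an unlabeled image as pseudo box annotations for later training under strong augmentations. `detector`, its output format, and the 0.9 threshold are assumptions:

```python
def make_pseudo_boxes(detector, unlabeled_image, score_threshold=0.9):
    # Assumed to return an iterable of (box, label, score) triples.
    detections = detector(unlabeled_image)
    # Retain only highly confident detections as pseudo ground truth.
    return [(box, label) for box, label, score in detections
            if score >= score_threshold]
```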
Recently, convolutional neural networks (CNNs) have been used as a powerful tool to solve many problems of machine learning and computer vision. In this paper, we aim to provide insight on the property of convolutional neural networks, as well as a generic method to improve the performance of many CNN architectures. Specifically, we first examine existing CNN models and observe an intriguing property that the filters in the lower layers form pairs (i.e., filters with opposite phase). Inspired by our observation, we propose a novel, simple yet effective activation scheme called...
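A minimal sketch of a concatenated-ReLU style activation consistent with the opposite-phase filter pairing observed above, keeping both signs of a response; the channel-dimension default is an assumption:

```python
import torch
import torch.nn.functional as F

def crelu(x: torch.Tensor, dim: int = 1) -> torch.Tensor:
    # Concatenate the positive and negative phases; the output has twice
    # as many channels as the input.
    return torch.cat([F.relu(x), F.relu(-x)], dim=dim)
```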
Object detection systems based on the deep convolutional neural network (CNN) have recently made ground-breaking advances on several object detection benchmarks. While the features learned by these high-capacity neural networks are discriminative for categorization, inaccurate localization is still a major source of error for detection. Building upon high-capacity CNN architectures, we address the localization problem by 1) using a search algorithm based on Bayesian optimization that sequentially proposes candidate regions for an object bounding box, and 2) training the CNN with...
Semi-supervised learning on class-imbalanced data, although a realistic problem, has been understudied. While existing semi-supervised learning (SSL) methods are known to perform poorly on minority classes, we find that they still generate high-precision pseudo-labels on minority classes. By exploiting this property, in this work, we propose Class-Rebalancing Self-Training (CReST), a simple yet effective framework to improve existing SSL methods on class-imbalanced data. CReST iteratively retrains a baseline SSL model with a labeled set expanded by adding pseudo-labeled...
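A hedged sketch of class-rebalanced pseudo-label sampling in the spirit of CReST: rarer classes are included at higher rates when expanding the labeled set. The exact rate formula and `alpha` here are illustrative, not a faithful reproduction of the paper:

```python
import random

def rebalanced_rates(class_counts, alpha=1.0):
    # class_counts[k] = labeled examples of class k, sorted so that
    # class 0 is the most frequent.
    n_max, n_classes = class_counts[0], len(class_counts)
    # The rarer the class, the closer its inclusion rate is to 1.
    return [(class_counts[n_classes - 1 - k] / n_max) ** alpha
            for k in range(n_classes)]

def expand_labeled_set(pseudo_labeled, rates):
    # pseudo_labeled: list of (example, predicted_class) pairs.
    return [(x, c) for x, c in pseudo_labeled if random.random() < rates[c]]
```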
Despite the remarkable progress in deep generative models, synthesizing high-resolution and temporally coherent videos still remains a challenge due to their high-dimensionality and complex temporal dynamics along with large spatial variations. Recent works on diffusion models have shown their potential to solve this challenge, yet they suffer from severe computation- and memory-inefficiency that limit their scalability. To handle this issue, we propose a novel generative model for videos, coined projected latent video diffusion model (PVDM),...
We introduce the MAsked Generative VIdeo Transformer, MAGVIT, to tackle various video synthesis tasks with a single model. We introduce a 3D tokenizer to quantize a video into spatial-temporal visual tokens and propose an embedding method for masked video token modeling to facilitate multi-task learning. We conduct extensive experiments to demonstrate the quality, efficiency, and flexibility of MAGVIT. Our experiments show that (i) MAGVIT performs favorably against state-of-the-art approaches and establishes the best-published FVD on three video generation...
Conditional random fields (CRFs) provide powerful tools for building models to label image segments. They are particularly well-suited to modeling local interactions among adjacent regions (e.g., superpixels). However, CRFs are limited in dealing with complex, global (long-range) interactions between regions. Complementary to this, restricted Boltzmann machines (RBMs) can be used to model global shapes produced by segmentation models. In this work, we present a new model that uses the combined power of these two network types...
Deep neural networks (DNNs) trained on large-scale datasets have recently achieved impressive improvements in face recognition. But a persistent challenge remains to develop methods capable of handling large pose variations that are relatively under-represented in training data. This paper presents a method for learning a feature representation that is invariant to pose, without requiring extensive pose coverage in training data. We first propose to generate non-frontal views from a single frontal face, in order to increase the diversity...
Recognizing wild faces is extremely hard as they appear with all kinds of variations. Traditional methods either train with specifically annotated variation data from target domains, or introduce unlabeled target variation data to adapt from the training data. Instead, we propose a universal representation learning framework that can deal with larger variations unseen in the given training data without leveraging target domain knowledge. We firstly synthesize training data alongside some semantically meaningful variations, such as low resolution, occlusion and head pose...
Informative image representations are important in achieving state-of-the-art performance in object recognition tasks. Among feature learning algorithms that are used to develop image representations, restricted Boltzmann machines (RBMs) have good expressive power and build effective representations. However, the difficulty of training RBMs has been a barrier to their wide use. To address this difficulty, we show the connections between mixture models and RBMs and present an efficient training method for RBMs that utilizes these connections...
Despite rapid advances in face recognition, there remains a clear gap between the performance of still image-based face recognition and video-based face recognition, due to the vast difference in visual quality between the two domains and the difficulty of curating diverse large-scale video datasets. This paper addresses both of those challenges, through an image to video feature-level domain adaptation approach, to learn discriminative video frame representations. The framework utilizes large-scale unlabeled video data to reduce the gap between different domains while transferring discriminative knowledge from labeled still images...