Kihyuk Sohn

ORCID: 0000-0003-4303-8319
Research Areas
  • Domain Adaptation and Few-Shot Learning
  • Multimodal Machine Learning Applications
  • Generative Adversarial Networks and Image Synthesis
  • Face recognition and analysis
  • Advanced Neural Network Applications
  • Advanced Image and Video Retrieval Techniques
  • Face and Expression Recognition
  • Machine Learning and Data Classification
  • Biometric Identification and Security
  • Adversarial Robustness in Machine Learning
  • COVID-19 diagnosis using AI
  • Anomaly Detection Techniques and Applications
  • Advanced Vision and Imaging
  • Image Retrieval and Classification Techniques
  • Computer Graphics and Visualization Techniques
  • Digital Media Forensic Detection
  • Video Surveillance and Tracking Methods
  • Topic Modeling
  • Speech Recognition and Synthesis
  • Data-Driven Disease Surveillance
  • Human Pose and Action Recognition
  • Imbalanced Data Classification Techniques
  • Privacy-Preserving Technologies in Data
  • Human Motion and Animation
  • Video Analysis and Summarization

Google (United States)
2019-2024

Korea Advanced Institute of Science and Technology
2021

NEC (United States)
2017-2020

NEC (Japan)
2015-2020

University of Michigan
2011-2015

Semi-supervised learning (SSL) provides an effective means of leveraging unlabeled data to improve a model's performance. In this paper, we demonstrate the power of a simple combination of two common SSL methods: consistency regularization and pseudo-labeling. Our algorithm, FixMatch, first generates pseudo-labels using the model's predictions on weakly-augmented unlabeled images. For a given image, the pseudo-label is only retained if the model produces a high-confidence prediction. The model is then trained to predict the pseudo-label when fed...

10.48550/arxiv.2001.07685 preprint EN other-oa arXiv (Cornell University) 2020-01-01
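
As an illustration of the confidence-thresholded pseudo-labeling step described in the abstract, here is a minimal PyTorch-style sketch of the unlabeled-data loss. The model, the weak/strong augmentation pipeline, and the threshold value are assumptions for illustration, not the released implementation.

```python
import torch
import torch.nn.functional as F

def fixmatch_unlabeled_loss(model, weak_images, strong_images, threshold=0.95):
    """Confidence-thresholded pseudo-labeling: the label comes from the weakly
    augmented view; only high-confidence labels contribute to the loss on the
    strongly augmented view."""
    with torch.no_grad():
        probs = F.softmax(model(weak_images), dim=-1)   # predictions on weak views
        conf, pseudo_labels = probs.max(dim=-1)         # confidence and hard labels
        mask = (conf >= threshold).float()              # keep only confident ones
    logits_strong = model(strong_images)                # predictions on strong views
    loss = F.cross_entropy(logits_strong, pseudo_labels, reduction="none")
    return (loss * mask).mean()
```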

Convolutional neural network-based approaches for semantic segmentation rely on supervision with pixel-level ground truth, but may not generalize well to unseen image domains. As the labeling process is tedious and labor intensive, developing algorithms that can adapt source ground-truth labels to a target domain is of great interest. In this paper, we propose an adversarial learning method for domain adaptation in the context of semantic segmentation. Considering semantic segmentations as structured outputs that contain spatial similarities...

10.1109/cvpr.2018.00780 article EN 2018-06-01
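
A compact sketch of output-space adversarial adaptation in the spirit of the abstract above, assuming a segmentation network that produces per-pixel class scores and a small fully convolutional discriminator. The architecture and names below are illustrative guesses, not the paper's exact networks.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_output_space_discriminator(num_classes):
    """Small fully convolutional discriminator over softmax segmentation maps
    (layer sizes are illustrative)."""
    return nn.Sequential(
        nn.Conv2d(num_classes, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        nn.Conv2d(128, 1, 4, stride=2, padding=1),  # per-patch source/target logit
    )

def adversarial_target_loss(seg_net, discriminator, target_images):
    """Push target-domain segmentation outputs toward the source output
    distribution by fooling the discriminator (1 = 'looks like source')."""
    target_probs = F.softmax(seg_net(target_images), dim=1)
    d_logits = discriminator(target_probs)
    return F.binary_cross_entropy_with_logits(d_logits, torch.ones_like(d_logits))
```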

We aim at constructing a high-performance model for defect detection that detects unknown anomalous patterns of an image without anomalous data. To this end, we propose a two-stage framework for building anomaly detectors using normal training data only. We first learn self-supervised deep representations and then build a generative one-class classifier on the learned representations. We learn representations by classifying normal data from CutPaste, a simple data augmentation strategy that cuts an image patch and pastes it at a random location of a large image. Our empirical study...

10.1109/cvpr46437.2021.00954 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021-06-01
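
The cut-and-paste augmentation itself is simple enough to sketch in NumPy; the patch-size range below is an illustrative assumption rather than the paper's exact configuration.

```python
import numpy as np

def cutpaste(image, rng=None):
    """Cut a rectangular patch from a (presumably normal) image and paste it
    back at a random location, creating the synthetic irregularity used for
    the self-supervised classification task."""
    rng = rng or np.random.default_rng()
    h, w = image.shape[:2]
    ph = int(rng.integers(h // 8, h // 4 + 1))   # patch height (illustrative range)
    pw = int(rng.integers(w // 8, w // 4 + 1))   # patch width
    ys, xs = rng.integers(0, h - ph + 1), rng.integers(0, w - pw + 1)  # cut corner
    yd, xd = rng.integers(0, h - ph + 1), rng.integers(0, w - pw + 1)  # paste corner
    out = image.copy()
    out[yd:yd + ph, xd:xd + pw] = image[ys:ys + ph, xs:xs + pw]
    return out
```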

Despite recent advances in face recognition using deep learning, severe accuracy drops are observed for large pose variations in unconstrained environments. Learning pose-invariant features is one solution, but needs expensively labeled large-scale data and carefully designed feature learning algorithms. In this work, we focus on frontalizing faces in the wild under various head poses, including extreme profile views. We propose a novel 3D Morphable Model (3DMM) conditioned Face Frontalization...

10.1109/iccv.2017.430 article EN 2017-10-01

Predicting structured outputs such as semantic segmentation relies on expensive per-pixel annotations to learn supervised models like convolutional neural networks. However, models trained on one data domain may not generalize well to other domains without annotations for model finetuning. To avoid the labor-intensive process of annotation, we develop a domain adaptation method to adapt the source data to the unlabeled target domain. We propose to learn discriminative feature representations of patches in the source domain by discovering multiple modes of patch-wise output...

10.1109/iccv.2019.00154 article EN 2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019-10-01

Despite the large volume of face recognition datasets, there is a significant portion of subjects whose samples are insufficient and thus under-represented. Ignoring such a significant portion results in insufficient training data. Training with under-represented data leads to biased classifiers in conventionally-trained deep networks. In this paper, we propose a center-based feature transfer framework to augment the feature space of under-represented subjects from the regular subjects that have sufficiently diverse samples. A Gaussian prior of the variance is assumed across all subjects, and the variance from regular ones...

10.1109/cvpr.2019.00585 article EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01

We improve the recently-proposed "MixMatch" semi-supervised learning algorithm by introducing two new techniques: distribution alignment and augmentation anchoring. Distribution alignment encourages the marginal distribution of predictions on unlabeled data to be close to the marginal distribution of ground-truth labels. Augmentation anchoring feeds multiple strongly augmented versions of an input into the model and encourages each output to be close to the prediction for a weakly-augmented version of the same input. To produce strong augmentations, we propose a variant of AutoAugment which learns...

10.48550/arxiv.1911.09785 preprint EN other-oa arXiv (Cornell University) 2019-01-01
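
Distribution alignment, as described in the abstract, lends itself to a short sketch: scale each unlabeled prediction by the ratio of the labeled class marginal to a running average of model predictions and renormalize. The variable names and the running-average estimate are assumptions for illustration.

```python
import numpy as np

def distribution_alignment(pred_probs, labeled_marginal, running_pred_marginal, eps=1e-6):
    """Scale predictions by p(y) / E[q(y)] and renormalize so the marginal of
    unlabeled predictions moves toward the labeled class distribution."""
    aligned = pred_probs * (labeled_marginal + eps) / (running_pred_marginal + eps)
    return aligned / aligned.sum(axis=-1, keepdims=True)

# Tiny usage example with made-up numbers: a prediction skewed toward class 0
# is softened because the running prediction marginal already over-covers it.
q = np.array([[0.7, 0.2, 0.1]])
print(distribution_alignment(q, np.full(3, 1 / 3), np.array([0.5, 0.3, 0.2])))
```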

Semi-supervised learning (SSL) has a potential to improve the predictive performance of machine learning models using unlabeled data. Although there has been remarkable recent progress, the scope of demonstration in SSL has mainly been on image classification tasks. In this paper, we propose STAC, a simple yet effective SSL framework for visual object detection along with a data augmentation strategy. STAC deploys highly confident pseudo labels of localized objects from an unlabeled image and updates the model by enforcing consistency via strong...

10.48550/arxiv.2005.04757 preprint EN other-oa arXiv (Cornell University) 2020-01-01
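
A minimal sketch of the pseudo-label selection step for detection described above, assuming the detector returns (box, class, score) triples; the threshold value and data structures are illustrative, and the strong-augmentation consistency training is omitted.

```python
def select_detection_pseudo_labels(detections, score_threshold=0.9):
    """Keep only highly confident detections from an unlabeled image to serve
    as pseudo ground truth for consistency training."""
    return [(box, label) for box, label, score in detections if score >= score_threshold]

# Usage with made-up detections: only the confident box survives filtering.
dets = [((10, 20, 80, 120), "person", 0.97), ((5, 5, 40, 40), "dog", 0.42)]
print(select_detection_pseudo_labels(dets))
```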

Recently, convolutional neural networks (CNNs) have been used as a powerful tool to solve many problems of machine learning and computer vision. In this paper, we aim to provide insight on the property of convolutional neural networks, as well as a generic method to improve the performance of CNN architectures. Specifically, we first examine existing CNN models and observe an intriguing property that the filters in the lower layers form pairs (i.e., filters with opposite phase). Inspired by our observation, we propose a novel, simple yet effective activation scheme called...

10.48550/arxiv.1603.05201 preprint EN other-oa arXiv (Cornell University) 2016-01-01
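
The concatenated activation the abstract alludes to (CReLU) is easy to state in code: keep both the positive and the negated phase of the pre-activation, doubling the channel dimension. This PyTorch-style helper is a sketch for illustration.

```python
import torch
import torch.nn.functional as F

def crelu(x, dim=1):
    """Concatenated ReLU: concatenate ReLU(x) and ReLU(-x) along the channel
    dimension, preserving both phases of the response."""
    return torch.cat([F.relu(x), F.relu(-x)], dim=dim)
```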

Object detection systems based on the deep convolutional neural network (CNN) have recently made ground-breaking advances on several object detection benchmarks. While the features learned by these high-capacity networks are discriminative for categorization, inaccurate localization is still a major source of error for detection. Building upon CNN architectures, we address the localization problem by 1) using a search algorithm based on Bayesian optimization that sequentially proposes candidate regions for an object bounding box, and 2) training with...

10.1109/cvpr.2015.7298621 article EN 2015-06-01

Semi-supervised learning on class-imbalanced data, although a realistic problem, has been under studied. While existing semi-supervised learning (SSL) methods are known to perform poorly on minority classes, we find that they still generate high precision pseudo-labels on minority classes. By exploiting this property, in this work, we propose Class-Rebalancing Self-Training (CReST), a simple yet effective framework to improve existing SSL methods on class-imbalanced data. CReST iteratively retrains a baseline SSL model with the labeled set expanded by adding pseudo-labeled...

10.1109/cvpr46437.2021.01071 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021-06-01
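
A sketch of the class-rebalanced selection that drives the self-training loop: rarer classes keep a larger fraction of their pseudo-labeled samples at each round. The keep-rate rule below (mirrored class-frequency order raised to a power alpha) is a simplified, illustrative version of such a schedule.

```python
import numpy as np

def class_rebalanced_keep_rates(class_counts, alpha=0.5):
    """Per-class keep rates for pseudo-labeled samples: the rarest class keeps
    the most, the most frequent class keeps the least (simplified sketch)."""
    counts = np.asarray(class_counts, dtype=float)
    order = np.argsort(-counts)            # class ids from most to least frequent
    mirrored = counts[order][::-1]         # pair each class with its "mirror" count
    rates_sorted = (mirrored / counts.max()) ** alpha
    rates = np.empty_like(rates_sorted)
    rates[order] = rates_sorted            # map rates back to original class ids
    return rates

# Usage with made-up counts: the frequent class gets the smallest keep rate.
print(class_rebalanced_keep_rates([1000, 100, 10]))
```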

Despite the remarkable progress in deep generative models, synthesizing high-resolution and temporally coherent videos still remains a challenge due to their high-dimensionality and complex temporal dynamics along with large spatial variations. Recent works on diffusion models have shown their potential to solve this challenge, yet they suffer from severe computation- and memory-inefficiency that limit scalability. To handle this issue, we propose a novel generative model for videos, coined projected latent video diffusion model (PVDM),...

10.1109/cvpr52729.2023.01770 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

We introduce the MAsked Generative VIdeo Transformer, MAGVIT, to tackle various video synthesis tasks with a single model. We introduce a 3D tokenizer to quantize a video into spatial-temporal visual tokens and propose an embedding method for masked video token modeling to facilitate multi-task learning. We conduct extensive experiments to demonstrate the quality, efficiency, and flexibility of MAGVIT. Our experiments show that (i) MAGVIT performs favorably against state-of-the-art approaches and establishes the best-published FVD on three video generation...

10.1109/cvpr52729.2023.01008 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01
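
A tiny sketch of the masked-token modeling idea mentioned in the abstract: randomly replace a fraction of the quantized spatial-temporal token ids with a mask id and train a transformer to predict the originals at those positions. The mask ratio, ids, and function names are assumptions; the tokenizer and multi-task conditioning are omitted.

```python
import numpy as np

def mask_video_tokens(token_ids, mask_id, mask_ratio=0.5, rng=None):
    """Randomly corrupt quantized video token ids with a [MASK] id; return the
    corrupted sequence and the boolean mask marking positions to predict."""
    rng = rng or np.random.default_rng()
    token_ids = np.asarray(token_ids)
    mask = rng.random(token_ids.shape) < mask_ratio
    corrupted = np.where(mask, mask_id, token_ids)
    return corrupted, mask
```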

Conditional random fields (CRFs) provide powerful tools for building models to label image segments. They are particularly well-suited to modeling local interactions among adjacent regions (e.g., superpixels). However, CRFs are limited in dealing with complex, global (long-range) interactions between regions. Complementary to this, restricted Boltzmann machines (RBMs) can be used to model global shapes produced by segmentation models. In this work, we present a new model that uses the combined power of these two network types...

10.1109/cvpr.2013.263 article EN 2013 IEEE Conference on Computer Vision and Pattern Recognition 2013-06-01

Deep neural networks (DNNs) trained on large-scale datasets have recently achieved impressive improvements in face recognition. But a persistent challenge remains to develop methods capable of handling large pose variations that are relatively under-represented in training data. This paper presents a method for learning a feature representation that is invariant to pose, without requiring extensive pose coverage in training data. We first propose to generate non-frontal views from a single frontal face, in order to increase the diversity...

10.1109/iccv.2017.180 article EN 2017-10-01

Recognizing wild faces is extremely hard as they appear with all kinds of variations. Traditional methods either train with specifically annotated variation data from target domains, or introduce unlabeled target data to adapt from the training data. Instead, we propose a universal representation learning framework that can deal with larger variation unseen in the given training data without leveraging target domain knowledge. We firstly synthesize training data alongside some semantically meaningful variations, such as low resolution, occlusion and head pose...

10.1109/cvpr42600.2020.00685 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01

Informative image representations are important in achieving state-of-the-art performance in object recognition tasks. Among feature learning algorithms that are used to develop image representations, restricted Boltzmann machines (RBMs) have good expressive power and build effective representations. However, the difficulty of training RBMs has been a barrier to their wide use. To address this difficulty, we show connections between mixture models and RBMs and present an efficient training method that utilizes these connections...

10.1109/iccv.2011.6126554 article EN International Conference on Computer Vision 2011-11-01

Despite rapid advances in face recognition, there remains a clear gap between the performance of still image-based recognition and video-based recognition, due to the vast difference in visual quality between the two domains and the difficulty of curating diverse large-scale video datasets. This paper addresses both of those challenges, through an image to video feature-level domain adaptation approach, to learn discriminative video frame representations. The framework utilizes unlabeled video data to reduce the gap between the different domains while transferring knowledge from labeled still images...

10.1109/iccv.2017.630 article EN 2017-10-01