- Advanced Vision and Imaging
- Advanced Image Processing Techniques
- Human Pose and Action Recognition
- Image Processing Techniques and Applications
- Video Surveillance and Tracking Methods
- Advanced Image and Video Retrieval Techniques
- Image and Signal Denoising Methods
- 3D Shape Modeling and Analysis
- Image Enhancement Techniques
- Domain Adaptation and Few-Shot Learning
- Advanced Neural Network Applications
- Computer Graphics and Visualization Techniques
- Image Retrieval and Classification Techniques
- Medical Image Segmentation Techniques
- Hand Gesture Recognition Systems
- Robotics and Sensor-Based Localization
- Multimodal Machine Learning Applications
- Digital Media Forensic Detection
- Visual Attention and Saliency Detection
- Optical measurement and interference techniques
- Anomaly Detection Techniques and Applications
- Face recognition and analysis
- Graph Theory and Algorithms
- Gait Recognition and Analysis
- Data Management and Algorithms
Seoul National University
2016-2025
ORCID
2022
Meta (Israel)
2020
Systems Research Institute
2017
AI Signal Research (United States)
2012
Hongik University
1997-2004
University of Southern California
1993-1997
Engineering Systems (United States)
1993-1997
Samsung (South Korea)
1994-1996
Southern California University for Professional Studies
1993
We present a highly accurate single-image super-resolution (SR) method. Our method uses a very deep convolutional network inspired by VGG-net used for ImageNet classification [19]. We find that increasing our network depth shows a significant improvement in accuracy. Our final model uses 20 weight layers. By cascading small filters many times in a deep network structure, contextual information over large image regions is exploited in an efficient way. With very deep networks, however, convergence speed becomes a critical issue during training. We propose...
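The claim that cascading small filters exploits context over large image regions can be illustrated with a short receptive-field calculation. This is a minimal sketch, not the authors' code: the 3×3 kernel size, the stride of 1, and the function name `receptive_field` are assumptions made for illustration, since the abstract states only that the filters are small and that the final model has 20 weight layers.

```python
def receptive_field(num_layers, kernel_size=3):
    """Side length, in pixels, of the input region seen by one output pixel.

    Assumes every layer is a stride-1 convolution with the same square
    kernel (an illustrative assumption, not taken from the abstract).
    """
    rf = 1
    for _ in range(num_layers):
        # Each stride-1 layer widens the receptive field by (kernel_size - 1).
        rf += kernel_size - 1
    return rf


if __name__ == "__main__":
    for depth in (1, 5, 10, 20):
        side = receptive_field(depth)
        print(f"{depth:2d} layers -> {side}x{side} receptive field")
```

Under these assumptions the receptive field grows linearly with depth, while each layer still has only a small number of weights per filter.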
Recent research on super-resolution has progressed with the development of deep convolutional neural networks (DCNN). In particular, residual learning techniques exhibit improved performance. In this paper, we develop an enhanced deep super-resolution network (EDSR) with performance exceeding that of current state-of-the-art SR methods. The significant performance improvement of our model is due to optimization by removing unnecessary modules in conventional residual networks. The performance is further improved by expanding the model size while we stabilize the training procedure. We also...
We propose an image super-resolution (SR) method using a deeply-recursive convolutional network (DRCN). Our network has a very deep recursive layer (up to 16 recursions). Increasing recursion depth can improve performance without introducing new parameters for additional convolutions. Despite these advantages, learning a DRCN is very hard with a standard gradient descent method due to exploding/vanishing gradients. To ease the difficulty of training, we propose two extensions: recursive-supervision and skip-connection. Our method outperforms...
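The idea that deeper recursion adds no new parameters can be shown with a toy example. The sketch below is only an illustration of weight sharing on a 1-D signal; it is not the DRCN architecture, and the helper names (`conv1d_same`, `relu`, `recursive_layer`), the odd-length kernel, and the ReLU activation are all assumptions. The two extensions named in the abstract are not implemented here.

```python
def conv1d_same(signal, kernel):
    """1-D correlation with zero padding; output length equals input length.

    Assumes an odd-length kernel.
    """
    half = len(kernel) // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]


def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def recursive_layer(signal, kernel, recursions):
    """Apply the SAME kernel `recursions` times and keep every intermediate output.

    The parameter count is len(kernel) no matter how many recursions run.
    """
    outputs = []
    current = list(signal)
    for _ in range(recursions):
        current = relu(conv1d_same(current, kernel))
        outputs.append(current)
    return outputs


if __name__ == "__main__":
    kernel = [0.25, 0.5, 0.25]           # the only learnable weights in this toy
    outputs = recursive_layer([0.0, 0.0, 1.0, 0.0, 0.0], kernel, recursions=16)
    print("recursions:", len(outputs), "parameters:", len(kernel))
```

Keeping every intermediate output makes it possible to inspect or supervise each recursion separately, which is one reason the sketch returns a list rather than only the final result.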
Non-uniform blind deblurring for general dynamic scenes is a challenging computer vision problem, as blurs arise not only from multiple object motions but also from camera shake and scene depth variation. To remove these complicated motion blurs, conventional energy optimization based methods rely on simple assumptions, such as that the blur kernel is partially uniform or locally linear. Moreover, recent machine learning based methods also depend on synthetic blur datasets generated under these assumptions. This makes conventional deblurring methods fail to remove blurs where the blur kernel is difficult...
This paper reviews the first challenge on single image super-resolution (restoration of rich details in a low resolution image) with a focus on proposed solutions and results. A new DIVerse 2K resolution image dataset (DIV2K) was employed. The challenge had 6 competitions divided into 2 tracks with 3 magnification factors each. Track 1 employed the standard bicubic downscaling setup, while Track 2 had unknown downscaling operators (blur kernel and decimation) that were learnable through low and high resolution training images. Each competition had ~100 registered participants and 20 teams...
We propose a novel tracking algorithm that can work robustly in a challenging scenario in which several kinds of appearance and motion changes of an object occur at the same time. Our algorithm is based on a visual tracking decomposition scheme for the efficient design of observation and motion models as well as trackers. In our scheme, the observation model is decomposed into multiple basic observation models that are constructed by sparse principal component analysis (SPCA) of a set of feature templates. Each basic observation model covers a specific appearance of the object. The motion model is also represented by the combination of multiple basic motion models, each of which...
This paper introduces a novel large dataset for video deblurring and video super-resolution, and studies the state-of-the-art as emerged from the NTIRE 2019 video restoration challenges. The video deblurring and video super-resolution challenges are each the first challenge of its kind, with 4 competitions, hundreds of participants and tens of proposed solutions. Our newly collected REalistic and Diverse Scenes dataset (REDS) was employed by the challenges. In our study, we compare the solutions from the challenges to a set of representative methods from the literature and evaluate them on our proposed REDS dataset. We find that the NTIRE 2019 challenges push the state-of-the-art in...
We propose a novel tracking framework called visual tracker sampler that tracks a target robustly by searching for the appropriate trackers in each frame. Since the real-world tracking environment varies severely over time, the trackers should be adapted or newly constructed depending on the current situation. To do this, our method obtains several samples of not only the states of the target but also the trackers themselves during the sampling process. The trackers are efficiently sampled using the Markov Chain Monte Carlo method from the predefined tracker space by proposing new appearance...
Most of the existing deep learning-based methods for 3D hand and human pose estimation from a single depth map are based on a common framework that takes a 2D depth map and directly regresses the 3D coordinates of keypoints, such as hand or human body joints, via 2D convolutional neural networks (CNNs). The first weakness of this approach is the presence of perspective distortion in the 2D depth map. While the depth map is intrinsically 3D data, many previous methods treat depth maps as 2D images that can distort the shape of the actual object through projection from 3D to 2D space. This compels the network to perform...
Stereo vision is a well-known ranging method because it resembles the basic mechanism of the human eye. However, the computational complexity and large amount of data access make real-time processing of stereo vision challenging because of the inherent instruction cycle delay within conventional computers. In order to solve this problem, the past 20 years of research have focused on the use of dedicated hardware architecture for stereo vision. This paper proposes a fully pipelined stereo vision system providing a dense disparity image with additional sub-pixel...
Although significant improvement has been achieved recently in 3D human pose estimation, most of the previous methods only treat a single-person case. In this work, we propose the first fully learning-based, camera distance-aware top-down approach for 3D multi-person pose estimation from a single RGB image. The pipeline of the proposed system consists of human detection, absolute 3D human root localization, and root-relative 3D single-person pose estimation modules. Our system achieves comparable results with the state-of-the-art 3D single-person pose estimation models without any ground truth...
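The final step of such a top-down pipeline, combining an absolute root location with root-relative joints, can be sketched as a simple translation. This is an illustrative simplification rather than the authors' implementation: it assumes both inputs are already expressed in the same camera-centered 3D coordinate system and units, and the function name `to_absolute_pose` is hypothetical.

```python
def to_absolute_pose(root_xyz, root_relative_joints):
    """Translate root-relative joints by the absolute root position.

    root_xyz: (x, y, z) of the person's root joint in camera coordinates.
    root_relative_joints: list of (dx, dy, dz) offsets from the root joint.
    Both are assumed to share units and axes (an illustrative assumption).
    """
    rx, ry, rz = root_xyz
    return [(rx + dx, ry + dy, rz + dz) for dx, dy, dz in root_relative_joints]


if __name__ == "__main__":
    # Two detected people, each with a root location and the same relative pose.
    relative_pose = [(0.0, 0.0, 0.0), (10.0, -20.0, 5.0)]
    roots = [(100.0, 50.0, 3000.0), (-400.0, 60.0, 4500.0)]
    for root in roots:
        print(to_absolute_pose(root, relative_pose))
```

Running the function once per detected person places every skeleton in a shared camera space, which is what distinguishes multi-person output from independent single-person estimates.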
Prevailing video frame interpolation techniques rely heavily on optical flow estimation and require additional model complexity and computational cost; they are also susceptible to error propagation in challenging scenarios with large motion and heavy occlusion. To alleviate the limitation, we propose a simple but effective deep neural network for video frame interpolation, which is end-to-end trainable and is free from a motion estimation network component. Our algorithm employs a special feature reshaping operation, referred to as PixelShuffle, with a channel...
In this paper, we strive to answer two questions: What is the current state of 3D hand pose estimation from depth images? And, what are the next challenges that need to be tackled? Following the successful Hands In the Million Challenge (HIM2017), we investigate the top 10 state-of-the-art methods on three tasks: single frame 3D pose estimation, 3D hand tracking, and hand pose estimation during object interaction. We analyze the performance of different CNN structures with regard to hand shape, joint visibility, view point and articulation distributions. Our findings...
Most state-of-the-art dynamic scene deblurring methods based on accurate motion segmentation assume that motion blur is small or that the specific type of motion causing the blur is known. In this paper, we study a motion segmentation-free dynamic scene deblurring method, which is unlike other conventional methods. When the motion can be approximated to linear motion that is locally (pixel-wise) varying, we can handle various types of blur caused by camera shake, including out-of-plane motion, depth variation, radial distortion, and so on. Thus, we propose a new energy model simultaneously estimating...
Despite the recent success of single image-based 3D human pose and shape estimation methods, recovering temporally consistent and smooth 3D human motion from a video is still challenging. Several video-based methods have been proposed; however, they fail to resolve the single image-based methods’ temporal inconsistency issue due to a strong dependency on a static feature of the current frame. In this regard, we present a temporally consistent mesh recovery system (TCMR). It effectively focuses on the past and future frames’ temporal information without being dominated by the current static feature. Our...
Multi-person pose estimation from a 2D image is an essential technique for human behavior understanding. In this paper, we propose a human pose refinement network that estimates a refined pose from a tuple of an input image and input pose. In previous methods, pose refinement was performed mainly through an end-to-end trainable multi-stage architecture. However, they are highly dependent on pose estimation models and require careful model design. By contrast, we propose a model-agnostic pose refinement method. According to a recent study, state-of-the-art 2D human pose estimation methods have similar error distributions....
Blind-spot network (BSN) and its variants have made significant advances in self-supervised denoising. Nevertheless, they are still bound to synthetic noisy inputs due to less practical assumptions like pixel-wise independent noise. Hence, it is challenging to deal with spatially correlated real-world noise using self-supervised BSN. Recently, pixel-shuffle downsampling (PD) has been proposed to remove the spatial correlation of real-world noise. However, it is not trivial to integrate PD and BSN directly, which prevents the fully self-supervised denoising model...
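Pixel-shuffle downsampling can be pictured as splitting an image into strided sub-images, so that pixels that were neighbors in the input land in different sub-images. The sketch below shows only this rearrangement on a plain list-of-rows image; it is not the denoising method from the abstract, and the function name `pixel_shuffle_downsample` and the requirement that the image sides divide evenly by the stride are assumptions for illustration.

```python
def pixel_shuffle_downsample(image, stride):
    """Split a 2-D image (list of equal-length rows) into stride*stride sub-images.

    Sub-image (dy, dx) keeps the pixels at rows dy, dy+stride, ... and
    columns dx, dx+stride, ..., so adjacent input pixels are separated.
    """
    height, width = len(image), len(image[0])
    if height % stride or width % stride:
        raise ValueError("image sides must be divisible by stride")
    return [
        [row[dx::stride] for row in image[dy::stride]]
        for dy in range(stride)
        for dx in range(stride)
    ]


if __name__ == "__main__":
    image = [[4 * r + c for c in range(4)] for r in range(4)]  # values 0..15
    for sub in pixel_shuffle_downsample(image, stride=2):
        print(sub)
```

Within each sub-image, formerly adjacent pixels are `stride` positions apart in the original grid, which is why the operation weakens short-range spatial correlation in the noise.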
Hands are often severely occluded by objects, which makes 3D hand mesh estimation challenging. Previous works have often disregarded information at occluded regions. However, we argue that occluded regions have strong correlations with hands, so they can provide highly beneficial information for complete 3D hand mesh estimation. Thus, in this work, we propose a novel 3D hand mesh estimation network, HandOccNet, that fully exploits the information at occluded regions as a secondary means to enhance image features and make them much richer. To this end, we design two successive Transformer-based modules, called feature...