- Advanced Neural Network Applications
- Domain Adaptation and Few-Shot Learning
- Advanced Image and Video Retrieval Techniques
- Multimodal Machine Learning Applications
- Machine Learning and Data Classification
- Adversarial Robustness in Machine Learning
- Handwritten Text Recognition Techniques
- Anomaly Detection Techniques and Applications
- Face Recognition and Analysis
- Generative Adversarial Networks and Image Synthesis
- Advanced Vision and Imaging
- Text and Document Classification Technologies
- Neural Networks and Applications
- Visual Attention and Saliency Detection
- Image Enhancement Techniques
- Optical Measurement and Interference Techniques
- Natural Language Processing Techniques
- Human Pose and Action Recognition
- Image Retrieval and Classification Techniques
- Image Processing and 3D Reconstruction
- COVID-19 diagnosis using AI
- Robotics and Sensor-Based Localization
- CCD and CMOS Imaging Sensors
- Advancements in Photolithography Techniques
- Educational Technology and Assessment
Sogang University
2021-2024
California State University, Fresno
2024
University of California, San Francisco
2024
Istituto Tecnico Industriale Alessandro Volta
2021
Weatherford College
2021
Naver (South Korea)
2019-2021
Yonsei University
2004-2020
Pohang University of Science and Technology
1998
Regional dropout strategies have been proposed to enhance the performance of convolutional neural network classifiers. They have proved to be effective for guiding the model to attend on less discriminative parts of objects (e.g. the leg as opposed to the head of a person), thereby letting the network generalize better and have better object localization capabilities. On the other hand, current methods for regional dropout remove informative pixels from training images by overlaying a patch of either black pixels or random noise. Such removal is not desirable because it suffers...
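As a concrete illustration of the regional dropout idea described above (not the authors' exact method), here is a minimal PyTorch sketch that overlays a randomly placed square patch of black pixels or random noise on a batch of training images; the patch size and fill mode are hypothetical parameters.

```python
# Minimal sketch of regional dropout by patch overlay (illustrative, not the paper's exact recipe).
import torch

def regional_dropout(images: torch.Tensor, patch_size: int = 56, fill: str = "black") -> torch.Tensor:
    """Overlay a randomly placed square patch of black pixels or random noise.

    images: (N, C, H, W) float tensor with values in [0, 1].
    """
    n, c, h, w = images.shape
    out = images.clone()
    for i in range(n):
        # Sample the top-left corner of the patch uniformly over valid positions.
        y = torch.randint(0, h - patch_size + 1, (1,)).item()
        x = torch.randint(0, w - patch_size + 1, (1,)).item()
        if fill == "black":
            out[i, :, y:y + patch_size, x:x + patch_size] = 0.0
        else:  # random-noise fill
            out[i, :, y:y + patch_size, x:x + patch_size] = torch.rand(c, patch_size, patch_size)
    return out

# Usage: augmented = regional_dropout(torch.rand(8, 3, 224, 224), patch_size=56, fill="noise")
```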
Vision Transformer (ViT) extends the application range of transformers from language processing to computer vision tasks, serving as an alternative architecture to existing convolutional neural networks (CNN). While the transformer-based architecture has been innovative for computer vision modeling, the design convention towards an effective architecture has been less studied yet. From the successful design principles of CNNs, we investigate the role of spatial dimension conversion and its effectiveness on transformer-based architecture. We particularly attend to the dimension reduction principle of CNNs;...
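To make the spatial dimension reduction mentioned above concrete, the following is a minimal sketch (under my own assumptions, not the paper's exact architecture) of reshaping a sequence of ViT tokens back into a 2-D grid, pooling it, and flattening it again, which halves the spatial resolution between transformer stages.

```python
# Minimal sketch of spatial dimension reduction between transformer stages (illustrative assumption).
import torch
import torch.nn as nn

class TokenPool(nn.Module):
    """Reshape (N, H*W, C) tokens into a 2-D grid, pool spatially, and flatten back."""

    def __init__(self, channels: int):
        super().__init__()
        # Depthwise strided convolution halves the spatial resolution of the token grid.
        self.pool = nn.Conv2d(channels, channels, kernel_size=3, stride=2,
                              padding=1, groups=channels)

    def forward(self, tokens: torch.Tensor, height: int, width: int) -> torch.Tensor:
        n, _, c = tokens.shape
        grid = tokens.transpose(1, 2).reshape(n, c, height, width)
        grid = self.pool(grid)                      # (N, C, H/2, W/2)
        return grid.flatten(2).transpose(1, 2)      # (N, H/2 * W/2, C)

# Usage: TokenPool(192)(torch.rand(2, 14 * 14, 192), 14, 14).shape  -> (2, 49, 192)
```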
Weakly Supervised Object Localization (WSOL) techniques learn the object location using only image-level labels, without location annotations. A common limitation of these techniques is that they cover only the most discriminative part of the object, not the entire object. To address this problem, we propose an Attention-based Dropout Layer (ADL), which utilizes the self-attention mechanism to process the feature maps of the model. The proposed method is composed of two key components: 1) hiding the most discriminative part from the model to capture the integral extent of the object, and 2)...
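The sketch below is my reading of the attention-based dropout idea, not the official implementation: a channel-averaged self-attention map is turned either into a drop mask that hides the most discriminative region or into an importance map that highlights informative regions, chosen stochastically per step. The drop rate and threshold are hypothetical defaults.

```python
# Hedged sketch of an attention-based dropout layer (illustrative, not the authors' code).
import torch
import torch.nn as nn

class AttentionBasedDropout(nn.Module):
    def __init__(self, drop_rate: float = 0.75, drop_threshold: float = 0.8):
        super().__init__()
        self.drop_rate = drop_rate            # probability of applying the drop mask
        self.drop_threshold = drop_threshold  # fraction of the peak activation to hide

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        if not self.training:
            return feat
        # Self-attention map: channel-wise average of the feature maps, shape (N, 1, H, W).
        attention = feat.mean(dim=1, keepdim=True)
        if torch.rand(1).item() < self.drop_rate:
            # Drop mask: hide regions whose attention exceeds a fraction of the per-image maximum.
            peak = attention.amax(dim=(2, 3), keepdim=True)
            mask = (attention < self.drop_threshold * peak).float()
        else:
            # Importance map: highlight informative regions to keep classification power.
            mask = torch.sigmoid(attention)
        return feat * mask

# Usage: AttentionBasedDropout()(torch.rand(4, 512, 14, 14))
```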
Weakly-supervised object localization (WSOL) has gained popularity over the last years for its promise to train localization models with only image-level labels. Since the seminal WSOL work on class activation mapping (CAM), the field has focused on how to expand the attention regions to cover objects more broadly and localize them better. However, these strategies rely on full localization supervision to validate hyperparameters and perform model selection, which is in principle prohibited under the WSOL setup. In this paper, we argue that the task is ill-posed with only image-level labels,...
Weakly supervised semantic segmentation (WSSS) methods are often built on pixel-level localization maps obtained from a classifier. However, trained on class labels only, classifiers suffer from the spurious correlation between foreground and background cues (e.g. train and rail), fundamentally bounding the performance of WSSS. There have been previous endeavors to address this issue with additional supervision. We propose a novel source of information to distinguish foreground from the background: Out-of-Distribution...
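The abstract stops mid-sentence, but the stated idea of exploiting images that contain background cues without any of the foreground classes can be sketched as an auxiliary classification loss; the all-zero target for out-of-distribution images below is my own simplification, not necessarily the paper's formulation.

```python
# Hedged sketch: train a multi-label classifier so it stays silent on out-of-distribution images.
import torch
import torch.nn.functional as F

def wsss_classifier_loss(logits_in: torch.Tensor, labels_in: torch.Tensor,
                         logits_ood: torch.Tensor, ood_weight: float = 1.0) -> torch.Tensor:
    """logits_in/labels_in: (N, K) image-level multi-label data; logits_ood: (M, K) scores on OoD images."""
    # Standard multi-label classification loss on in-distribution images.
    loss_in = F.binary_cross_entropy_with_logits(logits_in, labels_in)
    # OoD images contain background cues but none of the K foreground classes,
    # so push all class scores toward zero to break spurious background correlations.
    loss_ood = F.binary_cross_entropy_with_logits(logits_ood, torch.zeros_like(logits_ood))
    return loss_in + ood_weight * loss_ood
```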
Both weakly supervised single object localization and semantic segmentation techniques learn an object's location using only image-level labels. However, these techniques are limited to covering the most discriminative part of the object, not the entire object. To address this problem, we propose an attention-based dropout layer, which utilizes the attention mechanism to locate the entire object efficiently. To achieve this, we devise two key components: 1) hiding the most discriminative part from the model to capture the entire object, and 2) highlighting the informative region to improve the classification power...
ImageNet has been the most popular image classification benchmark, but it is also one with a significant level of label noise. Recent studies have shown that many samples contain multiple classes, despite being assumed to be a single-label benchmark. They have thus proposed to turn the evaluation into a multi-label task, with exhaustive annotations per image. However, they have not fixed the training set, presumably because of the formidable annotation cost. We argue that the mismatch between single-label annotations and effectively multi-label images is equally, if not more,...
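As a small illustration of the multi-label evaluation this abstract refers to, the sketch below counts a top-1 prediction as correct if it matches any of the exhaustive labels for an image; this is a common way to score such benchmarks, though the paper's exact protocol may differ.

```python
# Sketch of multi-label top-1 accuracy: a prediction is correct if it hits any annotated class.
import torch

def multilabel_top1_accuracy(logits: torch.Tensor, label_sets: list[set[int]]) -> float:
    """logits: (N, K) class scores; label_sets: per-image sets of valid class indices."""
    top1 = logits.argmax(dim=1).tolist()
    correct = sum(pred in labels for pred, labels in zip(top1, label_sets))
    return correct / len(label_sets)

# Usage:
# multilabel_top1_accuracy(torch.tensor([[0.1, 0.9], [0.8, 0.2]]), [{1}, {0, 1}])  # -> 1.0
```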
Recently, low-shot learning has been proposed for handling the lack of training data in machine learning. Despite the importance of this issue, relatively little effort has been made to study the problem. In this paper, we aim to increase the size of the training dataset in various ways to improve the accuracy and robustness of face recognition. In detail, we adapt a generator from a Generative Adversarial Network (GAN) to the dataset, which includes a base set, a widely available dataset, and a novel set, a given limited dataset, while adopting transfer learning as a backend. Based on extensive...
Despite apparent human-level performances of deep neural networks (DNN), they behave fundamentally differently from humans. They easily change predictions when small corruptions such as blur and noise are applied on the input (lack of robustness), and they often produce confident predictions on out-of-distribution samples (improper uncertainty measure). While a number of researches have aimed to address those issues, the proposed solutions are typically expensive and complicated (e.g. Bayesian inference and adversarial training)....
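To make the robustness notion in this abstract concrete, here is a minimal sketch that compares a classifier's accuracy on clean inputs with its accuracy after adding Gaussian noise; the corruption type and severity are illustrative choices, not the paper's benchmark.

```python
# Sketch: accuracy drop under a simple Gaussian-noise corruption as a robustness probe.
import torch

@torch.no_grad()
def accuracy(model: torch.nn.Module, images: torch.Tensor, labels: torch.Tensor) -> float:
    preds = model(images).argmax(dim=1)
    return (preds == labels).float().mean().item()

@torch.no_grad()
def robustness_gap(model: torch.nn.Module, images: torch.Tensor, labels: torch.Tensor,
                   noise_std: float = 0.1) -> float:
    """Difference between clean accuracy and accuracy on noise-corrupted copies of the same images."""
    corrupted = (images + noise_std * torch.randn_like(images)).clamp(0.0, 1.0)
    return accuracy(model, images, labels) - accuracy(model, corrupted, labels)
```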
Weakly-supervised object localization (WSOL) enables finding an object using a dataset without any localization information. By simply training a classification model with only image-level annotations, the feature map of the model can be utilized as a score map for localization. In spite of many WSOL methods proposing novel strategies, there has not been a de facto standard about how to normalize the class activation map (CAM). Consequently, many methods have failed to fully exploit their own capacity because of a misused normalization method. In this paper, we review...
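Because this abstract is about how the class activation map is normalized before thresholding, the sketch below shows two common per-image normalization choices (min-max and max) whose difference can change which pixels survive a fixed threshold; the function name and defaults are mine.

```python
# Sketch of two per-image CAM normalization choices; a fixed threshold behaves differently under each.
import torch

def normalize_cam(cam: torch.Tensor, mode: str = "minmax", eps: float = 1e-8) -> torch.Tensor:
    """cam: (H, W) raw class activation map; returns values scaled into [0, 1]."""
    if mode == "minmax":
        # Min-max: stretches the map so its minimum hits 0 and its maximum hits 1.
        return (cam - cam.min()) / (cam.max() - cam.min() + eps)
    if mode == "max":
        # Max: divides by the peak only, clamping negative activations at 0.
        return (cam / (cam.max() + eps)).clamp(min=0.0)
    raise ValueError(f"unknown mode: {mode}")

# A binary localization mask then comes from a fixed threshold, e.g. normalize_cam(cam) >= 0.2
```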
State-of-the-art techniques in weakly-supervised semantic segmentation (WSSS) using image-level labels exhibit severe performance degradation on driving scene datasets such as Cityscapes. To address this challenge, we develop a new WSSS framework tailored to driving scene datasets. Based on extensive analysis of dataset characteristics, we employ Contrastive Language-Image Pre-training (CLIP) as our baseline to obtain pseudo-masks. However, CLIP introduces two key challenges: (1) pseudo-masks from CLIP lack in representing...
The class activation mapping, or CAM, has been the cornerstone of feature attribution methods for multiple vision tasks. Its simplicity and effectiveness have led to wide applications in the explanation of visual predictions and weakly-supervised localization tasks. However, CAM has its own shortcomings. The computation of attribution maps relies on ad-hoc calibration steps that are not part of the training computational graph, making it difficult for us to understand the real meaning of the attribution values. In this paper, we improve CAM by explicitly incorporating a...
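For reference, the vanilla CAM baseline this abstract builds on is a weighted sum of the last convolutional feature maps with the classifier weights of the target class, followed by post-hoc calibration steps of the kind the abstract criticizes; the sketch below shows that baseline, not the improved method.

```python
# Sketch of the vanilla CAM baseline: class-weighted sum of the final feature maps.
import torch

def class_activation_map(features: torch.Tensor, fc_weight: torch.Tensor, class_idx: int) -> torch.Tensor:
    """features: (C, H, W) last conv feature maps; fc_weight: (num_classes, C) linear classifier weights."""
    weights = fc_weight[class_idx]                      # (C,)
    cam = torch.einsum("c,chw->hw", weights, features)  # weighted sum over channels
    # Ad-hoc calibration outside the training graph: clamp negatives, then min-max rescale to [0, 1].
    cam = cam.clamp(min=0.0)
    return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
```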
The goal of unsupervised co-localization is to locate the object in a scene under the assumptions that 1) the dataset consists of only one superclass, e.g., birds, and 2) there are no human-annotated labels in the dataset. The most recent method achieves impressive performance by employing self-supervised representation learning approaches such as predicting rotation. In this paper, we introduce a new contrastive objective directly on the attention maps to enhance the performance. Our loss function exploits rich information...
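The abstract truncates before the loss details, so the sketch below is only a generic contrastive objective on flattened attention maps, pulling two augmented views of the same image together and pushing other images away; treat the pairing scheme and temperature as my assumptions rather than the paper's loss.

```python
# Hedged sketch: InfoNCE-style contrastive loss over flattened attention maps of two augmented views.
import torch
import torch.nn.functional as F

def attention_contrastive_loss(attn_a: torch.Tensor, attn_b: torch.Tensor,
                               temperature: float = 0.1) -> torch.Tensor:
    """attn_a, attn_b: (N, H, W) attention maps of two views; matching indices are positives."""
    za = F.normalize(attn_a.flatten(1), dim=1)          # (N, H*W), unit norm
    zb = F.normalize(attn_b.flatten(1), dim=1)
    logits = za @ zb.t() / temperature                  # (N, N) cosine similarities
    targets = torch.arange(za.size(0), device=za.device)
    # Symmetric cross-entropy: each map should be most similar to its own other view.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))
```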