Sangryul Jeon

ORCID: 0000-0003-0991-6165
Research Areas
  • Advanced Image and Video Retrieval Techniques
  • Advanced Neural Network Applications
  • Domain Adaptation and Few-Shot Learning
  • Multimodal Machine Learning Applications
  • Advanced Vision and Imaging
  • Image Retrieval and Classification Techniques
  • Human Pose and Action Recognition
  • Generative Adversarial Networks and Image Synthesis
  • Image Enhancement Techniques
  • Reinforcement Learning in Robotics
  • Video Surveillance and Tracking Methods
  • Medical Image Segmentation Techniques
  • Anomaly Detection Techniques and Applications
  • Remote Sensing and Land Use
  • Face Recognition and Analysis
  • Video Analysis and Summarization
  • Machine Learning and Data Classification
  • Image Processing Techniques and Applications
  • Robot Manipulation and Learning
  • Remote-Sensing Image Classification
  • Geographic Information Systems Studies

Pusan National University
2024

University of Michigan
2023

International Computer Science Institute
2023

University of California, Berkeley
2023

Yonsei University
2017-2021

Korea University
2021

Ewha Womans University
2021

We present a descriptor, called fully convolutional self-similarity (FCSS), for dense semantic correspondence. To robustly match points among different instances within the same object class, we formulate FCSS using local self-similarity (LSS) within a fully convolutional network. In contrast to existing CNN-based descriptors, FCSS is inherently insensitive to intra-class appearance variations because of its LSS-based structure, while maintaining the precise localization ability of deep neural networks. The sampling patterns of structure and the self-similarity measure are...

10.1109/cvpr.2017.73 article EN 2017-07-01
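To make the self-similarity idea concrete, here is a minimal numpy sketch of a local self-similarity descriptor (not the paper's learned FCSS: the patch size, window radius, and exponential kernel are illustrative assumptions). A descriptor built from how a center patch correlates with its surroundings depends on the layout of repeated structure, not on raw intensities:

```python
import numpy as np

def local_self_similarity(img, y, x, patch=3, radius=6):
    """Descriptor for pixel (y, x): correlate the center patch with
    every same-sized patch in a surrounding window, then map SSD
    distances to similarities in (0, 1]."""
    h = patch // 2
    center = img[y - h:y + h + 1, x - h:x + h + 1]
    desc = []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            cand = img[y + dy - h:y + dy + h + 1, x + dx - h:x + dx + h + 1]
            ssd = np.sum((center - cand) ** 2)
            desc.append(np.exp(-ssd / (patch * patch * 255.0)))
    return np.array(desc)

# Same spatial structure, different appearance: the descriptors nearly
# coincide, since only the layout of repeated structure matters.
rng = np.random.default_rng(0)
base = rng.integers(0, 2, size=(32, 32)).astype(float)
img_a = base * 200.0 + 20.0    # one appearance
img_b = base * 120.0 + 80.0    # different appearance, same layout
d_a = local_self_similarity(img_a, 16, 16)
d_b = local_self_similarity(img_b, 16, 16)
```

The strong correlation between `d_a` and `d_b` is the appearance invariance the abstract refers to.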

We propose a novel cost aggregation network, called Cost Aggregation Transformers (CATs), to find dense correspondences between semantically similar images with additional challenges posed by large intra-class appearance and geometric variations. Cost aggregation is a highly important process in matching tasks, in which the matching accuracy depends on the quality of its output. Compared to hand-crafted or CNN-based methods addressing cost aggregation, which either lack robustness to severe deformations or inherit the limitation of CNNs that fail...

10.48550/arxiv.2106.02520 preprint EN cc-by arXiv (Cornell University) 2021-01-01
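The raw matching costs that such a network aggregates are typically an all-pairs similarity volume between the two images' feature maps. A minimal numpy sketch of constructing that cost volume (feature shapes and the cosine metric are illustrative assumptions; the transformer aggregation itself is omitted):

```python
import numpy as np

def cost_volume(feat_a, feat_b):
    """All-pairs cosine similarity between two (H, W, C) feature maps,
    returned as an (H*W, H*W) matrix of matching costs."""
    a = feat_a.reshape(-1, feat_a.shape[-1]).astype(float)
    b = feat_b.reshape(-1, feat_b.shape[-1]).astype(float)
    a /= np.linalg.norm(a, axis=1, keepdims=True) + 1e-8
    b /= np.linalg.norm(b, axis=1, keepdims=True) + 1e-8
    return a @ b.T

rng = np.random.default_rng(1)
fa = rng.normal(size=(8, 8, 16))
fb = np.roll(fa, shift=2, axis=1)           # "second image": columns shifted by 2
vol = cost_volume(fa, fb)
match = vol.argmax(axis=1).reshape(8, 8)    # nearest-neighbour index per pixel
```

On this toy pair the raw argmax already recovers the 2-column shift; aggregation matters precisely when the raw volume is ambiguous.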

We present semantic attribute matching networks (SAM-Net) for jointly establishing correspondences and transferring attributes across semantically similar images, which intelligently weaves the advantages of the two tasks while overcoming their limitations. SAM-Net accomplishes this through an iterative process of establishing reliable correspondences by reducing the attribute discrepancy between the images and synthesizing attribute-transferred images using the learned correspondences. To learn the networks using weak supervisions in the form of image pairs, we present a loss based on the matching similarity...

10.1109/cvpr.2019.01262 article EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01

We present a novel framework for contrastive learning of pixel-level representation using only unlabeled videos. Without the need of ground-truth annotation, our method is capable of collecting well-defined positive correspondences by measuring their confidences, and negative ones by appropriately adjusting their hardness during training. This allows us to suppress the adverse impact of ambiguous matches and to prevent a trivial solution from being yielded by too hard or too easy samples. To accomplish this, we incorporate three...

10.1109/cvpr46437.2021.00109 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021-06-01
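The idea of contrasting a confident positive correspondence against negatives can be illustrated with a generic pixel-level InfoNCE loss. This is a simplified stand-in: the paper's confidence measurement and hardness scheduling are not reproduced, and all embeddings here are random placeholders:

```python
import numpy as np

def l2(v):
    return v / (np.linalg.norm(v, axis=-1, keepdims=True) + 1e-8)

def pixel_info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one pixel embedding: pull the positive close,
    push the negatives away (all vectors L2-normalized)."""
    logits = np.concatenate([[anchor @ positive], negatives @ anchor])
    logits = logits / temperature
    logits -= logits.max()                   # numerical stability
    p = np.exp(logits) / np.exp(logits).sum()
    return -np.log(p[0])                     # positive sits at index 0

rng = np.random.default_rng(0)
anchor = l2(rng.normal(size=8))
pos = l2(anchor + 0.1 * rng.normal(size=8))  # a confident, nearby match
negs = l2(rng.normal(size=(16, 8)))          # random distractors
loss_confident = pixel_info_nce(anchor, pos, negs)
loss_ambiguous = pixel_info_nce(anchor, l2(rng.normal(size=8)), negs)
```

A confident match yields a much smaller loss than an ambiguous one, which is why filtering positives by confidence matters.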

Convolutional neural network (CNN) based approaches for semantic alignment and object landmark detection have improved their performance significantly. Current efforts for the two tasks focus on addressing the lack of massive training data through weakly- or unsupervised learning frameworks. In this paper, we present a joint learning approach for obtaining dense correspondences and discovering object landmarks from semantically similar images. Based on the key insight that the two tasks can mutually provide supervisions to each other, our...

10.1109/iccv.2019.00739 article EN 2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019-10-01

We present a descriptor, called fully convolutional self-similarity (FCSS), for dense semantic correspondence. To robustly match points among different instances within the same object class, we formulate FCSS using local self-similarity (LSS) within a fully convolutional network. In contrast to existing CNN-based descriptors, FCSS is inherently insensitive to intra-class appearance variations because of its LSS-based structure, while maintaining the precise localization ability of deep neural networks. The sampling patterns of structure and the self-similarity measure are...

10.48550/arxiv.1702.00926 preprint EN other-oa arXiv (Cornell University) 2017-01-01

We propose a novel deep architecture for video summarization in untrimmed videos that simultaneously recognizes action and scene classes for every video segment. Our networks accomplish this through a multi-task fusion approach based on two types of attention modules to explore semantic correlations between the videos. The proposed networks consist of feature embedding and attention inference networks to stochastically leverage the inferred representations. Additionally, we design a new center loss function that learns the representations by enforcing...

10.1109/iccvw.2019.00193 article EN 2019-10-01

This paper presents a novel deep architecture for weakly-supervised temporal action localization that not only generates segment-level responses but also propagates them to the neighborhood in the form of graph Laplacian regularization. Specifically, our approach consists of two sub-modules: a class activation module to estimate the score map over time through classifiers, and a regularization module to refine the estimated map by solving a quadratic programming problem with the predicted semantic affinities. Since these modules are...

10.1109/icip.2019.8803589 article EN 2019 IEEE International Conference on Image Processing (ICIP) 2019-08-26
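Graph Laplacian regularization of segment scores has a simple unconstrained analogue: minimize ||x - s||² + λ·xᵀLx with L = D - W, whose closed-form solution solves (I + λL)x = s. The sketch below applies this to a toy chain of temporal segments (the paper instead solves a constrained quadratic program, and the affinities here are hand-set rather than predicted):

```python
import numpy as np

def laplacian_refine(scores, affinity, lam=1.0):
    """Minimize ||x - s||^2 + lam * x^T L x with L = D - W:
    the closed-form solution of (I + lam * L) x = s."""
    L = np.diag(affinity.sum(axis=1)) - affinity
    return np.linalg.solve(np.eye(len(scores)) + lam * L, scores)

# A chain of 5 temporal segments; neighbours have high semantic affinity.
W = np.zeros((5, 5))
for i in range(4):
    W[i, i + 1] = W[i + 1, i] = 1.0
noisy = np.array([1.0, 1.0, 0.0, 1.0, 1.0])    # spurious dip at segment 2
smooth = laplacian_refine(noisy, W, lam=2.0)   # the dip gets pulled up
```

The refined scores are smoother along the affinity graph while staying close to the original responses, which is exactly the propagation effect described above.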

Recent vision-based reinforcement learning (RL) methods have found extracting high-level features from raw pixels with self-supervised learning to be effective in learning policies. However, these methods focus on global representations of images, and disregard the local spatial structures present in the consecutively stacked frames. In this paper, we propose a novel approach, termed Paired Similarity Representation Learning (PSRL), for effectively encoding spatial structures in an unsupervised manner. Given the input frames, latent volumes are first...

10.1109/cvpr52729.2023.01447 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

Existing building recognition methods, exemplified by BRAILS, utilize supervised learning to extract information from satellite and street-view images for classification and segmentation. However, each task module requires human-annotated data, hindering scalability and robustness to regional variations and annotation imbalances. In response, we propose a new zero-shot workflow for building attribute extraction that utilizes large-scale vision and language models to mitigate reliance on external annotations. The proposed...

10.1109/wacv57701.2024.00845 article EN 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2024-01-03
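At inference time, zero-shot attribute extraction with a vision-language model reduces to comparing an image embedding against text-prompt embeddings. A schematic numpy sketch (the labels and embeddings below are hypothetical placeholders, not from BRAILS or any real VLM; a real workflow would embed prompts and images with a pretrained model):

```python
import numpy as np

def l2(v):
    return v / (np.linalg.norm(v, axis=-1, keepdims=True) + 1e-8)

def zero_shot_classify(img_emb, text_embs, labels):
    """Pick the label whose text embedding has the highest cosine
    similarity with the image embedding (CLIP-style inference)."""
    sims = l2(text_embs) @ l2(img_emb)
    return labels[int(np.argmax(sims))], sims

# Hypothetical building attributes and placeholder embeddings.
labels = ["brick facade", "wood siding", "concrete"]
rng = np.random.default_rng(0)
text_embs = rng.normal(size=(3, 64))
img_emb = text_embs[1] + 0.05 * rng.normal(size=64)  # image near class 1
pred, sims = zero_shot_classify(img_emb, text_embs, labels)
```

Because classification happens purely in embedding space, adding a new attribute only requires adding a new text prompt, with no retraining or annotation.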

Existing pipelines of semantic correspondence commonly include extracting high-level features for invariance against intra-class variations and background clutters. This architecture, however, inevitably results in a low-resolution matching field that additionally requires an ad-hoc interpolation process as post-processing for converting it into a high-resolution one, certainly limiting the overall performance of matching results. To overcome this, inspired by the recent success of implicit neural representation, we...

10.48550/arxiv.2210.02689 preprint EN cc-by arXiv (Cornell University) 2022-01-01

While recognizing human actions and surrounding scenes addresses different aspects of video understanding, they have strong correlations that can be used to complement the singular information of each other. In this paper, we propose an approach for joint action and scene recognition that is formulated in an end-to-end learning framework based on temporal attention techniques and the fusion of them. By applying the attention modules to a generic feature network, the action and scene features are extracted efficiently, and then composed into a single vector through...

10.1145/3265987.3265989 article EN 2018-10-15

We present semantic attribute matching networks (SAM-Net) for jointly establishing correspondences and transferring attributes across semantically similar images, which intelligently weaves the advantages of the two tasks while overcoming their limitations. SAM-Net accomplishes this through an iterative process of establishing reliable correspondences by reducing the attribute discrepancy between the images and synthesizing attribute-transferred images using the learned correspondences. To learn the networks using weak supervisions in the form of image pairs, we present a loss based on the matching similarity...

10.48550/arxiv.1904.02969 preprint EN other-oa arXiv (Cornell University) 2019-01-01

This paper presents a deep architecture, called pyramidal semantic correspondence networks (PSCNet), that estimates locally-varying affine transformation fields across semantically similar images. To deal with large appearance and shape variations that commonly exist among different instances within the same object category, we leverage a pyramidal model where the affine transformation fields are progressively estimated in a coarse-to-fine manner so that the smoothness constraint is naturally imposed. Different from previous methods which directly...

10.1109/tpami.2021.3123679 article EN cc-by-nc-nd IEEE Transactions on Pattern Analysis and Machine Intelligence 2021-10-29

This paper presents a deep architecture for dense semantic correspondence, called pyramidal affine regression networks (PARN), that estimates locally-varying affine transformation fields across images. To deal with intra-class appearance and shape variations that commonly exist among different instances within the same object category, we leverage a pyramidal model where the affine transformation fields are progressively estimated in a coarse-to-fine manner so that the smoothness constraint is naturally imposed within the networks. PARN estimates residual affine transformations at each...

10.48550/arxiv.1807.02939 preprint EN other-oa arXiv (Cornell University) 2018-01-01
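The coarse-to-fine residual scheme in PARN and PSCNet rests on composing affine transformations: a coarse estimate is refined by a residual affine at the next level. A small numpy sketch of that composition (the 2x3 matrices below are made-up examples; the regression networks that produce them per level and per region are omitted):

```python
import numpy as np

def apply_affine(A, pts):
    """Apply a 2x3 affine matrix to (N, 2) points."""
    return np.hstack([pts, np.ones((len(pts), 1))]) @ A.T

def compose(A, B):
    """Affine composition: first apply B, then A (both 2x3)."""
    A3 = np.vstack([A, [0.0, 0.0, 1.0]])
    B3 = np.vstack([B, [0.0, 0.0, 1.0]])
    return (A3 @ B3)[:2]

coarse = np.array([[0.0, -1.0, 5.0],
                   [1.0,  0.0, 1.0]])     # coarse level: rotation + translation
residual = np.array([[1.0, 0.0, 0.4],
                     [0.0, 1.0, -0.2]])   # finer level: small residual shift
total = compose(residual, coarse)

pts = np.array([[1.0, 2.0], [3.0, 0.0]])
staged = apply_affine(residual, apply_affine(coarse, pts))
direct = apply_affine(total, pts)          # identical to the staged result
```

Composing per-level residuals this way is what lets each level stay small and smooth while the overall field remains expressive.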

Existing building recognition methods, exemplified by BRAILS, utilize supervised learning to extract information from satellite and street-view images for classification and segmentation. However, each task module requires human-annotated data, hindering scalability and robustness to regional variations and annotation imbalances. In response, we propose a new zero-shot workflow for building attribute extraction that utilizes large-scale vision and language models to mitigate reliance on external annotations. The proposed...

10.48550/arxiv.2312.12479 preprint EN cc-by arXiv (Cornell University) 2023-01-01

We present a novel fusion scheme between multiple intermediate convolutional features within a convolutional neural network (CNN) for dense correspondence estimation. In contrast to existing CNN-based descriptors that utilize a single activation, our approach jointly uses multiple activations of a CNN through an attention weight that balances the contribution of each feature. We formulate the overall network as two sub-networks, a matching network and an attention network. The matching network is designed to provide matching costs while the attention network learns the optimal combination of them. These networks are learned in a joint manner to boost...

10.1109/icip.2017.8296433 article EN 2017 IEEE International Conference on Image Processing (ICIP) 2017-09-01
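The attention-weighted combination of intermediate features can be sketched directly: per-pixel softmax weights over layers, then a weighted sum. A numpy illustration (shapes and logits are assumptions; in the paper the weights come from a learned attention network rather than random scores):

```python
import numpy as np

def softmax(z, axis=0):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_fuse(feats, logits):
    """Fuse per-layer feature maps with per-pixel attention weights.
    feats:  (L, H, W, C) activations from L intermediate layers
    logits: (L, H, W)    unnormalized attention scores per layer
    returns (H, W, C) fused features."""
    w = softmax(logits, axis=0)            # weights sum to 1 over layers
    return (w[..., None] * feats).sum(axis=0)

rng = np.random.default_rng(0)
feats = rng.normal(size=(3, 4, 4, 8))      # 3 layers of 4x4 maps, 8 channels
logits = rng.normal(size=(3, 4, 4))
fused = attention_fuse(feats, logits)

# Extreme logits collapse the weights onto a single layer.
hard = np.full((3, 4, 4), -100.0)
hard[0] = 100.0
only_layer0 = attention_fuse(feats, hard)
```

Because the weights vary per pixel, different image regions can draw on different depths of the network, which a single fixed activation cannot do.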

Convolutional neural network (CNN) based approaches for semantic alignment and object landmark detection have improved their performance significantly. Current efforts for the two tasks focus on addressing the lack of massive training data through weakly- or unsupervised learning frameworks. In this paper, we present a joint learning approach for obtaining dense correspondences and discovering object landmarks from semantically similar images. Based on the key insight that the two tasks can mutually provide supervisions to each other, our...

10.48550/arxiv.1910.00754 preprint EN other-oa arXiv (Cornell University) 2019-01-01