NFDI4DS | UHH-SEMS - Publication Details

Adaptation and Re-identification Network: An Unsupervised Deep Transfer Learning Approach to Person Re-identification

OPENALEX - Publications

Yu-Jhe Li, Fu-En Yang, Yen‐Cheng Liu, Yu-Ying Yeh, Xiaofei Du, and 1 more

Person re-identification (Re-ID) aims at recognizing the same person from images taken across different cameras. To address this task, one typically requires a large amount of labeled data for training an effective Re-ID model, which might not be practical for real-world applications. To alleviate this limitation, we choose to exploit a sufficient amount of pre-existing labeled data from a different (auxiliary) dataset. By jointly considering such an auxiliary dataset and the dataset of interest (but without label information), our proposed adaptation and re-identification network...

10.1109/cvprw.2018.00054 article EN 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2018-06-01

Learning Hierarchical Self-Attention for Video Summarization

Yen‐Ting Liu, Yu-Jhe Li, Fu-En Yang, Shang‐Fu Chen, Yu-Chiang Frank Wang

Video summarization still remains a challenging task. Due to the sufficient video data on the Internet, such a task draws significant attention in the vision community and benefits a wide range of applications, e.g., retrieval, search, etc. To effectively perform video summarization by deriving keyframes which represent the given input video, we propose a novel framework named Hierarchical Multi-Attention Network (H-MAN), which comprises a shot-level reconstruction model and a multi-head attention model. While our designed model is two-stage hierarchical...

10.1109/icip.2019.8803639 article EN 2019 IEEE International Conference on Image Processing (ICIP) 2019-08-26

Semantics-Guided Intra-Category Knowledge Transfer for Generalized Zero-Shot Learning

Fu-En Yang, Yuan‐Hao Lee, Chia-Ching Lin, Yu-Chiang Frank Wang

10.1007/s11263-023-01767-0 article EN International Journal of Computer Vision 2023-02-15

MotionMatcher: Motion Customization of Text-to-Video Diffusion Models via Motion Feature Matching

Yuli Wu, Chi-Pin Huang, Fu-En Yang, Yu-Chiang Frank Wang

Text-to-video (T2V) diffusion models have shown promising capabilities in synthesizing realistic videos from input text prompts. However, the text description alone provides limited control over precise object movements and camera framing. In this work, we tackle the motion customization problem, where a reference video is provided as guidance. While most existing methods choose to fine-tune pre-trained diffusion models to reconstruct the frame differences of the reference video, we observe that such a strategy suffers from content leakage, and they...

10.48550/arxiv.2502.13234 preprint EN arXiv (Cornell University) 2025-02-18

LayoutTransformer: Scene Layout Generation with Conceptual and Spatial Diversity

Cheng-Fu Yang, Wan-Cyuan Fan, Fu-En Yang, Yu-Chiang Frank Wang

When translating text inputs into layouts or images, existing works typically require explicit descriptions of each object in a scene, including their spatial information or the associated relationships. To better exploit the text input, so that implicit objects or relationships can be properly inferred during layout generation, we propose a LayoutTransformer Network (LT-Net) in this paper. Given a scene-graph input, our LT-Net uniquely encodes the semantic features for exploiting their co-occurrences and implicit relationships. This allows one to...

10.1109/cvpr46437.2021.00373 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021-06-01

Efficient Model Personalization in Federated Learning via Client-Specific Prompt Generation

Fu-En Yang, Chien-Yi Wang, Yu-Chiang Frank Wang

Federated learning (FL) emerges as a decentralized framework which trains models from multiple distributed clients without sharing their data to preserve privacy. Recently, large-scale pre-trained models (e.g., Vision Transformer) have shown a strong capability of deriving robust representations. However, the heterogeneity among clients, limited computation resources, and communication bandwidth restrict the deployment of such models in FL frameworks. To leverage robust representations while enabling efficient model...

10.1109/iccv51070.2023.01755 article EN 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 2023-10-01

Learning Identity-Invariant Motion Representations for Cross-ID Face Reenactment

Po-Hsiang Huang, Fu-En Yang, Yu-Chiang Frank Wang

Human face reenactment aims at transferring motion patterns from one face (from a source-domain video) to another (in the target domain with the identity of interest). While recent works report impressive results, they are not able to handle multiple identities in a unified model. In this paper, we propose a unique network of CrossID-GAN to perform multi-ID face reenactment. Given a source-domain video with extracted facial landmarks and a target-domain image, our CrossID-GAN learns identity-invariant motion patterns via such information to produce videos whose ID matches...

10.1109/cvpr42600.2020.00711 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01

A Pixel-Level Meta-Learner for Weakly Supervised Few-Shot Semantic Segmentation

Yuan‐Hao Lee, Fu-En Yang, Yu-Chiang Frank Wang

Few-shot semantic segmentation addresses the learning task in which only a few images with ground-truth pixel-level labels are available for the novel classes of interest. One is typically required to collect a large amount of data (i.e., base classes) with such ground-truth information, followed by meta-learning strategies to address the above learning task. When only image-level labels can be observed during both training and testing, it is considered as an even more challenging task of weakly supervised few-shot semantic segmentation. To address this problem, we propose...

10.1109/wacv51458.2022.00167 article EN 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2022-01-01

A Multi-Domain and Multi-Modal Representation Disentangler for Cross-Domain Image Manipulation and Classification

Fu-En Yang, Jing-Cheng Chang, Chung-Chi Tsai, Yu-Chiang Frank Wang

Learning interpretable data representation has been an active research topic in deep learning and computer vision. While representation disentanglement is an effective technique for addressing this task, existing works cannot easily handle the problems in which manipulating and recognizing data across multiple domains are desirable. In this paper, we present a unified network architecture of Multi-domain and Multi-modal Representation Disentangler (M2RD), with the goal of learning a domain-invariant content representation with the associated domain-specific representation observed. By...

10.1109/tip.2019.2952707 article EN IEEE Transactions on Image Processing 2019-11-15

Language-Guided Transformer for Federated Multi-Label Classification

I-Jieh Liu, Ci-Siang Lin, Fu-En Yang, Yu-Chiang Frank Wang

Federated Learning (FL) is an emerging paradigm that enables multiple users to collaboratively train a robust model in a privacy-preserving manner without sharing their private data. Most existing approaches of FL only consider traditional single-label image classification, ignoring the impact when transferring the task to multi-label image classification. Nevertheless, it is still challenging for FL to deal with user heterogeneity in local data distribution in real-world scenarios, and this issue becomes even more severe...

10.1609/aaai.v38i12.29295 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2024-03-24

Adaptation and Re-Identification Network: An Unsupervised Deep Transfer Learning Approach to Person Re-Identification

Yu-Jhe Li, Fu-En Yang, Yen‐Cheng Liu, Yu-Ying Yeh, Xiaofei Du, and 1 more

Person re-identification (Re-ID) aims at recognizing the same person from images taken across different cameras. To address this task, one typically requires a large amount of labeled data for training an effective Re-ID model, which might not be practical for real-world applications. To alleviate this limitation, we choose to exploit a sufficient amount of pre-existing labeled data from a different (auxiliary) dataset. By jointly considering such an auxiliary dataset and the dataset of interest (but without label information), our proposed adaptation and re-identification network...

10.48550/arxiv.1804.09347 preprint EN other-oa arXiv (Cornell University) 2018-01-01

Select and Distill: Selective Dual-Teacher Knowledge Transfer for Continual Learning on Vision-Language Models

Yu-Chu Yu, Chi-Pin Huang, J.C. Chen, Kai-Po Chang, Yung-Hsuan Lai, and 2 more

Large-scale vision-language models (VLMs) have shown a strong zero-shot generalization capability on unseen-domain data. However, when adapting pre-trained VLMs to a sequence of downstream tasks, they are prone to forgetting previously learned knowledge and degrade their zero-shot classification capability. To tackle this problem, we propose a unique Selective Dual-Teacher Knowledge Transfer framework that leverages the most recent fine-tuned and the original pre-trained VLMs as dual teachers to preserve the previously learned knowledge and zero-shot capabilities, respectively....

10.48550/arxiv.2403.09296 preprint EN arXiv (Cornell University) 2024-03-14

PromptHSI: Universal Hyperspectral Image Restoration Framework for Composite Degradation

Chia-Ming Lee, Ching-Heng Cheng, Yu-Fan Lin, Yi‐Ching Cheng, Wun-Rong Liao, and 3 more

Recent developments in All-in-One (AiO) RGB image restoration and prompt learning have enabled the representation of distinct degradations through prompts, allowing degraded images to be effectively addressed by a single model. However, this paradigm faces significant challenges when transferring to hyperspectral image (HSI) restoration tasks due to: 1) the domain gap between RGB and HSI features and the difference in their structures, 2) information loss in visual prompts under severe composite degradations, and 3) difficulties in capturing...

10.48550/arxiv.2411.15922 preprint EN arXiv (Cornell University) 2024-11-24

Self-Supervised Pyramid Representation Learning for Multi-Label Visual Analysis and Beyond

Cheng-Yen Hsieh, Chih‐Jung Chang, Fu-En Yang, Yu-Chiang Frank Wang

While self-supervised learning (SSL) has been shown to benefit a number of vision tasks, existing techniques mainly focus on image-level manipulation, which may not generalize well to downstream tasks at patch or pixel levels. Moreover, existing SSL methods might not sufficiently describe and associate the above representations within and across image scales. In this paper, we propose a Self-Supervised Pyramid Representation Learning (SS-PRL) framework. The proposed SS-PRL is designed to derive pyramid representations at patch levels via proper...

10.1109/wacv56688.2023.00272 article EN 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2023-01-01

TAX: Tendency-and-Assignment Explainer for Semantic Segmentation with Multi-Annotators

Yuan-Chia Cheng, Zu-Yun Shiau, Fu-En Yang, Yu-Chiang Frank Wang

To understand how deep neural networks perform classification predictions, recent research attention has been focusing on developing techniques to offer desirable explanations. However, most existing methods cannot be easily applied to semantic segmentation; moreover, they are not designed to offer interpretability under the multi-annotator setting. Instead of viewing ground-truth pixel-level labels as annotated by a single annotator with a consistent labeling tendency, we aim at providing interpretable...

10.48550/arxiv.2302.09561 preprint EN cc-by arXiv (Cornell University) 2023-01-01

Dual-MTGAN: Stochastic and Deterministic Motion Transfer for Image-to-Video Synthesis

Fu-En Yang, Jing-Cheng Chang, Yuan‐Hao Lee, Yu-Chiang Frank Wang

Generating videos with content and motion variations is a challenging task in computer vision. While the recent development of GAN allows video generation from latent representations, it is not easy to produce videos with particular motion patterns of interest. In this paper, we propose Dual Motion Transfer GAN (Dual-MTGAN), which takes image and video data as inputs while learning disentangled content and motion representations. Our Dual-MTGAN is able to perform deterministic motion transfer and stochastic motion generation. Based on a given image, the former preserves the input...

10.1109/icpr48806.2021.9412781 article EN 2020 25th International Conference on Pattern Recognition (ICPR) 2021-01-10

Few-Shot Classification in Unseen Domains by Episodic Meta-Learning Across Visual Domains

Yuan-Chia Cheng, Ci-Siang Lin, Fu-En Yang, Yu-Chiang Frank Wang

Few-shot classification aims to carry out classification given only a few labeled examples for the categories of interest. Though several approaches have been proposed, most existing few-shot learning (FSL) models assume that base and novel classes are drawn from the same data domain. When it comes to recognizing novel-class data in an unseen domain, this becomes an even more challenging task of domain-generalized few-shot classification. In this paper, we present a unique learning framework for domain-generalized few-shot classification, where homogeneous...

10.1109/icip42928.2021.9506141 article EN 2021 IEEE International Conference on Image Processing (ICIP) 2021-08-23

Efficient Model Personalization in Federated Learning via Client-Specific Prompt Generation

Fu-En Yang, Chien-Yi Wang, Yu-Chiang Frank Wang

Federated learning (FL) emerges as a decentralized framework which trains models from multiple distributed clients without sharing their data to preserve privacy. Recently, large-scale pre-trained models (e.g., Vision Transformer) have shown a strong capability of deriving robust representations. However, the heterogeneity among clients, limited computation resources, and communication bandwidth restrict the deployment of such models in FL frameworks. To leverage robust representations while enabling efficient model...

10.48550/arxiv.2308.15367 preprint EN cc-by arXiv (Cornell University) 2023-01-01

Language-Guided Transformer for Federated Multi-Label Classification

I-Jieh Liu, Ci-Siang Lin, Fu-En Yang, Yu-Chiang Frank Wang

Federated Learning (FL) is an emerging paradigm that enables multiple users to collaboratively train a robust model in a privacy-preserving manner without sharing their private data. Most existing approaches of FL only consider traditional single-label image classification, ignoring the impact when transferring the task to multi-label image classification. Nevertheless, it is still challenging for FL to deal with user heterogeneity in local data distribution in real-world scenarios, and this issue becomes even more severe...

10.48550/arxiv.2312.07165 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Semantics-Guided Representation Learning with Applications to Visual Synthesis

Jiawei Yan, Ci-Siang Lin, Fu-En Yang, Yu-Jhe Li, Yu-Chiang Frank Wang

Learning interpretable and interpolatable latent representations has been an emerging research direction, allowing researchers to understand and utilize the derived latent space for further applications such as visual synthesis or recognition. While most existing approaches derive an interpolatable latent space that induces smooth transitions in image appearance, it is still not clear how to observe desirable representations which would contain semantic information of interest. In this paper, we aim to learn meaningful representations and simultaneously perform semantic-oriented...

10.48550/arxiv.2010.10772 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Self-Supervised Pyramid Representation Learning for Multi-Label Visual Analysis and Beyond

Cheng-Yen Hsieh, Chih‐Jung Chang, Fu-En Yang, Yu-Chiang Frank Wang

While self-supervised learning (SSL) has been shown to benefit a number of vision tasks, existing techniques mainly focus on image-level manipulation, which may not generalize well to downstream tasks at patch or pixel levels. Moreover, existing SSL methods might not sufficiently describe and associate the above representations within and across image scales. In this paper, we propose a Self-Supervised Pyramid Representation Learning (SS-PRL) framework. The proposed SS-PRL is designed to derive pyramid representations at patch levels via proper...

10.48550/arxiv.2208.14439 preprint EN other-oa arXiv (Cornell University) 2022-01-01