- Domain Adaptation and Few-Shot Learning
- Multimodal Machine Learning Applications
- Advanced Image and Video Retrieval Techniques
- Advanced Vision and Imaging
- Video Surveillance and Tracking Methods
- Generative Adversarial Networks and Image Synthesis
- Advanced Image Processing Techniques
- Face Recognition and Analysis
- Human Pose and Action Recognition
- Face and Expression Recognition
- 3D Shape Modeling and Analysis
- Image Enhancement Techniques
- Video Analysis and Summarization
- Advanced Neural Network Applications
- Music and Audio Processing
- Image Retrieval and Classification Techniques
- Image Processing Techniques and Applications
- Image and Signal Denoising Methods
- Digital Media Forensic Detection
- Text and Document Classification Technologies
- Adversarial Robustness in Machine Learning
- Sparse and Compressive Sensing Techniques
- Computer Graphics and Visualization Techniques
- Anomaly Detection Techniques and Applications
- Advanced Image Fusion Techniques
National Taiwan University
2016-2025
Nvidia (United States)
2024
Asus (Taiwan)
2019-2021
Center for Information Technology
2010-2018
Research Center for Information Technology Innovation, Academia Sinica
2010-2017
Academia Sinica
2011-2016
Institute of Information Science, Academia Sinica
2010-2012
Carnegie Mellon University
2007-2009
Few-shot classification aims to learn a classifier to recognize unseen classes during training with limited labeled examples. While significant progress has been made, the growing complexity of network designs, meta-learning algorithms, and differences in implementation details make a fair comparison difficult. In this paper, we present 1) a consistent comparative analysis of several representative few-shot classification algorithms, with results showing that deeper backbones significantly reduce the performance differences among methods on...
Despite the recent success of deep-learning based semantic segmentation, deploying a pre-trained road scene segmenter to a city whose images are not presented in the training set would not achieve satisfactory performance due to dataset biases. Instead of collecting a large number of annotated images of each city of interest to train or refine the segmenter, we propose an unsupervised learning approach to adapt road scene segmenters across different cities. By utilizing Google Street View and its time-machine feature, we can collect unannotated images for each road scene at...
In this paper, we propose a novel deep learning architecture for multi-label zero-shot learning (ML-ZSL), which is able to predict multiple unseen class labels for each input instance. Inspired by the way humans utilize semantic knowledge between objects of interest, we propose a framework that incorporates knowledge graphs for describing the relationships between multiple labels. Our model learns an information propagation mechanism from the semantic label space, which can be applied to model the interdependencies between seen and unseen class labels. With such investigation of structured knowledge graphs for visual reasoning, we show...
Monocular depth estimation is a challenging task in scene understanding, with the goal to acquire the geometric properties of 3D space from 2D images. Due to the lack of RGB-depth image pairs, unsupervised learning methods aim at deriving depth information with alternative supervision such as stereo pairs. However, most existing works fail to model the geometric structure of objects, which generally results from considering pixel-level objective functions during training. In this paper, we propose SceneNet to overcome this limitation with the aid of semantic...
Decomposition of an image into multiple semantic components has been an effective research topic for various image processing applications such as image denoising, enhancement, and inpainting. In this paper, we present a novel self-learning based image decomposition framework. Based on the recent success of sparse representation, the proposed framework first learns an over-complete dictionary from the high spatial frequency parts of the input image for reconstruction purposes. We perform unsupervised clustering on the observed dictionary atoms (and their...
Anomaly detection has been an important research topic in data mining and machine learning. Many real-world applications such as intrusion or credit card fraud detection require an effective and efficient framework to identify deviated data instances. However, most anomaly detection methods are typically implemented in batch mode, and thus cannot be easily extended to large-scale problems without sacrificing computation and memory requirements. In this paper, we propose an online oversampling principal component analysis (osPCA) algorithm...
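The abstract above is cut off before it describes the algorithm, so the following is only a rough sketch of the general oversampling-PCA idea under stated assumptions: duplicate one instance, recompute the dominant principal direction, and score the instance by how far that direction rotates. The function names, the `ratio` parameter, and the cosine-based score are illustrative assumptions, and the sketch recomputes PCA in batch for every instance rather than implementing the online update the paper proposes.

```python
import numpy as np

def principal_direction(X):
    """Return the dominant principal direction (unit vector) of the rows of X."""
    Xc = X - X.mean(axis=0)
    # The first right singular vector of the centered data is the
    # leading eigenvector of its covariance matrix.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[0]

def oversampling_scores(X, ratio=0.1):
    """Score each instance by how much duplicating it rotates the principal direction.

    `ratio` (an assumed parameter) sets how many extra copies are added,
    as a fraction of the dataset size.
    """
    n = X.shape[0]
    copies = max(1, int(round(ratio * n)))
    u = principal_direction(X)
    scores = np.empty(n)
    for i in range(n):
        # Oversample instance i by appending duplicate rows.
        X_over = np.vstack([X, np.repeat(X[i:i + 1], copies, axis=0)])
        u_i = principal_direction(X_over)
        # abs() removes the sign ambiguity of singular vectors.
        scores[i] = 1.0 - abs(u @ u_i)
    return scores

# Toy data: 200 inliers stretched along the x-axis plus one point far off-axis.
rng = np.random.default_rng(0)
inliers = rng.normal(size=(200, 2)) * np.array([3.0, 0.3])
outlier = np.array([[0.0, 15.0]])
X = np.vstack([inliers, outlier])
scores = oversampling_scores(X)
```

In this toy setup the off-axis point is the last row, so it should receive the largest score; recomputing a full decomposition per instance is exactly the batch cost that an online formulation would aim to avoid.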
Multi-label classification is a practical yet challenging task in machine learning related fields, since it requires the prediction of more than one label category for each input instance. We propose a novel deep neural networks (DNN) based model, Canonical Correlated AutoEncoder (C2AE), for solving this task. Aiming at better relating feature and label domain data for improved classification, we uniquely perform joint feature and label embedding by deriving a deep latent space, followed by the introduction of label-correlation sensitive...
Point clouds are among the popular geometry representations for 3D vision applications. However, without regular structures like 2D images, processing and summarizing information over these unordered data points are very challenging. Although a number of previous works attempt to analyze point clouds and achieve promising performances, their performances would degrade significantly when data variations like shift and scale changes are presented. In this paper, we propose 3D Graph Convolution Networks (3D-GCN), which is...
Person re-identification (re-ID) aims at recognizing the same person from images taken across different cameras. To address this challenging task, existing re-ID models typically rely on a large amount of labeled training data, which is not practical for real-world applications. To alleviate this limitation, researchers now target cross-dataset re-ID, which focuses on generalizing the discriminative ability to the unlabeled target domain when given a labeled source domain dataset. To achieve this goal, our proposed Pose Disentanglement and...
While domain adaptation (DA) aims to associate the learning tasks across data domains, heterogeneous domain adaptation (HDA) particularly deals with learning from cross-domain data which are of different types of features. In other words, for HDA, data from source and target domains are observed in separate feature spaces and thus exhibit distinct distributions. In this paper, we propose a novel learning algorithm of Cross-Domain Landmark Selection (CDLS) for solving the above task. With the goal of deriving a domain-invariant feature subspace for HDA, our CDLS is able to identify...
Diffusion models (DMs) have shown great potential for high-quality image synthesis. However, when it comes to producing images with complex scenes, how to properly describe both image global structures and object details remains a challenging task. In this paper, we present Frido, a Feature Pyramid Diffusion model performing a multi-scale coarse-to-fine denoising process for image synthesis. Our model decomposes an input image into scale-dependent vector quantized features, followed by a coarse-to-fine gating for producing image output. During the above multi-scale representation learning...
Among the widely used parameter-efficient fine-tuning (PEFT) methods, LoRA and its variants have gained considerable popularity because of avoiding additional inference costs. However, there still often exists an accuracy gap between these methods and full fine-tuning (FT). In this work, we first introduce a novel weight decomposition analysis to investigate the inherent differences between FT and LoRA. Aiming to resemble the learning capacity of FT from the findings, we propose Weight-Decomposed Low-Rank Adaptation (DoRA). DoRA...
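The abstract is truncated before it describes how DoRA works, so the sketch below only illustrates one plausible reading of "weight decomposition": a weight matrix is split into a per-column magnitude and a unit-norm direction, and a LoRA-style low-rank product updates the direction before renormalizing. The column-wise L2 norm, the helper names `decompose` and `dora_merge`, and the matrix shapes are assumptions made for illustration, not details taken from the text above.

```python
import numpy as np

def decompose(W):
    """Split W into a per-column magnitude and a unit-norm direction.

    The column-wise L2 norm is an assumption of this sketch.
    """
    m = np.linalg.norm(W, axis=0, keepdims=True)  # shape (1, k)
    V = W / m                                     # each column has norm 1
    return m, V

def dora_merge(W0, m, B, A):
    """Recombine a magnitude m with a low-rank-updated, renormalized direction."""
    V = W0 + B @ A  # LoRA-style update applied to the frozen weight W0
    return m * V / np.linalg.norm(V, axis=0, keepdims=True)

rng = np.random.default_rng(0)
W0 = rng.normal(size=(6, 4))   # stand-in for a pretrained weight
m, V = decompose(W0)
B = np.zeros((6, 2))           # common LoRA convention: B starts at zero
A = rng.normal(size=(2, 4))
W_merged = dora_merge(W0, m, B, A)
```

With `B` initialised to zero, the merged weight equals the original `W0`, so adaptation starts from the pretrained model; and because the merge yields a single matrix of the original shape, no extra layers are needed at inference time.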
Cross-domain image synthesis and recognition are typically considered as two distinct tasks in the areas of computer vision and pattern recognition. Therefore, it is not clear whether approaches addressing one task can be easily generalized or extended for solving the other. In this paper, we propose a unified model for coupled dictionary and feature space learning. The proposed learning model not only observes a common feature space for associating cross-domain image data for recognition purposes, the derived feature space is able to jointly update the dictionaries in each image domain...
This paper presents a saliency-based video object extraction (VOE) framework. The proposed framework aims to automatically extract foreground objects of interest without any user interaction or the use of any training data (i.e., not limited to any particular type of object). To separate foreground and background regions within and across video frames, the proposed method utilizes visual and motion saliency information extracted from the input video. A conditional random field is applied to effectively combine the saliency induced features, which allows us to deal...
Learning-based approaches for image super-resolution (SR) have attracted the attention of researchers in the past few years. In this paper, we present a novel self-learning approach for SR. In our proposed framework, we advance support vector regression (SVR) with image sparse representation, which offers excellent generalization in modeling the relationship between images and their associated SR versions. Unlike most prior SR methods, our proposed framework does not require the collection of training low and high-resolution image data in advance,...
Unsupervised domain adaptation deals with scenarios in which labeled data are available in the source domain, but only unlabeled data can be observed in the target domain. Since the classifiers trained by source-domain data would not be expected to generalize well in the target domain, how to transfer the label information from source to target-domain data is a challenging task. A common technique for unsupervised domain adaptation is to match cross-domain data distributions, so that the domain and distribution differences can be suppressed. In this paper, we propose to utilize the label information inferred from the source domain, while the structural information of...
Person re-identification (Re-ID) aims at recognizing the same person from images taken across different cameras. To address this task, one typically requires a large amount of labeled data for training an effective Re-ID model, which might not be practical for real-world applications. To alleviate this limitation, we choose to exploit a sufficient amount of pre-existing labeled data from a different (auxiliary) dataset. By jointly considering such an auxiliary dataset and the dataset of interest (but without label information), our proposed adaptation network...
We present a novel domain adaptation approach for solving cross-domain pattern recognition problems, i.e., the data or features to be processed and recognized are collected from different domains of interest. Inspired by canonical correlation analysis (CCA), we utilize the derived correlation subspace as a joint representation for associating data across different domains, and we advance reduced kernel techniques for kernel CCA (KCCA) if a nonlinear correlation subspace is desirable. Such techniques not only make KCCA computationally more efficient, potential over-fitting problems...
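To make the idea of a CCA-derived correlation subspace concrete, here is a minimal sketch of plain linear CCA: each view is whitened, and an SVD of the whitened cross-covariance gives paired projections into a shared space. It is a generic textbook formulation, not the paper's method; the kernel and reduced-kernel variants mentioned above are not shown, and the names `linear_cca`, `inv_sqrt`, and the `reg` ridge term are assumptions of this example.

```python
import numpy as np

def inv_sqrt(C):
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    w, Q = np.linalg.eigh(C)
    return Q @ np.diag(1.0 / np.sqrt(w)) @ Q.T

def linear_cca(X, Y, dim, reg=1e-8):
    """Return projections Wx, Wy and the top `dim` canonical correlations.

    `reg` is a small ridge term (an assumption) that keeps the
    covariance matrices invertible.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / n
    Kx, Ky = inv_sqrt(Cxx), inv_sqrt(Cyy)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, s, Vt = np.linalg.svd(Kx @ Cxy @ Ky)
    Wx = Kx @ U[:, :dim]
    Wy = Ky @ Vt[:dim].T
    return Wx, Wy, s[:dim]

# Two synthetic "domains" with different feature dimensions that share one latent factor z.
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 1))
X = np.hstack([z, rng.normal(size=(500, 2))]) @ rng.normal(size=(3, 3))
Y = np.hstack([z, rng.normal(size=(500, 3))]) @ rng.normal(size=(4, 4))
Wx, Wy, corr = linear_cca(X, Y, dim=1)

# Project both domains into the shared one-dimensional correlation subspace.
x_proj = (X - X.mean(axis=0)) @ Wx
y_proj = (Y - Y.mean(axis=0)) @ Wy
```

Because both synthetic views contain the same latent factor, the projected coordinates `x_proj` and `y_proj` should be strongly correlated, which is what lets data from one domain be compared with data from the other in the shared subspace.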
Rain removal from a single image is one of the challenging image denoising problems. In this paper, we present a learning-based framework for single image rain removal, which focuses on the learning of context information from an input image, and thus the rain patterns present in it can be automatically identified and removed. We approach the single image rain removal problem as the integration of image decomposition and self-learning processes. More precisely, our method first performs context-constrained image segmentation on the input image, and learns dictionaries for the high-frequency components in different context categories via...
While representation learning aims to derive interpretable features for describing visual data, representation disentanglement further results in such features so that particular image attributes can be identified and manipulated. However, one cannot easily address this task without observing ground truth annotation for the training data. To address this problem, we propose a novel deep learning model of Cross-Domain Representation Disentangler (CDRD). By observing fully annotated source-domain data and unlabeled target-domain data of interest, our model bridges...
Without any prior knowledge or user interaction, single image rain removal has been a challenging task. Typically, one needs to disregard the image components associated with the rain patterns, so that rain removal can be achieved via image reconstruction. By observing the limitations of standard batch-mode learning-based methods, we propose to exploit the structural similarity of the image bases for solving this task. By formulating basis selection as an optimization problem, we are able to disregard those associated with rain patterns while the detailed image information can be preserved. Experiments on both...
Audio-visual event localization requires one to identify the event which is both visible and audible in a video (either at a frame or video level). To address this task, we propose a deep neural network named Audio-Visual sequence-to-sequence dual network (AVSDN). By jointly taking both audio and visual features at each time segment as inputs, our proposed model learns global and local event information in a sequence-to-sequence manner, which can be realized in either fully supervised or weakly supervised settings. Empirical results confirm that our proposed method performs favorably...
We present a novel and unified deep learning framework which is capable of learning domain-invariant representation from data across multiple domains. Realized by adversarial training with additional ability to exploit domain-specific information, the proposed network is able to perform continuous cross-domain image translation and manipulation, and produces desirable output images accordingly. In addition, the resulting feature representation exhibits superior performance of unsupervised domain adaptation, which also verifies the effectiveness...