- Advanced Image and Video Retrieval Techniques
- Domain Adaptation and Few-Shot Learning
- Image Retrieval and Classification Techniques
- Advanced Vision and Imaging
- Advanced Neural Network Applications
- Medical Image Segmentation Techniques
- Human Pose and Action Recognition
- Video Surveillance and Tracking Methods
- Face and Expression Recognition
- Multimodal Machine Learning Applications
- Video Analysis and Summarization
- Anomaly Detection Techniques and Applications
- Image Processing Techniques and Applications
- Image and Object Detection Techniques
- Image Enhancement Techniques
- Visual Attention and Saliency Detection
- COVID-19 diagnosis using AI
- 3D Shape Modeling and Analysis
- Music and Audio Processing
- Remote-Sensing Image Classification
- Network Security and Intrusion Detection
- Face recognition and analysis
- Sparse and Compressive Sensing Techniques
- AI in cancer detection
- Advanced Image Processing Techniques
Institute of Information Science, Academia Sinica
2015-2024
Academia Sinica
2005-2022
Institute of Information Science
2018
Institute of Statistical Science, Academia Sinica
1999-2013
New York University
1995-2002
Courant Institute of Mathematical Sciences
1997-2002
University of Minnesota
1996
Twin Cities Orthopedics
1996
We present a new approach, called local discriminant embedding (LDE), to manifold learning and pattern classification. In our framework, the neighbor and class relations of data are used to construct the embedding for classification problems. The proposed algorithm learns the embedding for the submanifold of each class by solving an optimization problem. After being embedded into a low-dimensional subspace, data points of the same class maintain their intrinsic neighbor relations, whereas neighboring points of different classes no longer stick to one another. Via the embedding, test data are thus...
We present a novel computational model to explore the relatedness of objectness and saliency, each of which plays an important role in the study of visual attention. The proposed framework conceptually integrates these two concepts via constructing a graphical model to account for their relationships, and concurrently improves their estimation by iteratively optimizing an energy function realizing the model. Specifically, the energy function comprises the objectness, the saliency, and the interaction energy, respectively corresponding to explain their individual regularities and the mutual...
Automatic saliency prediction in 360° videos is critical for viewpoint guidance applications (e.g., Facebook 360 Guide). We propose a spatial-temporal network which is (1) weakly-supervised trained and (2) tailor-made for the 360° viewing sphere. Note that most existing methods are less scalable since they rely on annotated saliency maps for training. Most importantly, they convert the 360° sphere to 2D images (e.g., a single equirectangular image or multiple separate Normal Field-of-View (NFoV) images), which introduces distortion and image boundaries. In...
In solving complex visual learning tasks, adopting multiple descriptors to more precisely characterize the data has been a feasible way for improving performance. The resulting data representations are typically high-dimensional and assume diverse forms. Hence, finding a way of transforming them into a unified space of lower dimension generally facilitates the underlying tasks such as object recognition or clustering. To this end, the proposed approach (termed MKL-DR) generalizes the framework of multiple kernel learning for dimensionality...
We address two key issues of co-segmentation over multiple images. The first is whether a pure unsupervised algorithm can satisfactorily solve this problem. Without the user's guidance, segmenting the foregrounds implied by the common object is quite a challenging task, especially when substantial variations in the object's appearance, shape, and scale are allowed. The second issue concerns the efficiency if the technique is to lead to practical uses. With these in mind, we establish an MRF optimization model that has an energy...
Motivated by the conventional grouping techniques to image segmentation, we develop their DNN counterpart to tackle the referring variant. The proposed method is driven by a convolutional-recurrent neural network (ConvRNN) that iteratively carries out top-down processing of bottom-up segmentation cues. Given a natural language referring expression, our method learns to predict its relevance to each pixel and derives a See-through-Text Embedding Pixelwise (STEP) heatmap, which reveals segmentation cues of pixel level via the learned visual-textual...
This paper aims to tackle the challenging problem of one-shot object detection. Given a query image patch whose class label is not included in the training data, the goal of the task is to detect all instances of the same class in a target image. To this end, we develop a novel co-attention and co-excitation (CoAE) framework that makes contributions in three key technical aspects. First, we propose to use the non-local operation to explore the co-attention embodied in each query-target pair and yield region proposals accounting for the one-shot situation. Second, we formulate...
This paper addresses a new task called referring 3D instance segmentation, which aims to segment out the target instance in a 3D scene given a query sentence. Previous work on scene understanding has explored visual grounding with natural language guidance, yet the emphasis is mostly constrained to images and videos. We propose a Text-guided Graph Neural Network (TGNN) for referring 3D instance segmentation on point clouds. Given a query sentence and the point cloud of a 3D scene, our method learns to extract per-point features and predicts an offset to shift each point toward its object...
Learning to capture human motion is essential to 3D human pose and shape estimation from monocular video. However, the existing methods mainly rely on recurrent or convolutional operation to model such temporal information, which limits the ability to capture non-local context relations of human motion. To address this problem, we propose a motion pose and shape network (MPS-Net) to effectively capture humans in motion to estimate accurate and temporally coherent 3D human pose and shape from a video. Specifically, we first propose a motion continuity attention (MoCA) module that leverages visual cues observed from human motion to adaptively...
We present a framework for 2D shape contour (silhouette) comparison that can account for stretchings, occlusions and region information. Topological changes due to the original 3D scenarios and articulations are also addressed. To compare the degree of similarity between any two shapes, our approach is to represent each shape contour with a free tree structure derived from a shape axis (SA) model, which we have recently proposed. We then use a tree matching scheme to find the best approximate match and the matching cost. To deal with articulations, stretchings...
To model a scene for background subtraction, Gaussian mixture modeling (GMM) is a popular choice for its capability of adaptation to background variations. However, GMM often suffers from a tradeoff between robustness to background changes and sensitivity to foreground abnormalities, and is inefficient in managing the tradeoff for various surveillance scenarios. By reviewing the formulations of GMM, we identify that such a tradeoff can be easily controlled by adaptive adjustments of the GMM's learning rates for image pixels at different locations and of distinct properties. A new...
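The idea of giving each pixel its own learning rate can be illustrated with a deliberately simplified sketch. It assumes a single Gaussian per pixel rather than a full mixture, and the rate-selection rule, the function names (`update_background`, `choose_rates`) and all parameter values are hypothetical, not taken from the paper.

```python
def update_background(mean, var, frame, rates, k=2.5, min_var=4.0):
    """Update a per-pixel single-Gaussian background model by one frame.

    mean, var, frame and rates are flat lists with one entry per pixel.
    Returns (new_mean, new_var, foreground_mask).
    """
    new_mean, new_var, mask = [], [], []
    for m, v, x, a in zip(mean, var, frame, rates):
        d = x - m
        # Foreground if the pixel lies more than k standard deviations away.
        mask.append(d * d > k * k * v)
        # Exponential moving averages, each weighted by the pixel's own rate.
        new_mean.append(m + a * d)
        new_var.append(max(min_var, v + a * (d * d - v)))
    return new_mean, new_var, mask


def choose_rates(mask, bg_rate=0.05, fg_rate=0.005):
    """Hypothetical rule: adapt quickly on background, slowly on foreground."""
    return [fg_rate if is_fg else bg_rate for is_fg in mask]


# Two pixels: the first drifts slightly, the second changes abruptly.
mean, var = [100.0, 100.0], [16.0, 16.0]
rates = [0.1, 0.01]
mean, var, mask = update_background(mean, var, [102.0, 200.0], rates)
rates = choose_rates(mask)
print(mask, rates)
```

A small rate on pixels flagged as foreground keeps a lingering object from being absorbed into the background too quickly, while a larger rate elsewhere lets the model follow gradual background change.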
We address the problem of contour detection via per-pixel classifications of edge points. To facilitate the process, the proposed approach leverages DenseNet, an efficient implementation of multiscale convolutional neural networks (CNNs), to extract an informative feature vector for each pixel and uses an SVM classifier to accomplish contour detection. In the experiment of contour detection, we look into the effectiveness of combining per-pixel features from different CNN layers and verify their performance on BSDS500.
One-shot object detection tackles a challenging task that aims at identifying within a target image all object instances of the same class, implied by a query image patch. The main difficulty lies in the situation that the class label of the query patch and its respective examples are not available in the training data. Our main idea leverages the concept of language translation to boost metric-learning-based detection methods. Specifically, we emulate the language translation process to adaptively translate the feature of each object proposal to better correlate the given query feature for discriminating the class-similarity...
Anomaly detection (AD) aims to address the task of classification or localization of image anomalies. This paper addresses two pivotal issues of reconstruction-based approaches to AD in images, namely, model adaptation and reconstruction gap. The former generalizes an AD model to tackling a broad range of object categories, while the latter provides useful clues for localizing abnormal regions. At the core of our method is an unsupervised universal model, termed as Metaformer, which leverages both meta-learned model parameters to achieve...
Optimization methods based on iterative schemes can be divided into two classes: line-search methods and trust-region methods. While line-search techniques are commonly found in various vision applications, not much attention is paid to trust-region ones. Motivated by the fact that line-search methods can be considered as special cases of trust-region methods, we propose to establish a trust-region framework for real-time tracking. Our approach is characterized by three key contributions. First, since a trust-region tracking system is more effective, it often yields better performances than the outcomes...
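For readers unfamiliar with the trust-region idea mentioned above, the following is a generic one-dimensional sketch, not the tracking formulation of the paper. It assumes a twice-differentiable objective; the function name `trust_region_minimize`, the acceptance thresholds (0.1, 0.25, 0.75) and the toy objective are illustrative choices.

```python
def trust_region_minimize(f, grad, hess, x, radius=1.0, max_radius=10.0,
                          tol=1e-8, max_iter=100):
    """Minimize a 1D function with a basic trust-region Newton scheme."""
    for _ in range(max_iter):
        g, h = grad(x), hess(x)
        if abs(g) < tol:
            break
        # Minimize the quadratic model g*s + 0.5*h*s*s subject to |s| <= radius.
        if h > 0 and abs(g / h) <= radius:
            s = -g / h                        # Newton step fits in the region
        else:
            s = -radius if g > 0 else radius  # otherwise step to the boundary
        predicted = -(g * s + 0.5 * h * s * s)  # decrease promised by the model
        actual = f(x) - f(x + s)                # decrease actually obtained
        rho = actual / predicted
        if rho < 0.25:
            radius *= 0.25                    # model was poor: shrink the region
        elif rho > 0.75 and abs(s) == radius:
            radius = min(2.0 * radius, max_radius)  # model was good: expand
        if rho > 0.1:
            x = x + s                         # accept only sufficiently good steps
    return x


# Toy objective with its minimum at x = 3.
f = lambda x: (x - 3) ** 4 + (x - 3) ** 2
grad = lambda x: 4 * (x - 3) ** 3 + 2 * (x - 3)
hess = lambda x: 12 * (x - 3) ** 2 + 2
x_star = trust_region_minimize(f, grad, hess, x=0.0)
print(x_star)
```

Unlike a line search, which fixes a direction and then picks a step length, this scheme fixes a maximum step length first and chooses the best step of the local model within it.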
Learning the user's semantics for CBIR involves two different sources of information: the similarity relations entailed by the content-based features, and the relevance relations specified in the feedback. Given that, we propose an augmented relation embedding (ARE) to map the image space into a semantic manifold that faithfully grasps the user's preferences. Besides ARE, we also look into the issues of selecting a good feature set for improving the retrieval performance. With these two aspects of efforts we have established a system that yields far better results than...
This paper describes a local ensemble kernel learning technique to recognize/classify objects from a large number of diverse categories. Due to the possibly large intraclass feature variations, using only a single unified kernel-based classifier may not satisfactorily solve the problem. Our approach is to carry out the recognition task with adaptive ensemble kernel machines, each of which is derived from proper localization and regularization. Specifically, for each training sample, we learn a distinct ensemble kernel constructed in a way to give good classification...
Learning to efficiently construct a scene background model is crucial for tracking techniques relying on background subtraction. Our proposed method is motivated by criteria leading to what a general and reasonable background model should be, and realized by a practical classification technique. Specifically, we consider a two-level approximation scheme that elegantly combines the bottom-up and top-down information for deriving a background model in real time. The key...
Representing shapes in a compact and informative form is a significant problem for vision systems that must recognize or classify objects. We describe a compact representation model for two-dimensional (2D) shapes by investigating their self-similarities and constructing their shape axis trees (SA-trees). Our approach can be formulated as a variational one (or, equivalently, as MAP estimation of a Markov random field). We start with a 2D shape, its boundary contour, and two different parameterizations for the contour (one parameterization...
We describe a tracking algorithm to address the interactions among objects, and to track them individually and confidently via a static camera. It is achieved by constructing an invariant bipartite graph to model the dynamics of the tracking process, of which the nodes are classified into objects and profiles. The best match of the graph corresponds to an optimal assignment for resolving the identities of the detected objects. Since objects may enter/exit the scene indefinitely, or when interactions occur/conclude they could form/leave a group, the number of nodes in the graph changes dynamically. Therefore...
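The notion of a best bipartite match as an optimal assignment can be sketched as follows. This is a minimal brute-force illustration, assuming a hypothetical squared-distance cost between tracked objects and new detections; the paper's actual graph construction and cost are not given in the text above, and `best_assignment` is an illustrative name.

```python
from itertools import permutations


def best_assignment(cost):
    """Return (total_cost, assignment) minimizing sum of cost[i][assignment[i]].

    cost is an n x m matrix with n <= m. Brute force, so small inputs only.
    """
    n, m = len(cost), len(cost[0])
    best = None
    for perm in permutations(range(m), n):
        total = sum(cost[i][j] for i, j in enumerate(perm))
        if best is None or total < best[0]:
            best = (total, perm)
    return best


def sq_dist(p, q):
    """Squared Euclidean distance between two 2D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


tracked = [(0, 0), (10, 10)]              # last known object positions
detected = [(9, 11), (1, 0), (50, 50)]    # detections in the new frame
cost = [[sq_dist(t, d) for d in detected] for t in tracked]
total, match = best_assignment(cost)
print(total, match)  # 3 (1, 0)
```

The third detection is left unmatched, which is how a newly entering object would show up in this toy setting. For more than a handful of nodes, a polynomial-time solver such as the Hungarian algorithm is the usual replacement for the exhaustive search.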
This paper presents a novel method for instance segmentation of 3D point clouds. The proposed method is called Gaussian Instance Center Network (GICN), which can approximate the distributions of instance centers scattered in the whole scene as Gaussian center heatmaps. Based on the predicted heatmaps, a small number of center candidates can be easily selected for the subsequent predictions with efficiency, including i) predicting the instance size of each center to decide a range for extracting features, ii) generating bounding boxes for centers, and iii) producing final instance masks. GICN...
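To make the notion of a Gaussian center heatmap concrete, here is a small sketch that scores each point by its distance to the nearest instance center and then greedily picks well-separated high-scoring points as candidates. It is an illustration under stated assumptions: in the paper the heatmap is predicted by a network, whereas here it is computed from known centers, and `sigma`, `threshold`, `radius` and the function names are hypothetical.

```python
import math


def dist2(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def center_heatmap(points, centers, sigma=1.0):
    """Score each point with a Gaussian of its distance to the nearest center."""
    return [max(math.exp(-dist2(p, c) / (2 * sigma * sigma)) for c in centers)
            for p in points]


def select_candidates(points, heat, threshold=0.5, radius=1.0):
    """Greedily keep high-scoring points that are at least `radius` apart."""
    order = sorted(range(len(points)), key=lambda i: heat[i], reverse=True)
    chosen = []
    for i in order:
        if heat[i] < threshold:
            break  # remaining points score even lower
        if all(dist2(points[i], points[j]) > radius * radius for j in chosen):
            chosen.append(i)
    return chosen


points = [(0, 0, 0), (0.5, 0, 0), (5, 5, 5), (5.2, 5, 5), (2.5, 2.5, 2.5)]
centers = [(0, 0, 0), (5, 5, 5)]
heat = center_heatmap(points, centers)
print(select_candidates(points, heat))  # [0, 2]
```

Only one candidate survives per cluster, so later per-instance predictions run on a handful of points instead of the whole cloud.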
We introduce a comprehensive screening platform for the COVID-19 (a.k.a., SARS-CoV-2) pneumonia. The proposed AI-based system works on chest x-ray (CXR) images to predict whether a patient is infected with the COVID-19 disease. Despite the recent international joint effort to make all sorts of open data available, the public collection of CXR images is still relatively small for reliably training a deep neural network (DNN) to carry out COVID-19 prediction. To better address such inefficiency, we design a cascaded learning strategy to improve...
This paper describes a novel graphical model approach to seamlessly coupling and simultaneously analyzing facial emotions and the action units. Our method is based on the hidden conditional random fields (HCRFs) where we link the output class label to the underlying emotion of a facial expression sequence, and connect the hidden variables to the image frame-wise action units. As HCRFs are formulated with only the clique constraints, their labeling for hidden variables often lacks a coherent and meaningful configuration. We resolve this matter by introducing a partially-observed HCRF...