- Advanced Image and Video Retrieval Techniques
- Advanced Vision and Imaging
- Robotics and Sensor-Based Localization
- Video Surveillance and Tracking Methods
- Advanced Neural Network Applications
- Image Retrieval and Classification Techniques
- 3D Shape Modeling and Analysis
- Face and Expression Recognition
- Human Pose and Action Recognition
- Computer Graphics and Visualization Techniques
- Face Recognition and Analysis
- Remote Sensing and LiDAR Applications
- 3D Surveying and Cultural Heritage
- Multimodal Machine Learning Applications
- Video Analysis and Summarization
- Medical Image Segmentation Techniques
- Visual Attention and Saliency Detection
- Text and Document Classification Technologies
- Optical Measurement and Interference Techniques
- Generative Adversarial Networks and Image Synthesis
- Anomaly Detection Techniques and Applications
- Image Enhancement Techniques
- Domain Adaptation and Few-Shot Learning
- Remote-Sensing Image Classification
- Caching and Content Delivery
Zhejiang University of Science and Technology
2017-2025
Zhejiang University
2015-2024
Second Affiliated Hospital of Zhejiang University
2024
Alibaba Group (China)
2018-2023
Singapore Management University
2020
Chinese University of Hong Kong
2005-2010
ETH Zurich
2009
University of Oxford
2009
The University of Queensland
2008
University of Macau
2004-2005
Learning effective feature representations and similarity measures is crucial to the retrieval performance of a content-based image retrieval (CBIR) system. Despite extensive research efforts for decades, it remains one of the most challenging open problems that considerably hinders the success of real-world CBIR systems. The key challenge has been attributed to the well-known "semantic gap" issue that exists between low-level image pixels captured by machines and high-level semantic concepts perceived by humans. Among various...
The goal of active learning is to select the most informative examples for manual labeling. Most previous studies in active learning have focused on selecting a single unlabeled example in each iteration. This could be inefficient, since the classification model has to be retrained for every labeled example. In this paper, we present a framework for "batch mode active learning" that applies the Fisher information matrix to select a number of informative examples simultaneously. The key computational challenge is how to efficiently identify the subset of unlabeled examples that can result in the largest reduction...
Most modern trackers typically employ a bounding box given in the first frame to track visual objects, where their tracking results are often sensitive to the initialization. In this paper, we propose a new tracking method, Reliable Patch Trackers (RPT), which attempts to identify and exploit the reliable patches that can be tracked effectively through the whole tracking process. Specifically, we present a tracking reliability metric to measure how reliably a patch can be tracked, where a probability model is proposed to estimate the distribution of reliable patches under a sequential...
In contrast to generic objects, aerial targets are often non-axis-aligned, with arbitrary orientations and cluttered surroundings. Unlike mainstream approaches that regress bounding box orientations, this paper proposes an effective adaptive points learning approach to aerial object detection by taking advantage of the adaptive points representation, which is able to capture the geometric information of arbitrary-oriented instances. To this end, three oriented conversion functions are presented to facilitate classification and...
Though deep learning-based object detection methods have achieved promising results on conventional datasets, it is still challenging to locate objects in low-quality images captured in adverse weather conditions. The existing methods either have difficulties in balancing the tasks of image enhancement and object detection, or often ignore the latent information beneficial for detection. To alleviate this problem, we propose a novel Image-Adaptive YOLO (IA-YOLO) framework, where each image can be adaptively enhanced...
Active learning has been shown to be a key technique for improving content-based image retrieval (CBIR) performance. Among various methods, support vector machine (SVM) active learning is popular for its application to relevance feedback in CBIR. However, the regular SVM active learning has two main drawbacks when used for relevance feedback. First, SVM often suffers from learning with a small number of labeled examples, which is the case in relevance feedback. Second, SVM active learning usually does not take into account the redundancy among examples, and therefore could select multiple examples that are similar (or...
Support vector machine (SVM) active learning is one popular and successful technique for relevance feedback in content-based image retrieval (CBIR). Despite the success, conventional SVM active learning has two main drawbacks. First, the performance of SVM is usually limited by the number of labeled examples. It often suffers from poor performance with a small number of labeled examples, which is the case in relevance feedback. Second, conventional approaches do not take into account the redundancy among examples, and could select multiple examples that are similar (or even identical). In this work, we...
In computer vision and multimedia analysis, it is common to use multiple features (or multimodal features) to represent an object. For example, to characterize a natural scene image well, we typically extract a set of visual features to represent its color, texture, and shape. However, it is challenging to integrate multimodal features optimally, since they are usually high-order correlated; e.g., the histogram of oriented gradients (HOG), bag of scale-invariant feature transform descriptors, and wavelets are closely related because they collaboratively reflect the image texture....
With the exponential growth of Web 2.0 applications, tags have been used extensively to describe image contents on the Web. Due to the noisy and sparse nature of human-generated tags, how to understand and utilize these tags for image retrieval tasks has become an emerging research direction. As low-level visual features can provide fruitful information, they are employed to improve the image retrieval results. However, it is challenging to bridge the semantic gap between image contents and tags. To attack this critical problem, we propose a unified framework in this paper...
With a good balance between tracking accuracy and speed, the correlation filter (CF) has become one of the best object tracking frameworks, based on which many successful trackers have been developed. Recently, the spatially regularized CF (SRDCF) has been developed to remedy the annoying boundary effects of CF tracking, thus further boosting the tracking performance. However, SRDCF uses a fixed spatial regularization map constructed from a loose bounding box, and its performance inevitably degrades when the target or background show significant...
Due to the popularity of service-oriented architectures for various distributed systems, an increasing number of Web services have been deployed all over the world. Recently, Web service recommendation has become a hot research topic, one that aims to accurately predict the quality of functionally satisfactory services for each end user. Generally, the performance of Web services changes over time due to variations in service status and network conditions. Instead of employing conventional temporal models, we propose a novel spatial-temporal QoS prediction approach...
Recent years have witnessed an unprecedented growth of sports videos, as different types of sports activities can be widely observed (i.e., from professional athletics to personal fitness). Existing approaches in computer vision have predominantly focused on creating experiences of content browsing and searching by video tagging and summarization. These techniques have already enabled a wide range of applications for enthusiasts, such as text-based search, highlight generation, and so on. In this paper, we take one step...
Most existing correlation filter-based tracking approaches only estimate simple axis-aligned bounding boxes, and very few of them are capable of recovering the underlying similarity transformation. To tackle this challenging problem, in this paper, we propose a new correlation filter-based tracker with a novel robust estimation of similarity transformation on large displacements. In order to efficiently search such a large 4-DoF space in real time, we formulate the problem into two 2-DoF sub-problems and apply an efficient Block Coordinate Descent solver...
With the development of advanced driver assistance systems (ADAS) and autonomous vehicles, conducting experiments in various scenarios becomes an urgent need. Although capable of synthesizing photo-realistic street scenes, conventional image-to-image translation methods cannot produce coherent scenes due to the lack of 3D information. In this paper, a large-scale neural rendering method is proposed to synthesize the autonomous driving scene (READ), which makes it possible to generate large-scale driving scenes in real time on a PC through...
Semantic segmentation on driving-scene images is vital for autonomous driving. Although encouraging performance has been achieved on daytime images, the performance on nighttime images is less satisfactory due to insufficient exposure and the lack of labeled data. To address these issues, we present an add-on module called dual image-adaptive learnable filters (DIAL-Filters) to improve semantic segmentation in nighttime driving conditions, aiming at exploiting the intrinsic features of driving-scene images under different illuminations. DIAL-Filters consist of two parts,...
In contrast to fully supervised methods using pixel-wise mask labels, box-supervised instance segmentation takes advantage of simple box annotations, which has recently attracted increasing research attention. This paper presents a novel single-shot instance segmentation approach, namely Box2Mask, which integrates the classical level-set evolution model into deep neural network learning to achieve accurate mask prediction with only bounding box supervision. Specifically, both the input image and its deep features are employed to evolve the level-set curves...
Near-duplicate image retrieval plays an important role in many real-world multimedia applications. Most previous approaches have some limitations. For example, conventional appearance-based methods may suffer from illumination variations and occlusion issues, and local feature correspondence-based methods often do not consider local deformations and the spatial coherence between two point sets. In this paper, we propose a novel and effective Nonrigid Image Matching (NIM) approach to tackle the task of near-duplicate...
In this paper, we study an effective semi-supervised hashing method under the framework of regularized learning-based hashing. A nonlinear hash function is introduced to capture the underlying relationship among the data points. Thus, the dimensionality of the matrix for computation is not only independent of the dimensionality of the original data space but also much smaller than the one using a linear hash function. To effectively deal with the error accumulated during converting the real-value embeddings into the binary code after relaxation, we propose a semi-supervised nonlinear hashing algorithm...
Automated photo tagging is essential to make massive unlabeled photos searchable by text search engines. Conventional image annotation approaches, though working reasonably well on small testbeds, are either computationally expensive or inaccurate when dealing with large-scale photo tagging. Recently, with the popularity of social networking websites, we observe a massive number of user-tagged images, referred to as "social images", that are available on the web. Unlike traditional web images, social images often contain tags and other user-generated...
The parsing of building facades is a key component of the problem of 3D street scene reconstruction, which has long been desired in computer vision. In this paper, we propose a deep learning based method for segmenting a facade into semantic categories. Man-made structures often present the characteristic of symmetry. Based on this observation, we propose a symmetric regularizer for training the neural network. Our proposed method can make use of both the power of deep neural networks and the structure of man-made architectures. We also refine the segmentation results using...