Jianke Zhu

ORCID: 0000-0003-1831-0106
Research Areas
  • Advanced Image and Video Retrieval Techniques
  • Advanced Vision and Imaging
  • Robotics and Sensor-Based Localization
  • Video Surveillance and Tracking Methods
  • Advanced Neural Network Applications
  • Image Retrieval and Classification Techniques
  • 3D Shape Modeling and Analysis
  • Face and Expression Recognition
  • Human Pose and Action Recognition
  • Computer Graphics and Visualization Techniques
  • Face recognition and analysis
  • Remote Sensing and LiDAR Applications
  • 3D Surveying and Cultural Heritage
  • Multimodal Machine Learning Applications
  • Video Analysis and Summarization
  • Medical Image Segmentation Techniques
  • Visual Attention and Saliency Detection
  • Text and Document Classification Technologies
  • Optical measurement and interference techniques
  • Generative Adversarial Networks and Image Synthesis
  • Anomaly Detection Techniques and Applications
  • Image Enhancement Techniques
  • Domain Adaptation and Few-Shot Learning
  • Remote-Sensing Image Classification
  • Caching and Content Delivery

Zhejiang University of Science and Technology
2017-2025

Zhejiang University
2015-2024

Second Affiliated Hospital of Zhejiang University
2024

Alibaba Group (China)
2018-2023

Singapore Management University
2020

Chinese University of Hong Kong
2005-2010

ETH Zurich
2009

University of Oxford
2009

The University of Queensland
2008

University of Macau
2004-2005

Learning effective feature representations and similarity measures is crucial to the retrieval performance of a content-based image retrieval (CBIR) system. Despite extensive research efforts for decades, it remains one of the most challenging open problems and considerably hinders the success of real-world CBIR systems. The key challenge has been attributed to the well-known "semantic gap" that exists between the low-level image pixels captured by machines and the high-level semantic concepts perceived by humans. Among various...

10.1145/2647868.2654948 article EN 2014-10-31

The goal of active learning is to select the most informative examples for manual labeling. Most previous studies in active learning have focused on selecting a single unlabeled example in each iteration. This could be inefficient since the classification model has to be retrained for every labeled example. In this paper, we present a framework for "batch mode active learning" that applies the Fisher information matrix to select a number of informative examples simultaneously. The key computational challenge is how to efficiently identify the subset of examples that can result in the largest reduction...

10.1145/1143844.1143897 article EN 2006-01-01

Most modern trackers typically employ a bounding box given in the first frame to track visual objects, and their tracking results are often sensitive to this initialization. In this paper, we propose a new tracking method, Reliable Patch Trackers (RPT), which attempts to identify and exploit the reliable patches that can be tracked effectively through the whole tracking process. Specifically, we present a tracking reliability metric to measure how reliably a patch can be tracked, where a probability model is proposed to estimate the distribution of reliable patches under a sequential...

10.1109/cvpr.2015.7298632 article EN 2015-06-01

In contrast to generic objects, aerial targets are often non-axis-aligned, with arbitrary orientations and cluttered surroundings. Unlike mainstream approaches that regress bounding box orientations, this paper proposes an effective adaptive points learning approach to aerial object detection by taking advantage of the adaptive points representation, which is able to capture the geometric information of arbitrary-oriented instances. To this end, three oriented conversion functions are presented to facilitate classification and...

10.1109/cvpr52688.2022.00187 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

Though deep learning-based object detection methods have achieved promising results on conventional datasets, it is still challenging to locate objects in low-quality images captured in adverse weather conditions. The existing methods either have difficulties in balancing the tasks of image enhancement and object detection, or often ignore the latent information beneficial for detection. To alleviate this problem, we propose a novel Image-Adaptive YOLO (IA-YOLO) framework, where each image can be adaptively enhanced...

10.1609/aaai.v36i2.20072 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2022-06-28

Active learning has been shown to be a key technique for improving content-based image retrieval (CBIR) performance. Among various methods, support vector machine (SVM) active learning is popular for its application to relevance feedback in CBIR. However, the regular SVM active learning has two main drawbacks when used for relevance feedback. First, SVM often suffers from learning with a small number of labeled examples, which is the case in relevance feedback. Second, SVM active learning usually does not take into account the redundancy among examples, and therefore could select multiple examples that are similar (or...

10.1109/cvpr.2008.4587350 article EN 2008 IEEE Conference on Computer Vision and Pattern Recognition 2008-06-01

Support vector machine (SVM) active learning is one popular and successful technique for relevance feedback in content-based image retrieval (CBIR). Despite its success, conventional SVM active learning has two main drawbacks. First, the performance of SVM is usually limited by the number of labeled examples. It often suffers from poor performance with a small set of labeled examples, which is the case in relevance feedback. Second, conventional approaches do not take into account the redundancy among examples, and could select multiple examples that are similar (or even identical). In this work, we...

10.1145/1508850.1508854 article EN ACM Transactions on Information Systems 2009-05-01

In computer vision and multimedia analysis, it is common to use multiple features (or multimodal features) to represent an object. For example, to characterize a natural scene image well, we typically extract a set of visual features to represent its color, texture, and shape. However, it is challenging to integrate multimodal features optimally. They are usually high-order correlated; e.g., the histogram of gradient (HOG), bag of scale invariant feature transform descriptors, and wavelets are closely related because they collaboratively reflect the image texture....

10.1109/tcyb.2013.2285219 article EN IEEE Transactions on Cybernetics 2013-11-19

With the exponential growth of Web 2.0 applications, tags have been used extensively to describe image contents on the Web. Due to the noisy and sparse nature of human-generated tags, how to understand and utilize these tags for image retrieval tasks has become an emerging research direction. As low-level visual features can provide fruitful information, they are employed to improve the image retrieval results. However, it is challenging to bridge the semantic gap between image contents and tags. To attack this critical problem, we propose a unified framework in this paper...

10.1109/tmm.2010.2051360 article EN IEEE Transactions on Multimedia 2010-06-03

10.1016/j.jvcir.2015.06.013 article EN Journal of Visual Communication and Image Representation 2015-07-04

With a good balance between tracking accuracy and speed, the correlation filter (CF) has become one of the best object tracking frameworks, based on which many successful trackers have been developed. Recently, spatially regularized CF tracking (SRDCF) has been developed to remedy the annoying boundary effects of CF tracking, thus further boosting the tracking performance. However, SRDCF uses a fixed spatial regularization map constructed from a loose bounding box, and its performance inevitably degrades when the target or background show significant...

10.1109/tip.2019.2895411 article EN IEEE Transactions on Image Processing 2019-01-25

Due to the popularity of service-oriented architectures for various distributed systems, an increasing number of Web services have been deployed all over the world. Recently, Web service recommendation became a hot research topic, one that aims to accurately predict the quality of functionally satisfactory services for each end user. Generally, the performance of Web services changes over time due to variations in service status and network conditions. Instead of employing conventional temporal models, we propose a novel spatial-temporal QoS prediction approach...

10.1145/2801164 article EN ACM Transactions on the Web 2016-02-08

Recent years have witnessed unprecedented growth in sports videos, as different types of sports activities can be widely observed (i.e., from professional athletics to personal fitness). Existing computer vision approaches have predominantly focused on creating experiences of content browsing and searching by video tagging and summarization. These techniques have already enabled a wide range of applications for sports enthusiasts, such as text-based video search, highlight generation, and so on. In this paper, we take one step...

10.1145/3343031.3350910 article EN Proceedings of the 27th ACM International Conference on Multimedia 2019-10-15

Most existing correlation filter-based tracking approaches only estimate simple axis-aligned bounding boxes, and very few of them are capable of recovering the underlying similarity transformation. To tackle this challenging problem, in this paper, we propose a new tracker with a novel robust estimation of similarity transformation on large displacements. In order to efficiently search in such a 4-DoF space in real time, we formulate the problem into two 2-DoF sub-problems and apply an efficient Block Coordinates Descent solver...

10.1609/aaai.v33i01.33018666 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2019-07-17

With the development of advanced driver assistance systems (ADAS) and autonomous vehicles, conducting experiments in various scenarios becomes an urgent need. Although capable of synthesizing photo-realistic street scenes, conventional image-to-image translation methods cannot produce coherent scenes due to the lack of 3D information. In this paper, a large-scale neural rendering method is proposed to synthesize the autonomous driving scene (READ), which makes it possible to generate driving scenes in real time on a PC through...

10.1609/aaai.v37i2.25238 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2023-06-26

Semantic segmentation on driving-scene images is vital for autonomous driving. Although encouraging performance has been achieved on daytime images, the performance on nighttime images is less satisfactory due to insufficient exposure and the lack of labeled data. To address these issues, we present an add-on module called dual image-adaptive learnable filters (DIAL-Filters) to improve semantic segmentation in nighttime driving conditions, aiming at exploiting the intrinsic features of driving-scene images under different illuminations. DIAL-Filters consist of two parts,...

10.1109/tcsvt.2023.3260240 article EN IEEE Transactions on Circuits and Systems for Video Technology 2023-03-22

In contrast to fully supervised methods using pixel-wise mask labels, box-supervised instance segmentation takes advantage of simple box annotations and has recently attracted increasing research attention. This paper presents a novel single-shot instance segmentation approach, namely Box2Mask, which integrates the classical level-set evolution model into deep neural network learning to achieve accurate mask prediction with only bounding box supervision. Specifically, both the input image and its deep features are employed to evolve the level-set curves...

10.1109/tpami.2024.3363054 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2024-02-06

10.1109/cvpr52733.2024.02664 article EN 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024-06-16

Near-duplicate image retrieval plays an important role in many real-world multimedia applications. Most previous approaches have some limitations. For example, conventional appearance-based methods may suffer from illumination variations and occlusion, and local feature correspondence-based methods often do not consider local deformations and the spatial coherence between two point sets. In this paper, we propose a novel and effective Nonrigid Image Matching (NIM) approach to tackle the task of near-duplicate...

10.1145/1459359.1459366 article EN Proceedings of the 16th ACM International Conference on Multimedia 2008-10-26

In this paper, we study an effective semi-supervised hashing method under the framework of regularized learning-based hashing. A nonlinear hash function is introduced to capture the underlying relationship among data points. Thus, the dimensionality of the matrix for computation is not only independent from the dimensionality of the original data space but also much smaller than the one using a linear hash function. To effectively deal with the error accumulated during converting the real-value embeddings into the binary code after relaxation, we propose an algorithm...

10.1109/tkde.2012.76 article EN IEEE Transactions on Knowledge and Data Engineering 2012-04-12

Automated photo tagging is essential to make massive unlabeled photos searchable by text search engines. Conventional image annotation approaches, though working reasonably well on small testbeds, are either computationally expensive or inaccurate when dealing with large-scale photo tagging. Recently, with the popularity of social networking websites, we observe a massive number of user-tagged images, referred to as "social images", that are available on the web. Unlike traditional web images, social images often contain tags and other user-generated...

10.1145/1631272.1631293 article EN Proceedings of the 17th ACM International Conference on Multimedia 2009-10-19

The parsing of building facades is a key component of the problem of 3D street scene reconstruction, which has long been desired in computer vision. In this paper, we propose a deep learning based method for segmenting a facade into semantic categories. Man-made structures often present the characteristic of symmetry. Based on this observation, we propose a symmetric regularizer for training the neural network. Our proposed method can make use of both the power of deep neural networks and the structure of man-made architectures. We also refine the segmentation results using...

10.24963/ijcai.2017/320 article EN 2017-07-28