- Advanced Image and Video Retrieval Techniques
- Video Surveillance and Tracking Methods
- Multimodal Machine Learning Applications
- Domain Adaptation and Few-Shot Learning
- Advanced Vision and Imaging
- Advanced Neural Network Applications
- Human Pose and Action Recognition
- Face and Expression Recognition
- Advanced Image Processing Techniques
- Video Analysis and Summarization
- Image Retrieval and Classification Techniques
- Topic Modeling
- Robotics and Sensor-Based Localization
- Anomaly Detection Techniques and Applications
- Generative Adversarial Networks and Image Synthesis
- Image and Signal Denoising Methods
- Face Recognition and Analysis
- Natural Language Processing Techniques
- Image Processing Techniques and Applications
- Text and Document Classification Technologies
- Advanced Image Fusion Techniques
- Image Enhancement Techniques
- Sparse and Compressive Sensing Techniques
- Adversarial Robustness in Machine Learning
- Distributed and Parallel Computing Systems
Changshu Institute of Technology
2019-2025
Chinese Academy of Sciences
2016-2025
University of Technology Sydney
2015-2025
Henan University of Science and Technology
2010-2025
Fudan University Shanghai Cancer Center
2024-2025
North University of China
2014-2025
Southern University of Science and Technology
2017-2025
University of Shanghai for Science and Technology
2025
Tianjin Medical University Eye Hospital
2025
Tianjin Medical University
2025
Face recognition has made extraordinary progress owing to the advancement of deep convolutional neural networks (CNNs). The central task of face recognition, including face verification and identification, involves face feature discrimination. However, the traditional softmax loss of deep CNNs usually lacks the power of discrimination. To address this problem, recently several loss functions such as center loss, large margin softmax loss, and angular softmax loss have been proposed. All these improved losses share the same idea: maximizing inter-class variance and minimizing...
In this work, we study 3D object detection from RGB-D data in both indoor and outdoor scenes. While previous methods focus on images or 3D voxels, often obscuring natural 3D patterns and invariances of 3D data, we directly operate on raw point clouds by popping up RGB-D scans. However, a key challenge of this approach is how to efficiently localize objects in point clouds of large-scale scenes (region proposal). Instead of solely relying on 3D proposals, our method leverages both mature 2D object detectors and advanced 3D deep learning for object localization, achieving...
Visual attention has been successfully applied in structural prediction tasks such as visual captioning and question answering. Existing visual attention models are generally spatial, i.e., the attention is modeled as spatial probabilities that re-weight the last conv-layer feature map of a CNN encoding an input image. However, we argue that such spatial attention does not necessarily conform to the attention mechanism - a dynamic feature extractor that combines contextual fixations over time, as CNN features are naturally spatial, channel-wise and multi-layer. In this paper, we introduce a novel...
The main contribution of this paper is an approach for introducing additional context into state-of-the-art general object detection. To achieve this we first combine a state-of-the-art classifier (Residual-101 [14]) with a fast detection framework (SSD [18]). We then augment SSD+Residual-101 with deconvolution layers to introduce additional large-scale context in object detection and improve accuracy, especially for small objects, calling our resulting system DSSD for deconvolutional single shot detector. While these two contributions are easily described at...
Recent years have witnessed the growing popularity of hashing in large-scale vision problems. It has been shown that the hashing quality could be boosted by leveraging supervised information into hash function learning. However, the existing supervised methods either lack adequate performance or often incur cumbersome model training. In this paper, we propose a novel kernel-based supervised hashing model which requires a limited amount of supervised information, i.e., similar and dissimilar data pairs, and a feasible training cost in achieving high quality hashing. The idea...
Recently, learning based hashing techniques have attracted broad research interests because they can support efficient storage and retrieval for high-dimensional data such as images, videos, documents, etc. However, a major difficulty of learning to hash lies in handling the discrete constraints imposed on the pursued hash codes, which typically makes hash optimizations very challenging (NP-hard in general). In this work, we propose a new supervised hashing framework, where the learning objective is to generate the optimal binary hash codes for linear...
Deep learning-based object detection and instance segmentation have achieved unprecedented progress. In this article, we propose complete-IoU (CIoU) loss and Cluster-NMS for enhancing geometric factors in both bounding-box regression and nonmaximum suppression (NMS), leading to notable gains of average precision (AP) and average recall (AR), without the sacrifice of inference efficiency. In particular, we consider three geometric factors, that is: 1) overlap area; 2) normalized central-point distance; and 3) aspect ratio, which are...
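The three geometric factors named in the abstract above can be illustrated with a short sketch. The abstract is truncated before it gives a formula, so the code below assumes the commonly cited form of the CIoU loss, 1 - IoU + ρ²/c² + αv; the function name `ciou_loss`, the `(x1, y1, x2, y2)` box layout, and the `eps` stabilizer are illustrative assumptions, not the authors' implementation.

```python
import math


def ciou_loss(box, gt, eps=1e-9):
    """Sketch of a CIoU-style loss for two axis-aligned boxes (x1, y1, x2, y2).

    Assumed form: 1 - IoU + rho^2 / c^2 + alpha * v (not taken from the source).
    """
    x1, y1, x2, y2 = box
    gx1, gy1, gx2, gy2 = gt
    w, h = x2 - x1, y2 - y1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Factor 1: overlap area, measured by IoU.
    inter_w = max(0.0, min(x2, gx2) - max(x1, gx1))
    inter_h = max(0.0, min(y2, gy2) - max(y1, gy1))
    inter = inter_w * inter_h
    union = w * h + gw * gh - inter
    iou = inter / (union + eps)

    # Factor 2: squared center distance, normalized by the squared
    # diagonal of the smallest box enclosing both inputs.
    rho2 = ((x1 + x2) / 2 - (gx1 + gx2) / 2) ** 2 \
         + ((y1 + y2) / 2 - (gy1 + gy2) / 2) ** 2
    cw = max(x2, gx2) - min(x1, gx1)
    ch = max(y2, gy2) - min(y1, gy1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Factor 3: aspect-ratio consistency term v with trade-off weight alpha.
    v = (4 / math.pi ** 2) * (math.atan(gw / (gh + eps)) - math.atan(w / (h + eps))) ** 2
    alpha = v / ((1 - iou) + v + eps)

    return 1 - iou + rho2 / c2 + alpha * v


if __name__ == "__main__":
    print(ciou_loss((0, 0, 2, 2), (0, 0, 2, 2)))  # identical boxes: close to zero
    print(ciou_loss((0, 0, 1, 1), (2, 2, 3, 3)))  # disjoint boxes: greater than 1
```

Unlike a plain IoU loss, this form still yields a useful signal for non-overlapping boxes, because the center-distance term grows as the boxes move apart.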
In this paper, we consider the Tensor Robust Principal Component Analysis (TRPCA) problem, which aims to exactly recover the low-rank and sparse components from their sum. Our model is based on the recently proposed tensor-tensor product (or t-product) [14]. Induced by the t-product, we first rigorously deduce the tensor spectral norm, tensor nuclear norm, and tensor average rank, and show that the tensor nuclear norm is the convex envelope of the tensor average rank within the unit ball of the tensor spectral norm. These definitions, their relationships and properties are consistent with matrix cases. Equipped with the new...
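As a rough illustration of the t-product that the abstract above builds on, the sketch below uses the standard Fourier-domain computation: transform both tensors along the third mode, multiply matching frontal slices, and transform back. The abstract itself does not spell this out, so the function names `t_product` and `identity_tensor` and the NumPy-based layout are assumptions made for illustration.

```python
import numpy as np


def t_product(A, B):
    """Sketch of the t-product of A (n1 x n2 x n3) with B (n2 x n4 x n3).

    Equivalent to circular convolution of frontal slices along the third mode.
    """
    if A.shape[1] != B.shape[0] or A.shape[2] != B.shape[2]:
        raise ValueError("incompatible tensor shapes for the t-product")
    A_hat = np.fft.fft(A, axis=2)  # move to the Fourier domain along mode 3
    B_hat = np.fft.fft(B, axis=2)
    # Multiply matching frontal slices: C_hat[:, :, k] = A_hat[:, :, k] @ B_hat[:, :, k]
    C_hat = np.einsum("ijk,jlk->ilk", A_hat, B_hat)
    # For real inputs the result is real up to floating-point round-off.
    return np.real(np.fft.ifft(C_hat, axis=2))


def identity_tensor(n, n3):
    """Identity under the t-product: first frontal slice is I, the rest are zero."""
    I = np.zeros((n, n, n3))
    I[:, :, 0] = np.eye(n)
    return I


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 4, 5))
    print(np.allclose(t_product(A, identity_tensor(4, 5)), A))
```

When the third dimension has size 1, this reduces to the ordinary matrix product, which is consistent with the abstract's remark that the tensor definitions agree with the matrix cases.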
In this paper, we propose an efficient algorithm to directly restore a clear image from a hazy input. The proposed algorithm hinges on an end-to-end trainable neural network that consists of an encoder and a decoder. The encoder is exploited to capture the context of the derived input images, while the decoder is employed to estimate the contribution of each input to the final dehazed result using the learned representations attributed to the encoder. The constructed network adopts a novel fusion-based strategy which derives three inputs from an original hazy image by applying White Balance (WB), Contrast...
Multimedia content is dominating today's Web information. The nature of multimedia user-item interactions is 1/0 binary implicit feedback (e.g., photo likes, video views, song downloads, etc.), which can be collected at a larger scale with a much lower cost than explicit feedback (e.g., product ratings). However, the majority of existing collaborative filtering (CF) systems are not well-designed for multimedia recommendation, since they ignore the implicitness in users' interactions with multimedia content. We argue that, in multimedia recommendation, there exists item- and...
The explosive growth in Big Data has attracted much attention in designing efficient indexing and search methods recently. In many critical applications such as large-scale search and pattern matching, finding the nearest neighbors to a query is a fundamental research problem. However, the straightforward solution using exhaustive comparison is infeasible due to the prohibitive computational complexity and memory requirement. In response, approximate nearest neighbor (ANN) search based on hashing techniques has become popular due to its promising...
This paper studies the Tensor Robust Principal Component (TRPCA) problem which extends the known Robust PCA [4] to the tensor case. Our model is based on a new tensor Singular Value Decomposition (t-SVD) [14] and its induced tensor tubal rank and tensor nuclear norm. Consider that we have a 3-way tensor $\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ such that $\mathcal{X} = \mathcal{L}_0 + \mathcal{S}_0$, where $\mathcal{L}_0$ has low...
We propose to compose dynamic tree structures that place the objects in an image into a visual context, helping visual reasoning tasks such as scene graph generation and visual Q&A. Our visual context tree model, dubbed VCTree, has two key advantages over existing structured object representations including chains and fully-connected graphs: 1) The efficient and expressive binary tree encodes the inherent parallel/hierarchical relationships among objects, e.g., "clothes" and "pants" usually co-occur and belong to "person"; 2)...
Thanks to the success of deep learning, cross-modal retrieval has made significant progress recently. However, there still remains a crucial bottleneck: how to bridge the modality gap to further enhance the retrieval accuracy. In this paper, we propose a self-supervised adversarial hashing (SSAH) approach, which lies among the early attempts to incorporate adversarial learning into cross-modal hashing in a self-supervised fashion. The primary contribution of this work is that two adversarial networks are leveraged to maximize the semantic correlation and consistency of the representations between different...
We propose an unsupervised visual tracking method in this paper. Different from existing approaches using extensive annotated data for supervised learning, our CNN model is trained on large-scale unlabeled videos in an unsupervised manner. Our motivation is that a robust tracker should be effective in both the forward and backward predictions (i.e., the tracker can forward localize the target object in successive frames and backtrace to its initial position in the first frame). We build our framework on a Siamese correlation filter network, which is trained using unlabeled raw videos....
Adam and RMSProp are two of the most influential adaptive stochastic algorithms for training deep neural networks, which have been pointed out to be divergent even in the convex setting via a few simple counterexamples. Many attempts, such as decreasing an adaptive learning rate, adopting a big batch size, incorporating a temporal decorrelation technique, seeking an analogous surrogate, etc., have been tried to promote Adam/RMSProp-type algorithms to converge. In contrast with existing approaches, we introduce an alternative easy-to-check...