- Advanced Neural Network Applications
- Domain Adaptation and Few-Shot Learning
- Video Surveillance and Tracking Methods
- Human Pose and Action Recognition
- Visual Attention and Saliency Detection
- Advanced Image and Video Retrieval Techniques
- Multimodal Machine Learning Applications
- Retinal Imaging and Analysis
- 3D Shape Modeling and Analysis
- Anomaly Detection Techniques and Applications
- Adversarial Robustness in Machine Learning
- Face recognition and analysis
- Advanced Vision and Imaging
- Remote Sensing and LiDAR Applications
- Image Enhancement Techniques
- Radiomics and Machine Learning in Medical Imaging
- Image and Video Quality Assessment
- Generative Adversarial Networks and Image Synthesis
- COVID-19 diagnosis using AI
- Advanced Image Processing Techniques
- Olfactory and Sensory Function Studies
- 3D Surveying and Cultural Heritage
- Autonomous Vehicle Technology and Safety
- Brain Tumor Detection and Classification
- Imbalanced Data Classification Techniques
Beijing Institute of Technology
2011-2024
University of Macau
2021-2024
City University of Macau
2021-2024
Inception Institute of Artificial Intelligence
2019-2021
ETH Zurich
2020-2021
Beijing Academy of Artificial Intelligence
2020
Zhejiang University of Technology
2002-2004
Coronavirus Disease 2019 (COVID-19) spread globally in early 2020, causing the world to face an existential health crisis. Automated detection of lung infections from computed tomography (CT) images offers great potential to augment the traditional healthcare strategy for tackling COVID-19. However, segmenting infected regions from CT slices faces several challenges, including high variation in infection characteristics and low intensity contrast between infections and normal tissues. Further, collecting a large amount...
As an essential problem in computer vision, salient object detection (SOD) has attracted an increasing amount of research attention over the years. Recent advances in SOD are predominantly led by deep learning-based solutions (named deep SOD). To enable an in-depth understanding of deep SOD, in this paper we provide a comprehensive survey covering various aspects, ranging from algorithm taxonomy to unsolved issues. In particular, we first review deep SOD algorithms from different perspectives, including network architecture, level...
We present a comprehensive study on a new task named camouflaged object detection (COD), which aims to identify objects that are "seamlessly" embedded in their surroundings. The high intrinsic similarities between the target object and the background make COD far more challenging than the traditional object detection task. To address this issue, we elaborately collect a novel dataset, called COD10K, which comprises 10,000 images covering camouflaged objects in various natural scenes, over 78 object categories. All the images are densely annotated with category, bounding-box,...
This paper presents a new method for detecting salient objects in images using convolutional neural networks (CNNs). The proposed network, named PAGE-Net, offers two key contributions. The first is the exploitation of an essential pyramid attention structure for salient object detection. This enables the network to concentrate more on salient regions while considering multi-scale saliency information. Such a stacked attention design provides a powerful tool to efficiently improve the representation ability of the corresponding network layer with an enlarged...
This work proposes a novel attentive graph neural network (AGNN) for zero-shot video object segmentation (ZVOS). The suggested AGNN recasts this task as a process of iterative information fusion over video graphs. Specifically, AGNN builds a fully connected graph to efficiently represent frames as nodes, and relations between arbitrary frame pairs as edges. The underlying pair-wise relations are described by a differentiable attention mechanism. Through parametric message passing, AGNN is able to efficiently capture and mine much richer and higher-order relations between video frames,...
Magnetic resonance imaging (MRI) is a widely used neuroimaging technique that can provide images of different contrasts (i.e., modalities). Fusing this multi-modal data has proven particularly effective for boosting model performance in many tasks. However, due to poor data quality and frequent patient dropout, collecting all modalities for every patient remains a challenge. Medical image synthesis has been proposed as an effective solution, where any missing modalities are synthesized from the existing ones. In this paper, we propose...
Predicting where people look in static scenes, a.k.a. visual saliency, has received significant research interest recently. However, relatively less effort has been spent on understanding and modeling visual attention over dynamic scenes. This work makes three contributions to video saliency research. First, we introduce a new benchmark, called DHF1K (Dynamic Human Fixation 1K), for predicting fixations during dynamic scene free-viewing, which is a long-standing need in this field. DHF1K consists of 1K high-quality...
This paper conducts a systematic study on the role of visual attention in Unsupervised Video Object Segmentation (UVOS) tasks. By elaborately annotating three popular video segmentation datasets (DAVIS, Youtube-Objects and SegTrack V2) with dynamic eye-tracking data in the UVOS setting, for the first time, we quantitatively verified the high consistency of visual attention behavior among human observers, and found a strong correlation between human attention and explicit primary object judgements during dynamic, task-driven viewing. Such novel...
Matching person images between the daytime visible modality and the night-time infrared modality (VI-ReID) is a challenging cross-modality pedestrian retrieval problem. Existing cross-modality matching methods usually learn multi-modality features from raw images, ignoring the image-level discrepancy. Some methods apply GAN techniques to generate cross-modality images, but this destroys the local structure and introduces unavoidable noise. In this paper, we propose a Homogeneous Augmented Tri-Modal (HAT) learning method for VI-ReID, where an auxiliary grayscale...
Previous research in visual saliency has been focused on two major types of models, namely fixation prediction and salient object detection. The relationship between the two, however, has been less explored. In this work, we propose to employ the former model type to identify salient objects. We build a novel Attentive Saliency Network (ASNet; available at https://github.com/wenguanwang/ASNet) that learns to detect salient objects from fixations. The fixation map, derived at the upper network layers, mimics human visual attention mechanisms...
In this paper, we present a novel end-to-end learning neural network, i.e., MATNet, for zero-shot video object segmentation (ZVOS). Motivated by human visual attention behavior, MATNet leverages motion cues as a bottom-up signal to guide the perception of object appearance. To achieve this, an asymmetric attention block, named Motion-Attentive Transition (MAT), is proposed within a two-stream encoder network to first identify moving regions and then attend to appearance learning to capture the full extent of objects. Putting MATs in...
Visible thermal person re-identification (VT-ReID) is a challenging cross-modality pedestrian retrieval problem due to the large intra-class variations and modality discrepancy across different cameras. Existing VT-ReID methods mainly focus on learning cross-modality sharable feature representations by handling the modality discrepancy at the feature level. However, the modality difference at the classifier level has received much less attention, resulting in limited discriminability. In this paper, we propose a novel modality-aware collaborative...
Visual tracking addresses the problem of localizing an arbitrary target in a video according to the annotated bounding box. In this article, we present a novel tracking method by introducing the attention mechanism into the Siamese network to increase its matching discrimination. We propose a new way to compute attention weights to improve matching performance by a sub-Siamese network [Attention Net (A-Net)], which locates attentive parts for solving the searching problem. In addition, features in higher layers can preserve more semantic information while features in lower...
Learning unbiased models on imbalanced datasets is a significant challenge. Rare classes tend to get a concentrated representation in the classification space, which hampers the generalization of learned boundaries to new test examples. In this paper, we demonstrate that Bayesian uncertainty estimates directly correlate with the rarity of classes and the difficulty level of individual samples. Subsequently, we present a novel framework for uncertainty based class imbalance learning that follows two key insights: First, classification boundaries should be extended...
Convolutional Neural Networks have achieved significant success across multiple computer vision tasks. However, they are vulnerable to carefully crafted, human-imperceptible adversarial noise patterns, which constrain their deployment in critical security-sensitive systems. This paper proposes a computationally efficient image enhancement approach that provides a strong defense mechanism to effectively mitigate the effect of such adversarial perturbations. We show that deep image restoration networks learn mapping...
Cross-modality person re-identification is a challenging task due to large cross-modality discrepancy and intra-modality variations. Currently, most existing methods focus on learning modality-specific or modality-shareable features by using the identity supervision or modality label. Different from existing methods, this paper presents a novel Modality Confusion Learning Network (MCLNet). Its basic idea is to confuse the two modalities, ensuring that the optimization is explicitly concentrated on the modality-irrelevant...
In this article, we model a set of pixelwise object segmentation tasks, namely automatic video segmentation (AVS), image co-segmentation (ICS) and few-shot semantic segmentation (FSS), in a unified view of segmenting objects from relational visual data. To this end, we propose an attentive graph neural network (AGNN) that addresses these tasks in a holistic fashion, by formulating them as a process of iterative information fusion over data graphs. It builds a fully-connected graph to efficiently represent visual data as nodes and relations between data instances as edges. The underlying...
In recent years, Siamese network based trackers have significantly advanced the state-of-the-art in real-time tracking. Despite their success, Siamese trackers tend to suffer from high memory costs, which restrict their applicability to mobile devices with tight memory budgets. To address this issue, we propose a distilled Siamese tracking framework to learn small, fast and accurate trackers (students), which capture critical knowledge from large Siamese trackers (teachers) by a teacher-students knowledge distillation model. This model is intuitively inspired by the one teacher versus...
This paper proposes a novel residual attentive learning network architecture for predicting dynamic eye-fixation maps. The proposed model emphasizes two essential issues, i.e., effective spatiotemporal feature integration and multi-scale saliency learning. For the first problem, appearance and motion streams are tightly coupled via dense residual cross connections, which integrate appearance information with multi-layer, comprehensive motion features in a residual and dense way. Beyond traditional two-stream models that learn appearance and motion features separately, such design...
Existing LiDAR-based 3D object detectors usually focus on single-frame detection, while ignoring the spatiotemporal information in consecutive point cloud frames. In this paper, we propose an end-to-end online 3D video object detector that operates on point cloud sequences. The proposed model comprises a spatial feature encoding component and a spatiotemporal feature aggregation component. In the former component, a novel Pillar Message Passing Network (PMPNet) is proposed to encode each discrete point cloud frame. It adaptively collects information for a pillar node from its...
Deep embedding learning plays a key role in learning discriminative feature representations, where visually similar samples are pulled closer and dissimilar samples are pushed away in the low-dimensional embedding space. This paper studies the unsupervised embedding learning problem by learning such a representation without using any category labels. This task faces two primary challenges: mining reliable positive supervision from highly similar fine-grained classes, and generalizing to unseen testing categories. To approximate the positive concentration and negative separation properties...
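
The pull/push behaviour described in the last abstract can be illustrated with a classic pairwise contrastive loss. This is a minimal sketch of the general idea only, not the method of that paper: the function names, the `margin` value, and the `similar` flag (which stands in for whatever positive or negative signal a real system would mine without labels) are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(a, b, similar, margin=1.0):
    """Generic pairwise contrastive loss (illustrative, hypothetical helper).

    Similar pairs are penalised by their squared distance (pulled together).
    Dissimilar pairs are penalised only while closer than `margin`
    (pushed apart until they are at least `margin` away).
    """
    d = euclidean(a, b)
    if similar:
        return d ** 2
    return max(0.0, margin - d) ** 2


# Toy 2-D embeddings, chosen only for this example.
anchor = [0.0, 0.0]
near = [0.1, 0.0]
far = [2.0, 0.0]

loss_pull = contrastive_loss(anchor, near, similar=True)       # small: already close
loss_push_ok = contrastive_loss(anchor, far, similar=False)    # zero: beyond the margin
loss_push_bad = contrastive_loss(anchor, near, similar=False)  # large: too close
```

Minimising this loss over many pairs drives similar samples together and keeps dissimilar ones at least `margin` apart, which is the concentration and separation behaviour the abstract refers to.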