- Advanced Image and Video Retrieval Techniques
- Domain Adaptation and Few-Shot Learning
- Advanced Neural Network Applications
- Video Surveillance and Tracking Methods
- Human Pose and Action Recognition
- Multimodal Machine Learning Applications
- Visual Attention and Saliency Detection
- Advanced Image Processing Techniques
- Image Retrieval and Classification Techniques
- Image and Signal Denoising Methods
- Anomaly Detection Techniques and Applications
- Advanced Vision and Imaging
- Image Enhancement Techniques
- Video Analysis and Summarization
- Medical Imaging Techniques and Applications
- COVID-19 diagnosis using AI
- Gait Recognition and Analysis
- Advanced Image Fusion Techniques
- Face recognition and analysis
- Face and Expression Recognition
- Generative Adversarial Networks and Image Synthesis
- Image and Video Quality Assessment
- Image Processing Techniques and Applications
- Medical Image Segmentation Techniques
- Robotics and Sensor-Based Localization
Inception Institute of Artificial Intelligence
2017-2025
University of Chinese Academy of Sciences
2019-2025
China University of Geosciences (Beijing)
2023-2025
Hubei University of Chinese Medicine
2024
RefleXion Medical (United States)
2021-2024
Tianjin University
2019-2023
University College of Applied Science
2023
Smile Train
2023
Institute of Economics
2023
Hefei University of Technology
2023
Although convolutional neural networks (CNNs) have achieved great success in computer vision, this work investigates a simpler, convolution-free backbone network useful for many dense prediction tasks. Unlike the recently proposed Vision Transformer (ViT), which was designed specifically for image classification, we introduce the Pyramid Vision Transformer (PVT), which overcomes the difficulties of porting Transformer to various dense prediction tasks. PVT has several merits compared to the current state of the art. (1) Different from ViT, which typically yields low-resolution...
Single image haze removal has been a challenging problem due to its ill-posed nature. In this paper, we propose a simple but powerful color attenuation prior for haze removal from a single input hazy image. By creating a linear model for the scene depth of the hazy image under this novel prior and learning the parameters of the model with a supervised learning method, the depth information can be well recovered. With the depth map of the hazy image, we can easily estimate the transmission and restore the scene radiance via the atmospheric scattering model, and thus effectively remove the haze from a single image. Experimental results show that...
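The last step of the abstract above (depth map, then transmission, then scene radiance) can be illustrated with the standard atmospheric scattering model, I(x) = J(x) t(x) + A (1 - t(x)) with t(x) = exp(-beta d(x)). The following is a minimal sketch, not the authors' implementation: it assumes the depth map is already given, works on a grayscale image stored as nested lists with values in [0, 1], and the function names, the scattering coefficient `beta`, the airlight value and the `t_min` floor are all illustrative assumptions.

```python
import math


def transmission_from_depth(depth, beta=1.0):
    """Convert a depth map to a transmission map via t = exp(-beta * d).

    `beta` is an assumed scattering coefficient, not a value from the source.
    """
    return [[math.exp(-beta * d) for d in row] for row in depth]


def restore_radiance(hazy, transmission, airlight, t_min=0.1):
    """Invert I = J * t + A * (1 - t) for J, pixel by pixel.

    `t_min` (an assumption) floors the transmission so that dense-haze
    pixels do not cause division by a near-zero value.
    """
    restored = []
    for hazy_row, t_row in zip(hazy, transmission):
        out_row = []
        for intensity, t in zip(hazy_row, t_row):
            radiance = (intensity - airlight) / max(t, t_min) + airlight
            # Clamp to the valid intensity range.
            out_row.append(min(1.0, max(0.0, radiance)))
        restored.append(out_row)
    return restored
```

Given a depth map from any estimator, `restore_radiance(hazy, transmission_from_depth(depth), airlight)` returns the dehazed image; the quality of the result depends entirely on how good the depth and airlight estimates are.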
Image restoration tasks demand a complex balance between spatial details and high-level contextualized information while recovering images. In this paper, we propose a novel synergistic design that can optimally balance these competing goals. Our main proposal is a multi-stage architecture that progressively learns restoration functions for the degraded inputs, thereby breaking down the overall recovery process into more manageable steps. Specifically, our model first learns the contextualized features using encoder-decoder architectures and later...
Transformer has recently presented encouraging progress in computer vision. In this work, we present new baselines by improving the original Pyramid Vision Transformer (PVT v1) by adding three designs, including (1) a linear complexity attention layer, (2) overlapping patch embedding, and (3) a convolutional feed-forward network. With these modifications, PVT v2 reduces the computational complexity of PVT v1 to linear and achieves significant improvements on fundamental vision tasks such as classification, detection, and segmentation. Notably,...
Coronavirus Disease 2019 (COVID-19) spread globally in early 2020, causing the world to face an existential health crisis. Automated detection of lung infections from computed tomography (CT) images offers great potential to augment the traditional healthcare strategy for tackling COVID-19. However, segmenting infected regions from CT slices faces several challenges, including high variation in infection characteristics and low intensity contrast between infections and normal tissues. Further, collecting a large amount...
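Regular machine learning and data mining techniques study the training data for future inferences under a major assumption that the future data are within the same feature space or have the same distribution as the training data. However, due to the limited availability of human labeled training data, training data that stay in the same feature space or have the same distribution as the future data cannot be guaranteed to be sufficient enough to avoid the over-fitting problem. In real-world applications, apart from data in the target domain, related data in a different domain can also be included to expand the availability of our prior knowledge about the target future data. Transfer learning addresses such cross-domain learning problems by...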
Neural network pruning offers a promising prospect to facilitate deploying deep neural networks on resource-limited devices. However, existing methods are still challenged by the training inefficiency and labor cost in pruning designs, due to missing theoretical guidance of non-salient network components. In this paper, we propose a novel filter pruning method by exploring the High Rank of feature maps (HRank). Our HRank is inspired by the discovery that the average rank of multiple feature maps generated by a single filter is always the same, regardless of the number of image batches...
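The idea of scoring a filter by the average rank of its feature maps can be illustrated with a small sketch. This is not the HRank implementation: it assumes feature maps are plain nested lists indexed as `feature_maps[image][filter]`, computes rank with simple Gaussian elimination, and keeps the filters with the highest average rank. All function names and the `tol` threshold are hypothetical.

```python
def matrix_rank(matrix, tol=1e-9):
    """Rank of a 2-D list via Gaussian elimination with partial pivoting."""
    rows = [[float(v) for v in row] for row in matrix]
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Pick the row with the largest absolute entry in this column.
        pivot = max(range(rank, n_rows), key=lambda r: abs(rows[r][col]))
        if abs(rows[pivot][col]) <= tol:
            continue  # no usable pivot in this column
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, n_rows):
            factor = rows[r][col] / rows[rank][col]
            for c in range(col, n_cols):
                rows[r][c] -= factor * rows[rank][c]
        rank += 1
    return rank


def average_ranks(feature_maps):
    """Average feature-map rank per filter over all input images."""
    n_images = len(feature_maps)
    n_filters = len(feature_maps[0])
    return [
        sum(matrix_rank(feature_maps[i][f]) for i in range(n_images)) / n_images
        for f in range(n_filters)
    ]


def filters_to_keep(avg_ranks, n_keep):
    """Indices of the n_keep filters with the highest average rank."""
    order = sorted(range(len(avg_ranks)), key=lambda f: avg_ranks[f], reverse=True)
    return sorted(order[:n_keep])
```

In practice the feature maps would come from a forward pass of the network on a small batch of images; here they are simply passed in as data.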
This paper proposes a deep learning model to efficiently detect salient regions in videos. It addresses two important issues: 1) deep video saliency model training with the absence of sufficiently large and pixel-wise annotated video data and 2) fast video saliency training and detection. The proposed deep video saliency network consists of two modules, for capturing the spatial and temporal saliency information, respectively. The dynamic saliency model, explicitly incorporating saliency estimates from the static saliency model, directly produces spatiotemporal saliency inference without time-consuming optical flow computation. We...
We present a comprehensive study on a new task named camouflaged object detection (COD), which aims to identify objects that are "seamlessly" embedded in their surroundings. The high intrinsic similarities between the target object and the background make COD far more challenging than the traditional object detection task. To address this issue, we elaborately collect a novel dataset, called COD10K, which comprises 10,000 images covering camouflaged objects in various natural scenes, over 78 object categories. All the images are densely annotated with category, bounding-box,...
We introduce a novel network, called the CO-attention Siamese Network (COSNet), to address the unsupervised video object segmentation task from a holistic view. We emphasize the importance of inherent correlation among video frames and incorporate a global co-attention mechanism to further improve the state-of-the-art deep learning based solutions that primarily focus on learning discriminative foreground representations over appearance and motion in short-term temporal segments. The co-attention layers in our network provide efficient...
Clustering is a long-standing important research problem, but it remains challenging when handling large-scale image data from diverse sources. In this paper, we present a novel Binary Multi-View Clustering (BMVC) framework, which can dexterously manipulate multi-view image data and easily scale to large data. To achieve this goal, we formulate BMVC by two key components: compact collaborative discrete representation learning and binary clustering structure learning, in a joint learning framework. Specifically, BMVC collaboratively encodes...
This paper describes a novel method called Deep Dynamic Neural Networks (DDNN) for multimodal gesture recognition. A semi-supervised hierarchical dynamic framework based on a Hidden Markov Model (HMM) is proposed for simultaneous gesture segmentation and recognition, where skeleton joint information, depth and RGB images are the multimodal input observations. Unlike most traditional approaches that rely on the construction of complex handcrafted features, our approach learns high-level spatio-temporal representations using...
We present the first systematic study on concealed object detection (COD), which aims to identify objects that are visually embedded in their background. The high intrinsic similarities between the concealed objects and their background make COD far more challenging than traditional object detection/segmentation. To better understand this task, we collect a large-scale dataset, called COD10K, which consists of 10,000 images covering concealed objects in diverse real-world scenarios from 78 object categories. Further, we provide rich annotations including...
The Ingenuity TF PET–MRI is a newly released whole-body hybrid PET–MR imaging system with a Philips time-of-flight GEMINI TF PET and Achieva 3T X-series MRI system. Compared to PET–CT, modifications to the positron emission tomography (PET) gantry were made to avoid mutual system interference and deliver uncompromising performance which is equivalent to the standalone systems. The PET gantry was redesigned to introduce magnetic shielding for the photomultiplier tubes (PMTs). Stringent electromagnetic noise requirements of the MR system necessitated...
We present a novel spatiotemporal saliency detection method to estimate salient regions in videos based on the gradient flow field and energy optimization. The proposed gradient flow field incorporates two distinctive features: 1) intra-frame boundary information and 2) inter-frame motion information together for indicating the salient regions. Based on the effective utilization of both intra-frame and inter-frame information in the gradient flow field, our algorithm is robust enough to estimate the object and background in complex scenes with various motion patterns and appearances. Then, we introduce local as well as global contrast...
Abnormal event detection in video is a challenging vision problem. Most existing approaches formulate abnormal event detection as an outlier detection task, due to the scarcity of anomalous data during training. Because of the lack of prior information regarding abnormal events, these methods are not fully equipped to differentiate between normal and abnormal events. In this work, we formalize abnormal event detection as a one-versus-rest binary classification problem. Our contribution is two-fold. First, we introduce an unsupervised feature learning framework based on object-centric...
Crowd counting has recently attracted increasing interest in computer vision but remains a challenging problem. In this paper, we propose a trellis encoder-decoder network (TEDnet) for crowd counting, which focuses on generating high-quality density estimation maps. The major contributions are four-fold. First, we develop a new trellis architecture that incorporates multiple decoding paths to hierarchically aggregate features at different encoding stages, which improves the representative capability of...