- Advanced Neural Network Applications
- Visual Attention and Saliency Detection
- Advanced Image and Video Retrieval Techniques
- Video Surveillance and Tracking Methods
- Robotics and Sensor-Based Localization
- Music Technology and Sound Studies
- Music and Audio Processing
- Optical measurement and interference techniques
- Smart Agriculture and AI
- Remote-Sensing Image Classification
- Domain Adaptation and Few-Shot Learning
- Image and Object Detection Techniques
- Retinal Imaging and Analysis
- Image Processing and 3D Reconstruction
- Medical Image Segmentation Techniques
- Image and Video Quality Assessment
- Anomaly Detection Techniques and Applications
- Cardiovascular Health and Disease Prevention
- Electrical and Bioimpedance Tomography
- Speech and Audio Processing
- Remote Sensing and Land Use
- Magnetic Field Sensors Techniques
- Analog and Mixed-Signal Circuit Design
- Color Science and Applications
- IoT Networks and Protocols
Beihang University
2022-2024
State Key Laboratory of Virtual Reality Technology and Systems
2024
Shandong University
2024
Hangzhou Dianzi University
2024
Depth images and thermal images contain spatial geometry information and surface temperature information, which can complement the RGB modality. However, the quality of depth images is often unreliable in some challenging scenarios, which results in performance degradation for two-modal salient object detection (SOD). Meanwhile, researchers have turned to the triple-modal SOD task, namely visible-depth-thermal (VDT) SOD, where they attempt to explore the complementarity of the RGB image, depth image, and thermal image. However, existing methods fail...
Detecting objects from aerial images poses significant challenges due to the following factors: 1) Aerial images typically have very large sizes, generally with millions or even hundreds of millions of pixels, while computational resources are limited. 2) Small object size leads to insufficient information for effective detection. 3) Non-uniform object distribution leads to computational resource wastage. To address these issues, we propose YOLC (You Only Look Clusters), an efficient and effective framework that builds on an anchor-free object detector,...
Object counting, which aims to count the accurate number of object instances in images, has been attracting more and more attention. However, challenges such as large-scale variation, complex background interference, and nonuniform density distribution greatly limit counting accuracy, and they are particularly striking in remote-sensing imagery. To mitigate the above issues, this article proposes a novel framework for dense object counting that incorporates a pyramidal scale module (PSM) and a global context module (GCM), dubbed PSGCNet, where the PSM is used...
Human pose estimation and tracking are fundamental tasks for understanding human behaviors in videos. Existing top-down framework-based methods usually perform three-stage tasks: human detection, pose estimation, and tracking. Although promising results have been achieved, these methods rely heavily on high-performance detectors and may fail to track persons who are occluded or miss-detected. To overcome these problems, in this paper, we develop a novel keypoint confidence network and a tracking pipeline to improve detection in top-down approaches. Specifically, the keypoint confidence network is...
A coarse-to-fine multi-view stereo network with Transformer (MVS-T) is proposed to solve the problems of sparse point clouds and low accuracy in reconstructing 3D scenes from low-resolution images. The network uses a coarse-to-fine strategy to estimate the image depth progressively and reconstruct the point cloud. First, pyramids of image features are constructed to transfer semantic and spatial information among features at different scales. Then, the Transformer module is employed to aggregate the image's global context and capture the internal correlation of the feature map. Finally, the image depth is inferred by...
Topological building extraction in remote sensing images is vital for city planning, disaster assessment, and other real-world applications. To meet the requirements of these applications, existing approaches predict topological buildings by vectorizing binary masks using multiple refinement stages, leading to complex methodology and poor generalization. To tackle this issue, we propose an approach that directly predicts the serialized vertices of each building instance. We observe that the order of vertices from one building is inherently bidirectional,...
Although multi-view 3D object detection based on the Bird's-Eye-View (BEV) paradigm has garnered widespread attention as an economical and deployment-friendly perception solution for autonomous driving, there is still a performance gap compared to LiDAR-based methods. In recent years, several cross-modal distillation methods have been proposed to transfer beneficial information from teacher models to student models, with the aim of enhancing performance. However, these methods face challenges due...
Unsupervised domain adaptation (UDA) is attracting more attention from researchers for boosting task-specific generalization on the target domain. It focuses on addressing the domain shift between the labeled source domain and the unlabeled target domain. Recent biclassifier-based UDA models perform category-level alignment to reduce the domain shift; meanwhile, self-training is used for improving the discriminability of target instances. However, the error accumulation problem of target instances with high semantic uncertainty may cause degradation and misalignment. To solve this...
Existing symbolic music generation methods usually utilize a discriminator to improve the quality of generated music via global perception of the music. However, considering the complexity of information in music, such as rhythm and melody, a single discriminator cannot fully reflect the differences in these two primary dimensions of music. In this work, we propose to decouple the melody and rhythm from music and design corresponding fine-grained discriminators to tackle the aforementioned issues. Specifically, equipped with a pitch augmentation strategy, the melody discriminator discerns variations...
Given a set of 2D scattering points from an edge detection operator, the aim of ellipse fitting is to construct an elliptic equation that best fits the observations. The collected data often contain noise, uncertainty, and incompleteness, which constitutes a considerable challenge for all algorithms. To address this issue, a method of direct ellipse fitting by minimizing the L0 algebraic distance is presented. Unlike its L2 counterparts, which assume the error follows a Gaussian distribution, our method tries to model the outliers using the norm between the ideal...
Seed sorting is critical for the breeding industry to improve agricultural yield. Seed sorting methods based on convolutional neural networks (CNNs) have achieved excellent recognition accuracy on large-scale pretrained network models. However, CNN inference is a computationally intensive process that often requires hardware acceleration to operate in real time. For embedded devices, the high power consumption of graphics processing units (GPUs) is generally prohibitive, and the field programmable gate array (FPGA)...
Semantic segmentation is the process of categorizing all pixels in an image. Given the inherent challenges of attaining fine labels, researchers have recently embraced weak labels to mitigate the annotation burden of segmentation. The current work on weakly supervised semantic segmentation (WSSS) mainly focuses on expanding the pseudo-label seeds in the salient regions of the image, but there are also many objects outside the salient area that have not been discovered. In this work, we propose an innovative WSSS method by exploring the non-significant areas...
The biological model of mammalian visual mechanisms is very beneficial to feature learning in motionless images. It has been proved that it can improve the performance of hand-crafted methods and CNN methods. Recently, CNNs have learned discriminative and robust features by changing the backbone, processing multi-scale feature maps, adding attention mechanisms, etc. However, they are relatively short of network structures based on the human retina, which has been proven to have a strong feature-extraction capability for images in traditional descriptors. To address this...
To achieve long-term economic growth, competitiveness, and sustainability, speed and accuracy are the key requirements when it comes to seed purity sorting. However, current seed sorting methods suffer from a large number of model parameters and high computational complexity, which make it a great challenge to deploy them in real-time applications, especially on devices with limited resources. To address the above problems, in this paper, a lightweight and efficient network with pyramid dilated convolution, namely LEPD-Net, is proposed for...