- Advanced Neural Network Applications
- Domain Adaptation and Few-Shot Learning
- Advanced Image and Video Retrieval Techniques
- Multimodal Machine Learning Applications
- Remote-Sensing Image Classification
- Video Surveillance and Tracking Methods
- Human Pose and Action Recognition
- Remote Sensing and Land Use
- Image Processing and 3D Reconstruction
- Image Retrieval and Classification Techniques
- Handwritten Text Recognition Techniques
- AI in cancer detection
- Image Enhancement Techniques
- Advanced Image Fusion Techniques
- Radiomics and Machine Learning in Medical Imaging
- Neural Networks and Applications
- Brain Tumor Detection and Classification
- Natural Language Processing Techniques
- Infrared Target Detection Methodologies
- Vehicle License Plate Recognition
- Image and Object Detection Techniques
- Remote Sensing in Agriculture
- Adversarial Robustness in Machine Learning
- Robotics and Automated Systems
- Machine Learning and ELM
Beihang University
2018-2024
Group Sense (China)
2019-2020
Object detection on drone-captured scenarios is a recent popular task. As drones always navigate in different altitudes, the object scale varies violently, which burdens optimization of networks. Moreover, high-speed and low-altitude flight bring motion blur densely packed objects, leads to great challenge distinction. To solve two issues mentioned above, we propose TPH-YOLOv5. Based YOLOv5, add one more prediction head detect different-scale objects. Then replace original heads with...
Object detection on the drone faces a great diversity of challenges such as small object inference, background clutter and wide viewpoint. In contrast to traditional problem in computer vision, bird-like angle can not be transplanted directly from common-in-use methods due special texture sky's view. However, lack comprehensive data set, number algorithms that focus using captured by drones is limited. So VisDrone team gathered massive set organized Vision Meets Drones: A Challenge...
Object detection in drone-captured images is a popular task recent years. As drones always navigate at different altitudes, the object scale varies considerably, which burdens optimization of models. Moreover, high-speed and low-altitude flight cause motion blur on densely packed objects, leads to great challenges. To solve two issues mentioned above, based YOLOv5, we add an additional prediction head detect tiny-scale objects replace CNN-based heads with transformer (TPH), constructing...
Remote sensing (RS) image semantic segmentation using deep convolutional neural networks (DCNNs) has shown great success in various applications. However, the high dependence on annotated data makes it challenging for DCNNs to adapt different RS scenes. To address this challenge, we propose a cross-domain task that considers ground sampling distance, remote sensor variation, and geographical landscapes as main factors causing domain shifts between source target images. mitigate negative...
Remote sensing (RS) scene classification is a challenging task to predict categories of RS images. images have two main issues: large intraclass variance caused by resolution and confusing information from geographic covering area. To ease the negative influence above issues. We propose multigranularity multilevel feature ensemble network (MGML-FENet) efficiently tackle in this article. Specifically, we fusion branch (MGML-FFB) extract features different levels channel-separate generator...
Few-shot segmentation focuses on the generalization of models to segment unseen object with limited annotated samples. However, existing approaches still face two main challenges. First, huge feature distinction between support and query images causes knowledge transferring barrier, which harms performance. Second, prototypes cannot adequately represent features objects, hard guide high-quality segmentation. To deal above issues, we propose self-distillation embedded supervised affinity...
Open-vocabulary learning has emerged as a cutting-edge research area, particularly in light of the widespread adoption vision-based foundational models. Its primary objective is to comprehend novel concepts that are not encompassed within predefined vocabulary. One key facet this endeavor Visual Grounding (VG), which entails locating specific region an image based on corresponding language description. While current models excel at various visual tasks, there noticeable absence specifically...
Deep convolutional neural networks have achieved great success on image classification. A series of feature extractors learned from CNN been used in many computer vision tasks. Global pooling layer plays a very important role deep networks. It is found that the input feature-maps global become sparse, as increasing use Batch Normalization and ReLU combination, which makes original low efficiency. In this paper, we proposed novel end-to-end trainable operator AlphaMEX Pool for network....
Remote sensing (RS) image scene classification task faces many challenges due to the interference from different characteristics of geographical elements. To solve this problem, we propose a multi-branch ensemble network enhance feature representation ability by fusing features in final output logits and intermediate maps. However, simply adding branches will increase complexity models decline inference efficiency. On issue, embed self-distillation (SD) method transfer knowledge main-branch...
Recent CNNs (convolutional neural networks) have become more and compact. The elegant structure design highly improves the performance of CNNs. With development knowledge distillation technique, gets further improved. However, existing guided methods either rely on offline pretrained high-quality large teacher models or online heavy training burden. To solve above problems, we propose a feature-sharing weight-sharing based ensemble network (training framework) by (EKD-FWSNet) to make...
Convolutional neural networks (CNNs) are becoming more and popular today. CNNs now have become a feature extractor applying to image processing, big data fog computing, etc. usually consist of several basic units like convolutional unit, pooling activation so on. In CNNs, conventional methods refer 2×2 max‐pooling average‐pooling, which applied after the or ReLU layers. this paper, we propose Multiactivation Pooling (MAP) Method make accurate on classification tasks without increasing depth...
Visual navigation, characterized by its autonomous capabilities, cost effectiveness, and robust resistance to interference, serves as the foundation for vision-based landing systems. These systems rely heavily on runway instance segmentation, which accurately divides areas provides precise information unmanned aerial vehicle (UAV) navigation. However, current research primarily focuses detection but lacks relevant segmentation datasets. To address this gap, we created Runway Landing Dataset...
Object detection on drone-captured scenarios is a recent popular task. As drones always navigate in different altitudes, the object scale varies violently, which burdens optimization of networks. Moreover, high-speed and low-altitude flight bring motion blur densely packed objects, leads to great challenge distinction. To solve two issues mentioned above, we propose TPH-YOLOv5. Based YOLOv5, add one more prediction head detect different-scale objects. Then replace original heads with...
The emergence of new wearable technologies, such as action cameras and smart glasses, has driven the use first-person perspective in computer applications. This field is now attracting attention investment researchers aiming to develop methods process vision (FPV) video. current approaches present particular combinations different image features quantitative accomplish specific objectives, object detection, activity recognition, user–machine interaction, etc. FPV-based navigation necessary...
Convolutional neural networks (CNN) are mainly used for image recognition tasks. However, some huge models infeasible mobile devices because of limited computing and memory resources. In this paper, feature maps DenseNet CondenseNet visualized. It could be observed that there channels in locked state have similar distribution property, which compressed further. Thus, work, a novel architecture — RSNet is introduced to improve the efficiency CNNs. This paper proposes Relative-Squeezing (RS)...
Pine wilt disease (PWD) is a worldwide affliction that poses significant menace to forest ecosystems. The swift and precise identification of pine trees under infection holds paramount significance in the proficient administration this ailment. progression remote sensing deep learning methodologies has propelled utilization target detection recognition techniques reliant on imagery, emerging as prevailing strategy for pinpointing affected trees. Although existing object algorithms have...
Visual Grounding (VG) aims at localizing target objects from an image based on given expressions and has made significant progress with the development of detection vision transformer. However, existing VG methods tend to generate false-alarm when presented inaccurate or irrelevant descriptions, which commonly occur in practical applications. Moreover, fail capture fine-grained features, accurate localization, sufficient context comprehension whole textual descriptions. To address both...
Ovarian cancer is one of the most harmful gynecological diseases. Detecting ovarian tumors in early stage with computer-aided techniques can efficiently decrease mortality rate. With improvement medical treatment standard, ultrasound images are widely applied clinical treatment. However, recent notable methods mainly focus on single-modality tumor segmentation or recognition, which means there still lacks researches exploring representation capability multi-modality images. To solve this...
In the object detection task, deep learning-based methods always need a large amount of annotated training data. However, annotating number images is labor-intensive. order to reduce dependency expensive annotations, we propose novel end-to-end feature reconstruction and metric based network for few-shot (FM-FSOD). FM-FSOD integrates learning meta-learning tackle task. class-agnostic model that can accurately recognize categories without fine-tuning on categories. Specifically, quickly learn...