- Advanced Neural Network Applications
- Domain Adaptation and Few-Shot Learning
- Video Surveillance and Tracking Methods
- Natural Language Processing Techniques
- Advanced Image and Video Retrieval Techniques
- Optical Measurement and Interference Techniques
- Robotics and Sensor-Based Localization
- Advanced X-ray Imaging Techniques
- Multimodal Machine Learning Applications
- Image Processing Techniques and Applications
- Handwritten Text Recognition Techniques
- Web Data Mining and Analysis
Huazhong University of Science and Technology
2024
Beihang University
2017-2021
Object detection in drone-captured scenarios has recently become a popular task. Because drones navigate at different altitudes, object scale varies drastically, which burdens the optimization of networks. Moreover, high-speed, low-altitude flight introduces motion blur on densely packed objects, which makes the objects hard to distinguish. To solve the two issues mentioned above, we propose TPH-YOLOv5. Based on YOLOv5, we add one more prediction head to detect different-scale objects. Then we replace the original prediction heads with...
Object detection on drones faces a great diversity of challenges, such as small object inference, background clutter, and wide viewpoint. In contrast to the traditional detection problem in computer vision, object detection from a bird-like angle cannot be transplanted directly from commonly used methods, due to the special texture of objects seen from the sky. However, for lack of a comprehensive data set, the number of algorithms that focus on object detection using data captured by drones is limited. So the VisDrone team gathered a massive data set and organized Vision Meets Drones: A Challenge...
We exploit the potential of the large-scale Contrastive Language-Image Pretraining (CLIP) model to enhance scene text detection and spotting tasks, transforming it into a robust backbone, FastTCM-CR50. This backbone utilizes visual prompt learning and cross-attention in CLIP to extract image and text-based prior knowledge. Using predefined and learnable prompts, FastTCM-CR50 introduces an instance-language matching process to enhance the synergy between image and text embeddings, thereby refining text regions. Our Bimodal Similarity Matching...