Xingkui Zhu

ORCID: 0009-0008-0561-4390
Research Areas
  • Advanced Neural Network Applications
  • Domain Adaptation and Few-Shot Learning
  • Video Surveillance and Tracking Methods
  • Natural Language Processing Techniques
  • Advanced Image and Video Retrieval Techniques
  • Optical measurement and interference techniques
  • Robotics and Sensor-Based Localization
  • Advanced X-ray Imaging Techniques
  • Multimodal Machine Learning Applications
  • Image Processing Techniques and Applications
  • Handwritten Text Recognition Techniques
  • Web Data Mining and Analysis

Huazhong University of Science and Technology
2024

Beihang University
2017-2021

Object detection on drone-captured scenarios is a recent popular task. As drones always navigate in different altitudes, the object scale varies violently, which burdens the optimization of networks. Moreover, high-speed and low-altitude flight bring motion blur on the densely packed objects, which leads to great challenge of object distinction. To solve the two issues mentioned above, we propose TPH-YOLOv5. Based on YOLOv5, we add one more prediction head to detect different-scale objects. Then we replace the original prediction heads with...

10.1109/iccvw54120.2021.00312 article EN 2021-10-01

Object detection on the drone faces a great diversity of challenges such as small object inference, background clutter and wide viewpoint. In contrast to the traditional object detection problem in computer vision, object detection from a bird-like angle cannot be transplanted directly from commonly used methods due to the special object texture in the sky's view. However, due to the lack of a comprehensive data set, the number of algorithms that focus on object detection using data captured by drones is limited. So the VisDrone team gathered a massive data set and organized Vision Meets Drones: A Challenge...

10.1109/iccvw54120.2021.00319 article EN 2021-10-01

We exploit the potential of the large-scale Contrastive Language-Image Pretraining (CLIP) model to enhance scene text detection and spotting tasks, transforming it into a robust backbone, FastTCM-CR50. This backbone utilizes visual prompt learning and cross-attention in CLIP to extract image and text-based prior knowledge. Using predefined and learnable prompts, FastTCM-CR50 introduces an instance-language matching process to enhance the synergy between image and text embeddings, thereby refining text regions. Our Bimodal Similarity Matching...

10.1109/tpami.2024.3379828 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2024-03-20

Object detection on drone-captured scenarios is a recent popular task. As drones always navigate in different altitudes, the object scale varies violently, which burdens the optimization of networks. Moreover, high-speed and low-altitude flight bring motion blur on the densely packed objects, which leads to great challenge of object distinction. To solve the two issues mentioned above, we propose TPH-YOLOv5. Based on YOLOv5, we add one more prediction head to detect different-scale objects. Then we replace the original prediction heads with...

10.48550/arxiv.2108.11539 preprint EN other-oa arXiv (Cornell University) 2021-01-01

We exploit the potential of the large-scale Contrastive Language-Image Pretraining (CLIP) model to enhance scene text detection and spotting tasks, transforming it into a robust backbone, FastTCM-CR50. This backbone utilizes visual prompt learning and cross-attention in CLIP to extract image and text-based prior knowledge. Using predefined and learnable prompts, FastTCM-CR50 introduces an instance-language matching process to enhance the synergy between image and text embeddings, thereby refining text regions. Our Bimodal Similarity Matching...

10.48550/arxiv.2308.10408 preprint EN cc-by-nc-sa arXiv (Cornell University) 2023-01-01