NFDI4DS | UHH-SEMS - Publication Details

TPH-YOLOv5: Improved YOLOv5 Based on Transformer Prediction Head for Object Detection on Drone-captured Scenarios

OPENALEX - Publications

Xingkui Zhu, Shuchang Lyu, Xu Wang, Qi Zhao

Object detection on drone-captured scenarios is a recently popular task. Because drones navigate at different altitudes, object scale varies drastically, which burdens the optimization of networks. Moreover, high-speed and low-altitude flight brings motion blur on densely packed objects, which makes object distinction very challenging. To solve the two issues mentioned above, we propose TPH-YOLOv5. Based on YOLOv5, we add one more prediction head to detect different-scale objects. Then we replace the original prediction heads with...

10.1109/iccvw54120.2021.00312 article EN 2021-10-01

VisDrone-DET2021: The Vision Meets Drone Object detection Challenge Results

Yaru Cao, Zhijian He, Lujia Wang, Wenguan Wang, Yixuan Yuan and 32 more

Object detection on drones faces a great diversity of challenges, such as small object inference, background clutter, and wide viewpoint. In contrast to the traditional detection problem in computer vision, object detection from a bird-like angle cannot be transplanted directly from commonly used methods, due to the special object texture in the sky's view. However, due to the lack of a comprehensive data set, the number of algorithms that focus on object detection using data captured by drones is limited. So the VisDrone team gathered a massive data set and organized Vision Meets Drones: A Challenge...

10.1109/iccvw54120.2021.00319 article EN 2021-10-01

Turning a CLIP Model into a Scene Text Spotter

Wenwen Yu, Yuliang Liu, Xingkui Zhu, Haoyu Cao, Xing Sun and 1 more

We exploit the potential of the large-scale Contrastive Language-Image Pretraining (CLIP) model to enhance scene text detection and spotting tasks, transforming it into a robust backbone, FastTCM-CR50. This backbone utilizes visual prompt learning and cross-attention in CLIP to extract image and text-based prior knowledge. Using predefined and learnable prompts, FastTCM-CR50 introduces an instance-language matching process to enhance the synergy between image and text embeddings, thereby refining text regions. Our Bimodal Similarity Matching...

10.1109/tpami.2024.3379828 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2024-03-20

Layerlink: Bridging remote sensing object detection and large vision models with efficient fine-tuning

Xingkui Zhu, Dingkang Liang, Xingyu Jiang, Yiran Guan, Yuliang Liu and 2 more

10.1016/j.patcog.2025.111583 article EN Pattern Recognition 2025-03-01

TPH-YOLOv5: Improved YOLOv5 Based on Transformer Prediction Head for Object Detection on Drone-captured Scenarios

Xingkui Zhu, Shuchang Lyu, Xu Wang, Qi Zhao

Object detection on drone-captured scenarios is a recently popular task. Because drones navigate at different altitudes, object scale varies drastically, which burdens the optimization of networks. Moreover, high-speed and low-altitude flight brings motion blur on densely packed objects, which makes object distinction very challenging. To solve the two issues mentioned above, we propose TPH-YOLOv5. Based on YOLOv5, we add one more prediction head to detect different-scale objects. Then we replace the original prediction heads with...

10.48550/arxiv.2108.11539 preprint EN other-oa arXiv (Cornell University) 2021-01-01

Absolute phase retrieval via color phase-coding

Teng Cheng, Qingyu Du, Yaxi Jiang, Xingkui Zhu

10.1016/j.ijleo.2017.05.017 article EN Optik 2017-05-04

Turning a CLIP Model into a Scene Text Spotter

Wenwen Yu, Yuliang Liu, Xingkui Zhu, Haoyu Cao, Xing Sun and 1 more

We exploit the potential of the large-scale Contrastive Language-Image Pretraining (CLIP) model to enhance scene text detection and spotting tasks, transforming it into a robust backbone, FastTCM-CR50. This backbone utilizes visual prompt learning and cross-attention in CLIP to extract image and text-based prior knowledge. Using predefined and learnable prompts, FastTCM-CR50 introduces an instance-language matching process to enhance the synergy between image and text embeddings, thereby refining text regions. Our Bimodal Similarity Matching...

10.48550/arxiv.2308.10408 preprint EN cc-by-nc-sa arXiv (Cornell University) 2023-01-01