Yaozong Gan

ORCID: 0009-0001-8813-3400
Research Areas
  • Human Pose and Action Recognition
  • Video Analysis and Summarization
  • Vehicle License Plate Recognition
  • Advanced Neural Network Applications
  • Handwritten Text Recognition Techniques
  • Infrastructure Maintenance and Monitoring
  • Advanced Image and Video Retrieval Techniques
  • Hand Gesture Recognition Systems
  • Image Processing and 3D Reconstruction
  • Advanced Vision and Imaging
  • Music and Audio Processing

Hokkaido University
2021-2024

Traffic sign recognition is a complex and challenging yet popular problem that can assist drivers on the road and reduce traffic accidents. Most existing methods for traffic sign recognition use convolutional neural networks (CNNs) and achieve high accuracy. However, these methods first require a large number of carefully crafted traffic sign datasets for the training process. Moreover, since traffic signs differ in each country and there is a variety of traffic signs, these methods need to be fine-tuned when recognizing new traffic sign categories. To address these issues, we propose a traffic sign matching method for zero-shot...

10.3390/s23239607 article EN cc-by Sensors 2023-12-04

Recent multimodal large language models (MLLMs) such as GPT-4o and GPT-4V have shown great potential in autonomous driving. In this paper, we propose a cross-domain few-shot in-context learning method based on the MLLM for enhancing traffic sign recognition (TSR). We first construct a traffic sign detection network based on Vision Transformer Adapter and an extraction module to extract traffic signs from the original road images. To reduce the dependence on training data and improve the performance stability of cross-country TSR, we introduce the MLLM....

10.48550/arxiv.2407.05814 preprint EN arXiv (Cornell University) 2024-07-08

10.1109/icip51287.2024.10647129 article EN 2024 IEEE International Conference on Image Processing (ICIP) 2024-09-27

We propose a new strategy called think twice before recognizing to improve fine-grained traffic sign recognition (TSR). Fine-grained TSR in the wild is difficult due to the complex road conditions, and existing approaches particularly struggle with cross-country TSR when data is lacking. Our strategy achieves effective fine-grained TSR by stimulating the multiple-thinking capability of large multimodal models (LMM). We introduce context, characteristic, and differential descriptions to design multiple thinking processes for the LMM. The context...

10.48550/arxiv.2409.01534 preprint EN arXiv (Cornell University) 2024-09-02

This paper presents a transformer-based multimodal soccer scene recognition method for both visual and audio modalities. Our approach directly uses the original video frames and the spectrogram from the audio as the input of the transformer model, which can capture the spatial information of the action at the moment and the contextual temporal information between different actions in the soccer videos. We fuse the output of the transformer model in order to better identify scenes that occur in real soccer matches. The late fusion performs a weighted average of the estimation results to obtain the complete scene....

10.1109/icmew56448.2022.9859304 article EN 2022-07-18
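
The late fusion step mentioned in the abstract above can be illustrated with a minimal sketch. It assumes each modality has already produced per-class scores for the same scene classes; the function names, the default weight of 0.5, and the example labels are hypothetical and are not taken from the paper.

```python
def late_fusion(visual_scores, audio_scores, visual_weight=0.5):
    """Weighted average of per-class scores from two modalities.

    visual_weight is a hypothetical tuning parameter; the audio branch
    receives the remaining weight (1 - visual_weight).
    """
    if len(visual_scores) != len(audio_scores):
        raise ValueError("both modalities must score the same classes")
    if not 0.0 <= visual_weight <= 1.0:
        raise ValueError("visual_weight must lie in [0, 1]")
    audio_weight = 1.0 - visual_weight
    return [visual_weight * v + audio_weight * a
            for v, a in zip(visual_scores, audio_scores)]


def predict(fused_scores, labels):
    """Return the label whose fused score is highest."""
    best = max(range(len(fused_scores)), key=fused_scores.__getitem__)
    return labels[best]


# Illustrative labels and scores only.
labels = ["goal", "foul", "corner"]
fused = late_fusion([0.6, 0.3, 0.1], [0.2, 0.7, 0.1], visual_weight=0.5)
scene = predict(fused, labels)
```

With equal weights, the audio evidence for the second class outweighs the visual evidence for the first, so the fused prediction differs from the visual-only one.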

This paper presents a scene retrieval method in soccer videos with video vision Transformer (ViViT). In soccer coaching, it is difficult for the training staff to find the required scenes efficiently from a large number of soccer videos. We tackle this problem with a simple yet effective method. We train ViViT and obtain the output token features by the pre-trained model. The tokens contain the spatio-temporal information of the scenes. We then transform the query and candidate scenes into token features using ViViT and calculate the similarity between them using cosine similarity. We conducted...

10.1109/icce-taiwan55306.2022.9869188 article EN 2022 IEEE International Conference on Consumer Electronics - Taiwan 2022-07-06
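
The retrieval step described in the abstract above (ranking candidate scenes by cosine similarity to a query) can be sketched as follows. This is an illustration only: it assumes the token features have already been extracted as plain lists of floats, and the function names and the toy two-dimensional features are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)


def rank_candidates(query_feature, candidate_features, top_k=10):
    """Return up to top_k (scene_id, similarity) pairs, most similar first."""
    scored = [(scene_id, cosine_similarity(query_feature, feature))
              for scene_id, feature in candidate_features.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy features standing in for real token features.
candidates = {"scene_a": [1.0, 0.0], "scene_b": [0.0, 1.0], "scene_c": [1.0, 1.0]}
ranking = rank_candidates([1.0, 0.0], candidates, top_k=2)
```

Cosine similarity ignores vector magnitude, so two scenes compare by the direction of their features rather than by their scale.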

Similar scene retrieval in soccer videos has been drawing a lot of attention in recent years. In previous studies, long and unified frame sequences extracted from soccer videos are used to represent a scene. However, this causes confusion that affects the retrieval performance. In this paper, we propose a confusion reduction method based on the combination of short and long frame sequences for similar scene retrieval in soccer videos. Our method preserves both the complete contextual information and the immediate state of the action represented by the short and long frame sequences. The experimental results show that MAP@10 achieves 0.587 with our approach.

10.1109/gcce53005.2021.9621825 article EN 2021 IEEE 10th Global Conference on Consumer Electronics (GCCE) 2021-10-12
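
For readers unfamiliar with the MAP@10 metric reported in the abstract above, a minimal sketch of one common definition follows. The abstract does not state the exact variant used, so the normalization by min(k, number of relevant items) is an assumption, as are the function names and the toy data; each ranked list is assumed to contain no duplicates.

```python
def average_precision_at_k(ranked_ids, relevant_ids, k=10):
    """AP@k for one query: precision averaged over the ranks of the hits.

    Assumed normalization: divide by min(k, number of relevant items).
    """
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked_ids[:k], start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / min(k, len(relevant))


def mean_average_precision_at_k(rankings, relevance, k=10):
    """Mean of AP@k over all queries (rankings and relevance are parallel lists)."""
    if not rankings:
        return 0.0
    total = sum(average_precision_at_k(ranked, relevant, k)
                for ranked, relevant in zip(rankings, relevance))
    return total / len(rankings)


# Toy example: two queries with hypothetical scene ids.
rankings = [["a", "b", "c", "d"], ["x", "y", "z"]]
relevance = [{"a", "c"}, {"y"}]
score = mean_average_precision_at_k(rankings, relevance, k=10)
```

In the toy example the first query has hits at ranks 1 and 3 and the second has a single hit at rank 2, so the metric rewards placing relevant scenes near the top of the list.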