- Human Pose and Action Recognition
- Video Analysis and Summarization
- Vehicle License Plate Recognition
- Advanced Neural Network Applications
- Handwritten Text Recognition Techniques
- Infrastructure Maintenance and Monitoring
- Advanced Image and Video Retrieval Techniques
- Hand Gesture Recognition Systems
- Image Processing and 3D Reconstruction
- Advanced Vision and Imaging
- Music and Audio Processing
Hokkaido University
2021-2024
Traffic sign recognition is a complex and challenging yet popular problem that can assist drivers on the road and reduce traffic accidents. Most existing methods use convolutional neural networks (CNNs) and achieve high accuracy. However, these methods first require a large number of carefully crafted datasets for the training process. Moreover, since signs differ between countries and come in a wide variety, the methods need to be fine-tuned when recognizing new categories. To address these issues, we propose a matching method for zero-shot...
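A minimal sketch of the zero-shot matching idea: a query sign is assigned to the template whose feature embedding is most similar, so no fine-tuning is needed for new categories. The feature dimensions and values here are toy placeholders, not the paper's actual embeddings.

```python
import numpy as np

def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Normalize feature vectors to unit length along the last axis."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def match_sign(query_feat: np.ndarray, template_feats: np.ndarray) -> int:
    """Return the index of the template sign whose embedding has the
    highest cosine similarity to the query sign embedding."""
    q = l2_normalize(query_feat)
    t = l2_normalize(template_feats)
    scores = t @ q  # cosine similarities, shape (num_templates,)
    return int(np.argmax(scores))

# toy features: 3 template signs, 4-dim embeddings
templates = np.array([[1.0, 0.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0, 0.0],
                      [0.0, 0.0, 1.0, 0.0]])
query = np.array([0.1, 0.9, 0.05, 0.0])
print(match_sign(query, templates))  # → 1
```

Because matching only compares embeddings, adding a new sign category amounts to adding one template row.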
Recent multimodal large language models (MLLMs) such as GPT-4o and GPT-4v have shown great potential in autonomous driving. In this paper, we propose a cross-domain few-shot in-context learning method based on an MLLM for enhancing traffic sign recognition (TSR). We first construct a detection network based on the Vision Transformer Adapter and an extraction module to extract traffic signs from original road images. To reduce the dependence on training data and improve the performance stability of cross-country TSR, we introduce the MLLM....
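The in-context learning step can be sketched as assembling a chat-style prompt: a handful of labeled sign examples from the source domain precede the query sign, and the MLLM answers the final turn. This is a hypothetical prompt layout, not the paper's exact template or API call.

```python
def build_icl_messages(examples, query_description):
    """Assemble a few-shot in-context prompt as a chat message list.
    `examples` is a list of (sign_description, label) pairs from the
    source-country data; the last user turn asks about the query sign."""
    messages = [{"role": "system",
                 "content": "You are a traffic sign recognition assistant."}]
    for description, label in examples:
        messages.append({"role": "user", "content": f"Sign: {description}"})
        messages.append({"role": "assistant", "content": label})
    messages.append({"role": "user", "content": f"Sign: {query_description}"})
    return messages

msgs = build_icl_messages(
    [("red octagon with white border", "stop"),
     ("blue circle with white arrow", "mandatory direction")],
    "red-bordered triangle with an exclamation mark")
print(len(msgs))  # system + 2 shots x 2 turns + 1 query = 6
```

Because the shots are supplied at inference time, cross-country adaptation needs no gradient updates.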
We propose a new strategy called "think twice before recognizing" to improve fine-grained traffic sign recognition (TSR). Fine-grained TSR in the wild is difficult due to complex road conditions, and existing approaches particularly struggle in cross-country settings when training data are lacking. Our strategy achieves effective recognition by stimulating the multiple-thinking capability of large multimodal models (LMMs). We introduce context, characteristic, and differential descriptions to design multiple thinking processes for the LMM. The context...
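The three description types can be pictured as successive prompt stages that the LMM answers before committing to a label. The stage wording below is illustrative only; the paper's actual templates are not reproduced here.

```python
def think_twice_stages(sign_region_desc):
    """Build the three description-driven thinking stages as an ordered
    list of prompts (context -> characteristic -> differential)."""
    return [
        # context: what surrounds the sign in the road scene
        f"Describe the road context around this sign: {sign_region_desc}.",
        # characteristic: the sign's own visual attributes
        "Describe the sign's shape, color, and central symbol.",
        # differential: rule out visually similar fine-grained categories
        "Contrast the sign with visually similar categories before naming it.",
    ]

stages = think_twice_stages("sign cropped from a highway scene")
print(len(stages))  # → 3
```

Running the stages in order forces the model to reason about a sign before recognizing it, which is the "think twice" idea.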
This paper presents a transformer-based multimodal soccer scene recognition method that uses both visual and audio modalities. Our approach directly uses the original video frames and the spectrogram extracted from the audio as inputs to the transformer models, which can capture the spatial information of the action at each moment and the contextual temporal relationships between different actions in videos. We fuse the outputs of the models in order to better identify scenes that occur in real matches. The late fusion performs a weighted average of the estimation results to obtain the complete scene....
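The late-fusion step reduces to a weighted average of the per-class probabilities from the two branches. The class names and the 0.6/0.4 weights below are assumptions for illustration, not the paper's reported values.

```python
import numpy as np

def late_fusion(p_visual, p_audio, w_visual=0.6):
    """Weighted average of per-class probabilities from the visual and
    audio branches, renormalized to sum to 1."""
    p_visual = np.asarray(p_visual, dtype=float)
    p_audio = np.asarray(p_audio, dtype=float)
    fused = w_visual * p_visual + (1.0 - w_visual) * p_audio
    return fused / fused.sum()

p_v = [0.7, 0.2, 0.1]  # e.g. goal / foul / other from the frame branch
p_a = [0.4, 0.5, 0.1]  # same classes from the spectrogram branch
fused = late_fusion(p_v, p_a)
print(int(np.argmax(fused)))  # → 0 (visual evidence dominates here)
```

Keeping the branches separate until this step lets each transformer be trained on its own modality.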
This paper presents a scene retrieval method for soccer videos with the video vision Transformer (ViViT). In coaching, it is difficult for the training staff to find the required scenes efficiently from a large number of videos. We tackle this problem with a simple yet effective method. We train the ViViT and obtain output token features from the pre-trained model. The tokens contain spatio-temporal information about the scenes. We then transform the query and candidate scenes into token features and calculate the similarity between them using cosine similarity. We conducted...
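The retrieval step can be sketched as ranking candidate scenes by the cosine similarity of their token features to the query's. The random 8-dim features stand in for real ViViT tokens and are purely illustrative.

```python
import numpy as np

def retrieve_topk(query_token, candidate_tokens, k=3):
    """Rank candidate scenes by cosine similarity of their token
    features to the query token; return the top-k scene indices."""
    q = query_token / np.linalg.norm(query_token)
    c = candidate_tokens / np.linalg.norm(candidate_tokens,
                                          axis=1, keepdims=True)
    sims = c @ q  # one cosine score per candidate scene
    return np.argsort(-sims)[:k].tolist()

rng = np.random.default_rng(0)
cands = rng.normal(size=(5, 8))              # 5 candidate scenes
query = cands[2] + 0.01 * rng.normal(size=8)  # near-duplicate of scene 2
print(retrieve_topk(query, cands)[0])  # → 2
```

Since scoring is a single matrix-vector product over precomputed features, the staff can search a large video library interactively.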
Similar scene retrieval in soccer videos has been drawing a lot of attention in recent years. In previous studies, long and unified frame sequences extracted from videos are used to represent a scene. However, this causes confusion that affects the retrieval performance. In this paper, we propose a frame sequence reduction method based on a combination of short sequences for similar scene retrieval in soccer videos. Our method preserves both the complete contextual information and the immediate state of the action represented by the sequences. The experimental results show that MAP@10 achieves 0.587 with our approach.
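For reference, the MAP@10 metric reported above averages the per-query average precision over the top 10 retrieved scenes. A minimal sketch with toy queries (the data below are placeholders, not the paper's results):

```python
def average_precision_at_k(relevant, ranked, k=10):
    """AP@k for one query: `ranked` is the retrieved scene list,
    `relevant` the set of ground-truth similar scenes."""
    hits, score = 0, 0.0
    for i, item in enumerate(ranked[:k]):
        if item in relevant:
            hits += 1
            score += hits / (i + 1)  # precision at each hit position
    return score / min(len(relevant), k) if relevant else 0.0

def map_at_k(queries, k=10):
    """Mean of AP@k over all (relevant_set, ranked_list) queries."""
    return sum(average_precision_at_k(rel, ranked, k)
               for rel, ranked in queries) / len(queries)

# toy example: two queries with known relevant scenes
queries = [({"a", "b"}, ["a", "x", "b", "y"]),
           ({"c"},      ["z", "c", "w"])]
print(round(map_at_k(queries), 3))  # → 0.667
```

A MAP@10 of 0.587 thus means that, averaged over queries, relevant scenes tend to appear near the top of the 10 returned results.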