NFDI4DS | UHH-SEMS - Publication Details

Liqi Yan

ORCID: 0000-0002-7077-4947

About

Contact & Profiles

A5051997091

Research Areas

Multimodal Machine Learning Applications
Robotics and Sensor-Based Localization
Advanced Image and Video Retrieval Techniques
Human Pose and Action Recognition
Vehicle Noise and Vibration Control
Domain Adaptation and Few-Shot Learning
Advanced Neural Network Applications
Machine Fault Diagnosis Techniques
Advanced Vision and Imaging
3D Surveying and Cultural Heritage
Video Analysis and Summarization
Video Surveillance and Tracking Methods
Advanced Measurement and Detection Methods
Advanced Algorithms and Applications
Acoustic Wave Phenomena Research
Cancer-related molecular mechanisms research
Visual Attention and Saliency Detection
Aerodynamics and Acoustics in Jet Flows
Turbomachinery Performance and Optimization
Structural Health Monitoring Techniques
IoT-based Smart Home Systems
IoT and GPS-based Vehicle Safety Systems
Infrared Target Detection Methodologies
Bayesian Modeling and Causal Inference
Gaze Tracking and Assistive Technology

Aero Engine Corporation of China (China)
2024

Hangzhou Dianzi University
2024

Fudan University
2020-2023

Rochester Institute of Technology
2022

Westlake University
2020-2022

Hong Kong Metropolitan University
2020

Beijing University of Posts and Telecommunications
2017

TF-Blender: Temporal Feature Blender for Video Object Detection

OPENALEX - Publications

Yiming Cui Liqi Yan Zhiwen Cao Dongfang Liu

Video object detection is a challenging task because isolated video frames may encounter appearance deterioration, which introduces great confusion for detection. One of the popular solutions is to exploit temporal information and enhance per-frame representation through aggregating features from neighboring frames. Despite achieving improvements in detection, existing methods focus on the selection of higher-level video frames for aggregation rather than modeling lower-level temporal relations to increase the feature...

10.1109/iccv48922.2021.00803 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021-10-01

DenserNet: Weakly Supervised Visual Localization Using Multi-Scale Feature Aggregation

Dongfang Liu Yiming Cui Liqi Yan Christos Mousas Baijian Yang and 1 more

In this work, we introduce a Denser Feature Network (DenserNet) for visual localization. Our work provides three principal contributions. First, we develop a convolutional neural network (CNN) architecture which aggregates feature maps at different semantic levels for image representations. Using denser feature maps, our method can produce more keypoint features and increase image retrieval accuracy. Second, our model is trained end-to-end without pixel-level annotation other than positive and negative GPS-tagged image pairs....

10.1609/aaai.v35i7.16760 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2021-05-18

Video Captioning Using Global-Local Representation

Liqi Yan Siqi Ma Qifan Wang Yingjie Chen Xiangyu Zhang and 2 more

Video captioning is a challenging task as it needs to accurately transform visual understanding into natural language description. To date, state-of-the-art methods inadequately model global-local vision representation for sentence generation, leaving plenty of room for improvement. In this work, we approach the video captioning task from a new perspective and propose a GLR framework, namely a global-local representation granularity. Our GLR demonstrates three advantages over prior efforts. First, we propose a simple solution, which exploits extensive...

10.1109/tcsvt.2022.3177320 article EN IEEE Transactions on Circuits and Systems for Video Technology 2022-05-23

Solve the Puzzle of Instance Segmentation in Videos: A Weakly Supervised Framework With Spatio-Temporal Collaboration

Liqi Yan Qifan Wang Siqi Ma Jingang Wang Changbin Yu

Instance segmentation in videos, which aims to segment and track multiple objects in video frames, has garnered a flurry of research attention in recent years. In this paper, we present a novel weakly supervised framework with Spatio-Temporal Collaboration for instance Segmentation in videos, namely STC-Seg. Concretely, STC-Seg demonstrates four contributions. First, we leverage the complementary representations from unsupervised depth estimation and optical flow to produce...

10.1109/tcsvt.2022.3202574 article EN IEEE Transactions on Circuits and Systems for Video Technology 2022-08-29

GL-RG: Global-Local Representation Granularity for Video Captioning

Liqi Yan Qifan Wang Yiming Cui Fuli Feng Xiaojun Quan and 2 more

Video captioning is a challenging task as it needs to accurately transform visual understanding into natural language description. To date, state-of-the-art methods inadequately model global-local representation across video frames for caption generation, leaving plenty of room for improvement. In this work, we approach the video captioning task from a new perspective and propose a GL-RG framework for video captioning, namely a Global-Local Representation Granularity. Our GL-RG demonstrates three advantages over the prior efforts: 1)...

10.24963/ijcai.2022/384 article EN Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence 2022-07-01

Prompt Learns Prompt: Exploring Knowledge-Aware Generative Prompt Collaboration For Video Captioning

Liqi Yan Cheng Han Zenglin Xu Dongfang Liu Qifan Wang

Fine-tuning large vision-language models is a challenging task. Prompt tuning approaches have been introduced to learn fixed textual or visual prompts while freezing the pre-trained model in downstream tasks. Despite the effectiveness of prompt tuning, what those learnable prompts learn remains unexplained. In this work, we explore whether prompts in fine-tuning can learn knowledge-aware prompts from pre-training, by designing two different sets of prompts in the pre-training and fine-tuning phases respectively. Specifically, we present a Video-Language Prompt tuning (VL-Prompt)...

10.24963/ijcai.2023/180 article EN 2023-08-01

Statistically Data-Driven Operational Transfer Path Analysis

Chao Song Wei Cheng Mingsui Yang Xuefeng Chen Liqi Yan and 5 more

10.2139/ssrn.5110279 preprint EN 2025-01-01

Gas Turbine Harmonic Detection and Modal Identification Based on Underdetermined Blind Source Separation

Chao Song Jianxiong Hu Wei Cheng Bicheng Bo Mingsui Yang and 6 more

10.2139/ssrn.5153241 preprint EN 2025-01-01

Gas Turbine Harmonic Detection and Modal Identification Based on Underdetermined Blind Source Separation

Chao Song Jianxiong Hu Wei Cheng Bicheng Bo Mingsui Yang and 6 more

10.2139/ssrn.5153240 preprint EN 2025-01-01

Quality Aware Operational Transfer Path Analysis for Gas Turbines

Chao Song Wei Cheng Mingsui Yang Xuefeng Chen Liqi Yan and 4 more

10.2139/ssrn.5208577 preprint EN 2025-01-01

Hierarchical Attention Fusion for Geo-Localization

Liqi Yan Yiming Cui Yingjie Chen Dongfang Liu

Geo-localization is a critical task in computer vision. In this work, we cast geo-localization as a 2D image retrieval task. Current state-of-the-art methods for 2D geo-localization are not robust in locating a scene with drastic scale variations because they only exploit features from one semantic level for image representations. To address this limitation, we introduce a hierarchical attention fusion network using multi-scale features for geo-localization. We extract hierarchical feature maps from a convolutional neural network (CNN) and organically fuse the extracted features for image representations. Our...

10.1109/icassp39728.2021.9414517 article EN ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021-05-13

Multimodal Aggregation Approach for Memory Vision-Voice Indoor Navigation with Meta-Learning

Liqi Yan Dongfang Liu Yaoxian Song Changbin Yu

Vision and voice are two vital keys for agents' interaction and learning. In this paper, we present a novel indoor navigation model called Memory Vision-Voice Indoor Navigation (MVV-IN), which receives voice commands and analyzes multimodal information of visual observation in order to enhance robots' environment understanding. We make use of single RGB images taken by a first-view monocular camera. We also apply a self-attention mechanism to keep the agent focusing on key areas. Memory is important for the agent to avoid repeating certain...

10.1109/iros45743.2020.9341398 article EN 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2020-10-24

Single-tone Aerodynamic Noise Source Separation for Gas Turbines

Wei Cheng Chao Song Bicheng Bo Shuang Chen Mingsui Yang and 5 more

10.1016/j.jsv.2024.118375 article EN Journal of Sound and Vibration 2024-03-04

Convolutive blind source separation in the frequency domain of mechanical noise for gas turbines based on bounded component analysis

Wei Cheng Shuang Chen Chao Song Kai Ou Xuefeng Chen and 3 more

Noise source identification of gas turbines can provide the basis and guidance for vibration and noise reduction of gas turbines. Independent component analysis (ICA) is one of the most popular techniques of blind source separation (BSS) and is widely used in mechanical systems. ICA is suitable for independent source signals. However, in order to identify dependent sources of gas turbines, a convolutive BSS in the frequency domain based on bounded component analysis (BCA) is proposed. First, the basic theory of BCA is introduced in detail. The convolutive mixing in the time domain is transformed into an...

10.1088/1361-6501/aca21a article EN Measurement Science and Technology 2022-11-11

DenserNet: Weakly Supervised Visual Localization Using Multi-scale Feature Aggregation

Dongfang Liu Yiming Cui Liqi Yan Christos Mousas Baijian Yang and 1 more

In this work, we introduce a Denser Feature Network (DenserNet) for visual localization. Our work provides three principal contributions. First, we develop a convolutional neural network (CNN) architecture which aggregates feature maps at different semantic levels for image representations. Using denser feature maps, our method can produce more keypoint features and increase image retrieval accuracy. Second, our model is trained end-to-end without pixel-level annotation other than positive and negative GPS-tagged image pairs....

10.48550/arxiv.2012.02366 preprint EN other-oa arXiv (Cornell University) 2020-01-01

GL-RG: Global-Local Representation Granularity for Video Captioning

Liqi Yan Qifan Wang Yiming Cui Fuli Feng Xiaojun Quan and 2 more

Video captioning is a challenging task as it needs to accurately transform visual understanding into natural language description. To date, state-of-the-art methods inadequately model global-local representation across video frames for caption generation, leaving plenty of room for improvement. In this work, we approach the video captioning task from a new perspective and propose a GL-RG framework for video captioning, namely a Global-Local Representation Granularity. Our GL-RG demonstrates three...

10.48550/arxiv.2205.10706 preprint EN cc-by-nc-nd arXiv (Cornell University) 2022-01-01

TF-Blender: Temporal Feature Blender for Video Object Detection

Yiming Cui Liqi Yan Zhiwen Cao Dongfang Liu

10.48550/arxiv.2108.05821 preprint EN other-oa arXiv (Cornell University) 2021-01-01

Planar Reconstruction of Indoor Scenes from Sparse Views and Relative Camera Poses

Fangli Guan Jiakang Liu Jianhui Zhang Liqi Yan Ling Jiang

Planar reconstruction detects planar segments and deduces their 3D planar parameters (normals and offsets) from the input image; this has significant potential in the fields of digital preservation of cultural heritage, architectural design, robot navigation, intelligent transportation, and security monitoring. Existing methods mainly employ multiple-view images with limited overlap for reconstruction but lack the utilization of the relative position and rotation information between the images. To fill this gap, this paper uses two views and their relative camera pose to...

10.3390/rs16091616 article EN cc-by Remote Sensing 2024-04-30

Driver Attention Prediction based on Multi-Scale Feature Fusion

Chengbin Yu Feng Jian-wen Liqi Yan Jianhui Zhang

Driver attention prediction plays a crucial role in developing intelligent driving and assisted driving systems. However, this task presents several challenges to researchers, including the difficulty of effectively utilizing scene information and the lack of driver attention models that can accurately predict the driver’s multiple regions of fixation. To address these challenges, this work proposes a novel multi-scale feature fusion network (MSFFDAP) for driver attention prediction. MSFFDAP uses a convolutional neural network to extract...

10.21203/rs.3.rs-4338143/v1 preprint EN cc-by Research Square (Research Square) 2024-05-06

Statistically Data-Driven Operational Transfer Path Analysis

Chao Song Wei Cheng Mingsui Yang Xuefeng Chen Liqi Yan and 4 more

Conventional model-driven operational transfer path analysis (OTPA) cannot update and optimize itself based on data characteristics, which weakens its accuracy and reliability. Inspired by the data-driven thinking of learning from data, this paper develops a statistically data-driven OTPA. First, considering the statistical distribution characteristics of potential errors in the data according to the central limit theorem, the factors affecting the error in calculating transmissibility are analyzed and summarized. Then, constructing objective...

10.2139/ssrn.4832763 preprint EN 2024-01-01

LAtt-Yolov8-seg: Video Real-time Instance Segmentation for Urban Street Scenes Based on Focused Linear Attention Mechanism

Xinqi Zhang Tuo Dong Liqi Yan Zhenglei Yang Jianhui Zhang

Recently, instance segmentation models with complex architectures and large parameter sets have shown impressive levels of precision. Nonetheless, from a practical perspective, balancing precision and speed is more desirable. Real-time instance segmentation faces efficiency and quality challenges in urban street scenes. In the present research, we propose a YOLOv8-seg based model named LAtt-Yolov8-seg. A pivotal advancement lies in the introduction of a mechanism called Focused Linear Attention, which effectively...

10.1145/3653804.3656278 article EN 2024-01-19

DroneGPT: Zero-shot Video Question Answering For Drones

Hongjie Qiu Jinqiang Li Ji-Song Gan Shuwen Zheng Liqi Yan

With the continuous development and popularization of drone technology, drones are widely used in various fields, especially in video applications. We propose DroneGPT, a neural-symbolic method that learns from VISPROG, which does not require any task-specific training. It leverages the contextual learning ability of large language models to generate and execute modular programs, solving complex and compositional vision tasks given natural language instructions. The modules of the program can call several ready-made computer...

10.1145/3653804.3654608 article EN 2024-01-19

Sewer-MoE: A tuned Mixture of Experts Model for Sewer Defect Classification

Sijie Xu Yifan Gong X.F. Zhu Shuo Chang Y.C. Wang and 1 more

Inspection of pipelines is particularly important for the drainage industry, and the automation of this process has received a lot of attention. We propose Mixture of Experts for Sewer Defect Classification (Sewer-MoE), an innovative model for identifying pipe defects, in which we train multiple expert models and then merge them into a single multiclassification model. During the training process, we produced an attention mechanism structure that allows each expert to refer to the other models, while weighting the classification...

10.1145/3653781.3653832 article EN 2024-01-19

Sparse Multi-Relational Graph Convolutional Network for Multi-type Object Trajectory Prediction

Jianhui Zhang Jun Yao Liqi Yan Yanhong Xu Zheng Wang

Object trajectory prediction is a hot research issue with wide applications in video surveillance and autonomous driving. Previous studies consider the interaction sparsity mainly among pedestrians instead of multi-type objects, which brings new types of interactions and consequently superfluous ones. This paper proposes a Multi-type Object Trajectory Prediction (MOTP) method with a Sparse Multi-relational Graph Convolutional Network (SMGCN) and a novel multi-round Global Temporal Aggregation (GTA). MOTP introduces...

10.24963/ijcai.2024/188 article EN 2024-07-26
