Liqi Yan

ORCID: 0000-0002-7077-4947
Research Areas
  • Multimodal Machine Learning Applications
  • Robotics and Sensor-Based Localization
  • Advanced Image and Video Retrieval Techniques
  • Human Pose and Action Recognition
  • Vehicle Noise and Vibration Control
  • Domain Adaptation and Few-Shot Learning
  • Advanced Neural Network Applications
  • Machine Fault Diagnosis Techniques
  • Advanced Vision and Imaging
  • 3D Surveying and Cultural Heritage
  • Video Analysis and Summarization
  • Video Surveillance and Tracking Methods
  • Advanced Measurement and Detection Methods
  • Advanced Algorithms and Applications
  • Acoustic Wave Phenomena Research
  • Cancer-Related Molecular Mechanisms Research
  • Visual Attention and Saliency Detection
  • Aerodynamics and Acoustics in Jet Flows
  • Turbomachinery Performance and Optimization
  • Structural Health Monitoring Techniques
  • IoT-based Smart Home Systems
  • IoT and GPS-based Vehicle Safety Systems
  • Infrared Target Detection Methodologies
  • Bayesian Modeling and Causal Inference
  • Gaze Tracking and Assistive Technology

Aero Engine Corporation of China (China)
2024

Hangzhou Dianzi University
2024

Fudan University
2020-2023

Rochester Institute of Technology
2022

Westlake University
2020-2022

Hong Kong Metropolitan University
2020

Beijing University of Posts and Telecommunications
2017

Video object detection is a challenging task because isolated video frames may encounter appearance deterioration, which introduces great confusion for detection. One of the popular solutions is to exploit temporal information and enhance per-frame representation through aggregating features from neighboring frames. Despite achieving improvements in detection, existing methods focus on selecting higher-level aggregation rather than modeling lower-level relations to increase feature...

10.1109/iccv48922.2021.00803 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021-10-01

In this work, we introduce a Denser Feature Network (DenserNet) for visual localization. Our work provides three principal contributions. First, we develop a convolutional neural network (CNN) architecture which aggregates feature maps at different semantic levels for image representations. Using denser feature maps, our method can produce more keypoint features and increase image retrieval accuracy. Second, our model is trained end-to-end without pixel-level annotation other than positive and negative GPS-tagged image pairs....

10.1609/aaai.v35i7.16760 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2021-05-18

Video captioning is a challenging task as it needs to accurately transform visual understanding into natural language description. To date, state-of-the-art methods inadequately model global-local vision representation for sentence generation, leaving plenty of room for improvement. In this work, we approach the video captioning task from a new perspective and propose the GLR framework, namely global-local representation granularity. Our framework demonstrates three advantages over prior efforts. First, we propose a simple solution, which exploits extensive...

10.1109/tcsvt.2022.3177320 article EN IEEE Transactions on Circuits and Systems for Video Technology 2022-05-23

Instance segmentation in videos, which aims to segment and track multiple objects in video frames, has garnered a flurry of research attention in recent years. In this paper, we present a novel weakly supervised framework with Spatio-Temporal Collaboration for instance Segmentation in videos, namely STC-Seg. Concretely, STC-Seg demonstrates four contributions. First, we leverage the complementary representations from unsupervised depth estimation and optical flow to produce...

10.1109/tcsvt.2022.3202574 article EN IEEE Transactions on Circuits and Systems for Video Technology 2022-08-29

Video captioning is a challenging task as it needs to accurately transform visual understanding into natural language description. To date, state-of-the-art methods inadequately model global-local representation across video frames for caption generation, leaving plenty of room for improvement. In this work, we approach the video captioning task from a new perspective and propose the GL-RG framework for video captioning, namely Global-Local Representation Granularity. Our framework demonstrates three advantages over prior efforts: 1)...

10.24963/ijcai.2022/384 article EN Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence 2022-07-01

Fine-tuning large vision-language models is a challenging task. Prompt tuning approaches have been introduced to learn fixed textual or visual prompts while freezing the pre-trained model in downstream tasks. Despite the effectiveness of prompt tuning, what those learnable prompts learn remains unexplained. In this work, we explore whether prompts in fine-tuning can learn knowledge-aware prompts from pre-training, by designing two different sets of prompts for the pre-training and fine-tuning phases, respectively. Specifically, we present Video-Language Prompt tuning (VL-Prompt)...

10.24963/ijcai.2023/180 article EN 2023-08-01

Geo-localization is a critical task in computer vision. In this work, we cast geo-localization as a 2D image retrieval task. Current state-of-the-art methods for geo-localization are not robust in locating a scene with drastic scale variations because they only exploit features from one semantic level for image representations. To address this limitation, we introduce a hierarchical attention fusion network using multi-scale features for geo-localization. We extract feature maps from a convolutional neural network (CNN) and organically fuse the extracted features. Our...

10.1109/icassp39728.2021.9414517 article EN ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021-05-13

Vision and voice are two vital keys for agents' interaction and learning. In this paper, we present a novel indoor navigation model called Memory Vision-Voice Indoor Navigation (MVV-IN), which receives voice commands and analyzes multimodal information of visual observation in order to enhance robots' environment understanding. We make use of single RGB images taken by a first-view monocular camera. We also apply a self-attention mechanism to keep the agent focusing on key areas. Memory is important to avoid repeating certain...

10.1109/iros45743.2020.9341398 article EN 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2020-10-24

Noise source identification of gas turbines can provide the basis and guidance for vibration and noise reduction of gas turbines. Independent component analysis (ICA) is one of the most popular techniques for blind source separation (BSS) and is widely used in mechanical systems. ICA is suitable for independent signals. However, in order to identify dependent noise sources of gas turbines, a convolutive BSS in the frequency domain based on bounded component analysis (BCA) is proposed. First, the basic theory of BCA is introduced in detail. The mixing in the time domain is transformed into an...

10.1088/1361-6501/aca21a article EN Measurement Science and Technology 2022-11-11

In this work, we introduce a Denser Feature Network (DenserNet) for visual localization. Our work provides three principal contributions. First, we develop a convolutional neural network (CNN) architecture which aggregates feature maps at different semantic levels for image representations. Using denser feature maps, our method can produce more keypoint features and increase image retrieval accuracy. Second, our model is trained end-to-end without pixel-level annotation other than positive and negative GPS-tagged image pairs....

10.48550/arxiv.2012.02366 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Video captioning is a challenging task as it needs to accurately transform visual understanding into natural language description. To date, state-of-the-art methods inadequately model global-local representation across video frames for caption generation, leaving plenty of room for improvement. In this work, we approach the video captioning task from a new perspective and propose the GL-RG framework for video captioning, namely Global-Local Representation Granularity. Our framework demonstrates three...

10.48550/arxiv.2205.10706 preprint EN cc-by-nc-nd arXiv (Cornell University) 2022-01-01

Video object detection is a challenging task because isolated video frames may encounter appearance deterioration, which introduces great confusion for detection. One of the popular solutions is to exploit temporal information and enhance per-frame representation through aggregating features from neighboring frames. Despite achieving improvements in detection, existing methods focus on selecting higher-level aggregation rather than modeling lower-level relations to increase feature...

10.48550/arxiv.2108.05821 preprint EN other-oa arXiv (Cornell University) 2021-01-01

Planar reconstruction detects planar segments and deduces their 3D parameters (normals and offsets) from the input image; this has significant potential in the fields of digital preservation of cultural heritage, architectural design, robot navigation, intelligent transportation, and security monitoring. Existing methods mainly employ multiple-view images with limited overlap for reconstruction but lack utilization of the relative position and rotation information between images. To fill this gap, this paper uses two views and the camera pose to...

10.3390/rs16091616 article EN cc-by Remote Sensing 2024-04-30

Driver attention prediction plays a crucial role in developing intelligent driving and assisted driving systems. However, this task presents several challenges to researchers, including the difficulty of effectively utilizing scene information and the lack of driver attention models that can accurately predict the driver’s multiple regions of fixation. To address these challenges, this work proposes a novel multi-scale feature fusion network (MSFFDAP) for driver attention prediction. MSFFDAP uses a convolutional neural network to extract...

10.21203/rs.3.rs-4338143/v1 preprint EN cc-by Research Square (Research Square) 2024-05-06

Conventional model-driven operational transfer path analysis (OTPA) cannot update and optimize itself based on data characteristics, which weakens its accuracy and reliability. Inspired by the data-driven thinking of learning from data, this paper develops a statistical OTPA. First, considering the statistical distribution characteristics of potential errors in the data according to the central limit theorem, the factors affecting the error in calculating transmissibility are analyzed and summarized. Then, by constructing an objective...

10.2139/ssrn.4832763 preprint EN 2024-01-01

Recently, instance segmentation models with complex architectures and large parameter sets have shown impressive levels of precision. Nonetheless, from a practical perspective, balancing precision and speed is more desirable. Real-time instance segmentation faces efficiency and quality challenges in urban street scenes. In the present research, we propose a YOLOv8-seg based model named LAtt-Yolov8-seg. A pivotal advancement lies in the introduction of a mechanism called Focused Linear Attention, which effectively...

10.1145/3653804.3656278 article EN 2024-01-19

With the continuous development and popularization of drone technology, drones are widely used in various fields, especially in video applications. We propose DroneGPT, a neural-symbolic method that learns from VISPROG, which does not require any task-specific training. It leverages the contextual learning ability of large language models to generate and execute modular programs, solving complex compositional vision tasks given natural language instructions. The modules of the program can call several ready-made computer...

10.1145/3653804.3654608 article EN 2024-01-19

Inspection of pipelines is particularly important for the drainage industry, and automation of this process has received a lot of attention. We propose Mixture of Experts for Sewer Defect Classification (Sewer-MoE), an innovative model for identifying pipe defects, in which we train multiple expert models and then merge them into a single multiclassification model. During the training process, we produced an attention mechanism structure that allows each expert to refer to the other models, while weighting the classification...

10.1145/3653781.3653832 article EN 2024-01-19

Object trajectory prediction is a hot research issue with wide applications in video surveillance and autonomous driving. Previous studies consider the interaction sparsity mainly among pedestrians instead of multi-type objects, which brings new types of interactions and consequently superfluous ones. This paper proposes a Multi-type Object Trajectory Prediction (MOTP) method with a Sparse Multi-relational Graph Convolutional Network (SMGCN) and a novel multi-round Global Temporal Aggregation (GTA). MOTP introduces...

10.24963/ijcai.2024/188 article EN 2024-07-26