Hongsheng Li

ORCID: 0000-0002-9929-4023
Research Areas
  • Human Pose and Action Recognition
  • Multimodal Machine Learning Applications
  • Mobile Ad Hoc Networks
  • Energy Efficient Wireless Sensor Networks
  • Gait Recognition and Analysis
  • Anomaly Detection Techniques and Applications
  • Video Surveillance and Tracking Methods
  • Wireless Networks and Protocols
  • Topic Modeling
  • Advanced Image and Video Retrieval Techniques
  • Advanced Vision and Imaging
  • Advanced Neural Network Applications
  • Explainable Artificial Intelligence (XAI)
  • Mobile Agent-Based Network Management
  • Advanced Graph Neural Networks
  • Domain Adaptation and Few-Shot Learning
  • Blind Source Separation Techniques
  • Image and Signal Denoising Methods
  • Wireless Communication Networks Research
  • Advanced Algorithms and Applications
  • Medical Image Segmentation Techniques
  • Medical Imaging Techniques and Applications
  • Advanced Computing and Algorithms
  • Graph Theory and Algorithms
  • Language and cultural evolution

Xuzhou University of Technology
2025

University of Hong Kong
2024

Wuhan University of Technology
2002-2024

Chinese University of Hong Kong
2024

Sinopec (China)
2024

Xidian University
2020-2023

Shandong First Medical University
2010-2016

Shandong Tumor Hospital
2010-2016

Xi'an Railway Survey and Design Institute
2012

Northwestern Polytechnical University
2003

10.1109/cvpr52733.2024.01432 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024-06-16

Deep learning techniques have led to remarkable breakthroughs in the field of object detection and have spawned a lot of scene-understanding tasks in recent years. The scene graph has been the focus of research because of its powerful semantic representation and applications to scene understanding. Scene Graph Generation (SGG) refers to the task of automatically mapping an image or a video into a structural graph, which requires the correct labeling of detected objects and their relationships. In this paper, a comprehensive survey of achievements is...

10.1016/j.neucom.2023.127052 article EN cc-by Neurocomputing 2023-11-20

Various methods to deal with graph data have been proposed in recent years. However, most of these methods focus on feature aggregation rather than pooling. Besides, the existing top-k selection pooling methods have a few problems. First, to construct the pooled topology, current methods evaluate the importance of a node from a single perspective only, which is simplistic and unobjective. Second, the information of unselected nodes is directly lost during the pooling process, which inevitably leads to a massive loss of information. To solve the problems mentioned above, we...

10.1145/3366423.3380083 preprint EN 2020-04-20
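The top-k selection pooling that the abstract above critiques can be sketched as follows. This is a minimal illustration of the generic baseline, not the method proposed in the paper: the function name `topk_pool`, the `ratio` parameter, and the precomputed `scores` list are all assumptions made for the example. It scores each node from a single perspective, keeps the highest-scoring fraction, and drops everything else, which shows where the information of unselected nodes is lost.

```python
import math
from typing import List, Tuple


def topk_pool(
    scores: List[float],
    edges: List[Tuple[int, int]],
    ratio: float = 0.5,
) -> Tuple[List[int], List[Tuple[int, int]]]:
    """Generic top-k selection pooling (hypothetical helper).

    Keeps the ceil(ratio * n) nodes with the highest score and only the
    edges whose two endpoints both survive. Returns the kept node ids
    (in the original numbering) and the edges renumbered to 0..k-1.
    """
    n = len(scores)
    k = max(1, math.ceil(ratio * n))

    # Rank nodes by score, highest first; ties go to the lower node id.
    ranked = sorted(range(n), key=lambda i: (-scores[i], i))
    kept = sorted(ranked[:k])

    # Map old node ids to their position in the pooled graph.
    remap = {old: new for new, old in enumerate(kept)}

    # Edges touching an unselected node are discarded outright.
    pooled_edges = [
        (remap[u], remap[v]) for u, v in edges if u in remap and v in remap
    ]
    return kept, pooled_edges


if __name__ == "__main__":
    node_scores = [0.1, 0.9, 0.5, 0.7]
    graph_edges = [(0, 1), (1, 2), (2, 3), (1, 3)]
    print(topk_pool(node_scores, graph_edges, ratio=0.5))
```

In this sketch nodes 0 and 2 and three of the four edges vanish without contributing anything to the pooled graph, which is the loss the abstract describes.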

Aiming at the problems of many path inflection points, unsmooth paths, and poor local obstacle avoidance in the path planning of inspection robots in static-dynamic scenes under the complex geological conditions of coal mine roadways, a hybrid path planning method based on an improved A* algorithm and the dynamic window approach (DWA) is proposed. First, the inspection robot platform and system model are constructed. An improved heuristic function that incorporates target weight information is proposed for the global A* algorithm. Additionally, redundant nodes...

10.1017/s0263574725000037 article EN Robotica 2025-01-30
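The global planning stage mentioned in the abstract above builds on A* with a modified heuristic. The abstract is truncated and does not give the heuristic's exact form, so the sketch below assumes the simplest variant: a standard grid A* whose Manhattan-distance heuristic is scaled by a constant `weight`. The function name `weighted_astar`, the 4-connected grid, and the `weight` parameter are illustrative assumptions, not the paper's formulation, and the DWA local planner is not shown.

```python
import heapq
from typing import List, Optional, Tuple

Cell = Tuple[int, int]


def weighted_astar(
    grid: List[List[int]], start: Cell, goal: Cell, weight: float = 1.0
) -> Optional[List[Cell]]:
    """A* on a 4-connected grid (0 = free, 1 = obstacle).

    Nodes are ordered by f = g + weight * h, with h the Manhattan
    distance to the goal. weight = 1.0 is plain A*; a larger weight
    pulls the search toward the goal at the cost of optimality.
    """
    rows, cols = len(grid), len(grid[0])

    def h(cell: Cell) -> int:
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(weight * h(start), 0, start)]
    g_cost = {start: 0}
    parent = {start: None}

    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if cell == goal:
            # Walk the parent links back to the start.
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        if g > g_cost[cell]:
            continue  # stale heap entry, a cheaper route was found later
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                new_g = g + 1
                if new_g < g_cost.get(nxt, float("inf")):
                    g_cost[nxt] = new_g
                    parent[nxt] = cell
                    heapq.heappush(open_heap, (new_g + weight * h(nxt), new_g, nxt))
    return None  # goal unreachable


if __name__ == "__main__":
    roadway = [
        [0, 0, 0],
        [1, 1, 0],
        [0, 0, 0],
    ]
    print(weighted_astar(roadway, (0, 0), (2, 0)))
```

A path produced this way still contains the grid-aligned inflection points the abstract complains about, which is why a later step that prunes redundant nodes and a local planner are needed.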

Recently, Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) have shown promise in instruction following and image understanding. While these models are powerful, they have not yet been developed to comprehend the more challenging 3D geometric and physical scenes, especially when it comes to sparse outdoor LiDAR data. In this paper, we introduce LiDAR-LLM, which takes raw LiDAR data as input and harnesses the remarkable reasoning capabilities of LLMs to gain a comprehensive understanding of scenes. The central insight of our...

10.1609/aaai.v39i9.33001 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2025-04-11

10.1109/cvpr52733.2024.01447 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024-06-16

For a given video-based Human-Object Interaction scene, modeling the spatio-temporal relationship between humans and objects is an important cue to understand the contextual information presented in the video. With efficient spatio-temporal relationship modeling, it is possible not only to uncover the contextual information in each frame, but to directly capture inter-frame dependencies as well. Capturing the position changes of humans and objects over the spatio-temporal dimension is more critical when significant changes in appearance features may not occur over time. When utilizing appearance features, the spatial location and the semantic information are...

10.1145/3474085.3475636 article EN Proceedings of the 30th ACM International Conference on Multimedia 2021-10-17

Video-based human-object interaction recognition is a challenging task since the state of objects as well as their correlations change constantly in the video. Existing methods mainly use a 3DCNN or separate components (e.g., GCN + RNN) to model the spatial correlation and the temporal correlation respectively, but ignore modeling the spatio-temporal correlation simultaneously and the long-term dynamics of objects. In this paper, we propose a novel model, named Spatio-Temporal Interaction Graph Parsing Networks (STIGPN), for videos. STIGPN captures...

10.1109/tcsvt.2023.3259430 article EN IEEE Transactions on Circuits and Systems for Video Technology 2023-03-20

Node power management is one of the key problems in wireless sensor networks. This paper proposes a new power management method using a genetic algorithm, which has the characteristics of auto-adapted global optimization and probability searching. Under the condition of connectivity between nodes, this method can calculate the optimum route link from the source node to the destination node in the entire network, and thus reduces the quantity of communication nodes as well as the network power. The simulation results indicated that the method can be applied perfectly and the effect of energy...

10.1109/icnds.2010.5479596 article EN 2010-05-01

Flowcharts and mind maps, collectively known as flowmind, are vital in daily activities, with hand-drawn versions facilitating real-time collaboration. However, there's a growing need to digitize them for efficient processing. Automated conversion methods are essential to overcome manual conversion challenges. Existing sketch recognition methods face limitations in practical situations, being field-specific and lacking digital conversion steps. Our paper introduces the Flowmind2digital method and the hdFlowmind dataset to address these...

10.48550/arxiv.2401.03742 preprint EN other-oa arXiv (Cornell University) 2024-01-01

Predicting the future motion of surrounding agents is essential for autonomous vehicles (AVs) to operate safely in dynamic, human-robot-mixed environments. Context information, such as road maps and surrounding agents' states, provides crucial geometric and semantic information for motion behavior prediction. To this end, recent works explore two-stage prediction frameworks where coarse trajectories are first proposed, and then used to select critical context information for trajectory refinement. However, they either incur a large amount...

10.48550/arxiv.2403.11492 preprint EN arXiv (Cornell University) 2024-03-18

Neuromorphic sensors, specifically event cameras, revolutionize visual data acquisition by capturing pixel intensity changes with exceptional dynamic range, minimal latency, and energy efficiency, setting them apart from conventional frame-based cameras. The distinctive capabilities of event cameras have ignited significant interest in the domain of event-based action recognition, recognizing their vast potential for advancement. However, the development of this field is currently slowed by the lack of comprehensive,...

10.48550/arxiv.2407.05106 preprint EN arXiv (Cornell University) 2024-07-06

Heterogeneous phase combined flooding (HPCF) has been a promising technology used for enhancing oil recovery in heterogeneous mature reservoirs. However, the injectivity and propagation behavior of preformed particle gel (PPG) in low–medium-permeability reservoir porous media is crucial for HPCF treatment in a low–medium-permeability reservoir. Thus, the injectivity and propagation behavior were systematically studied by conducting a series of sand pack experiments. The matching factor (δ) was defined as the ratio of the average size of PPG particles to the mean size of pore throats, and the pressure...

10.3390/gels10070475 article EN cc-by Gels 2024-07-18

Photo-realistic and controllable 3D avatars are crucial for various applications such as virtual and mixed reality (VR/MR), telepresence, gaming, and film production. Traditional methods for avatar creation often involve time-consuming scanning and reconstruction processes for each avatar, which limits their scalability. Furthermore, these methods do not offer the flexibility to sample new identities or modify existing ones. On the other hand, by learning a strong prior from data, generative models provide a promising...

10.48550/arxiv.2408.13674 preprint EN arXiv (Cornell University) 2024-08-24

Predicting the future motion of surrounding agents is essential for autonomous vehicles (AVs) to operate safely in dynamic, human-robot-mixed environments. However, the scarcity of large-scale driving datasets has hindered the development of robust and generalizable motion prediction models, limiting their ability to capture complex interactions and road geometries. Inspired by recent advances in natural language processing (NLP) and computer vision (CV), self-supervised learning (SSL) has gained significant attention in the motion prediction community...

10.48550/arxiv.2410.08669 preprint EN arXiv (Cornell University) 2024-10-11

10.1109/itnec60942.2024.10733299 article EN 2020 IEEE 4th Information Technology, Networking, Electronic and Automation Control Conference (ITNEC) 2024-09-20

Video face swapping is becoming increasingly popular across various applications, yet existing methods primarily focus on static images and struggle with video face swapping because of temporal consistency and complex scenarios. In this paper, we present the first diffusion-based framework specifically designed for video face swapping. Our approach introduces a novel image-video hybrid training framework that leverages both abundant static image data and temporal video sequences, addressing the inherent limitations of video-only training. The framework incorporates...

10.48550/arxiv.2412.11279 preprint EN arXiv (Cornell University) 2024-12-15