- Human Pose and Action Recognition
- Multimodal Machine Learning Applications
- Medical Image Segmentation Techniques
- Advanced Image and Video Retrieval Techniques
- Advanced Image Processing Techniques
- AI in cancer detection
- Advanced Neural Network Applications
- Gait Recognition and Analysis
- Image Retrieval and Classification Techniques
- Anomaly Detection Techniques and Applications
- Hand Gesture Recognition Systems
- Tactile and Sensory Interactions
- Image and Signal Denoising Methods
- Radiomics and Machine Learning in Medical Imaging
- Advanced optical system design
- Acupuncture Treatment Research Studies
- Infrastructure Maintenance and Monitoring
- Human Motion and Animation
- Robotic Path Planning Algorithms
- Sparse and Compressive Sensing Techniques
- Adversarial Robustness in Machine Learning
- Vehicle Routing Optimization Methods
- 3D Surveying and Cultural Heritage
- Semantic Web and Ontologies
- Model Reduction and Neural Networks
Zhejiang Chinese Medical University
2025
Fudan University
2023-2025
Hong Kong University of Science and Technology
2023-2025
University of Hong Kong
2023-2025
Shanghai University
2024
Karlsruhe Institute of Technology
2023-2024
Harbin Institute of Technology
2024
Decision Systems (United States)
2015
Massachusetts Institute of Technology
2015
This paper presents a multiagent path planning algorithm based on sequential convex programming (SCP) that finds locally optimal trajectories. Previous work using SCP efficiently computes motion plans in spaces with no static obstacles. In many scenarios where the spaces are non-convex, previous SCP-based algorithms failed to find feasible solutions because the approximation of collision constraints leads to forming a sequence of infeasible optimization problems. This paper addresses this problem by tightening...
High-resolution medical images are of critical significance to improve disease diagnosis. Limited by the camera and power devices, medical images often have very low resolution. For example, wireless capsule endoscopes, used to diagnose diseases of the small bowel, can only capture low-resolution endoscopic images. The existing super-resolution (SR) networks perform exceptionally well in recovering high-resolution images, but they are computationally expensive and require high bandwidth, which result in unacceptable latency...
Grounded Situation Recognition (GSR) is capable of recognizing and interpreting visual scenes in a contextually intuitive way, yielding salient activities (verbs) and the involved entities (roles) depicted in images. In this work, we focus on the application of GSR in assisting people with visual impairments (PVI). However, precise localization information of detected objects is often required to navigate their surroundings confidently and make informed decisions. For the first time, we propose an Open Scene Understanding (OpenSU)...
As the core component of all-solid-state batteries, current solid-state electrolytes fail to simultaneously meet multiple demands, such as their own high performance and chemical, electrochemical and mechanical compatibility with the electrode interface. A fresh perspective is rather desired to guide the development of novel solid electrolytes with comprehensive performance. Herein, this work proposes a strategy to synthesize electrolytes extracted from the cathode-electrolyte interphase (CEI), which is inspired by peach trees secreting gum to prevent...
Alzheimer's disease (AD) is characterized by cognitive impairment and behavioral impairment. The gait of AD patients is attracting increasing attention. The aim of this randomized controlled trial (RCT) is to explore the effect of acupuncture on cognitive function, gait performance, and hemodynamic changes in the prefrontal cortices. In this RCT, a total of 108 patients will be randomly assigned into the acupuncture group or the control group for 8 weeks. The primary outcome is three-dimensional gait analysis and cerebral hemodynamics using functional near-infrared spectroscopy (fNIRS)....
When reading a document, glancing at the spatial layout of the document is an initial step to understand it roughly. Traditional document layout analysis (DLA) methods, however, offer only superficial parsing of documents, focusing on basic instance detection and often failing to capture the nuanced logical relations between instances. These limitations hinder DLA-based models from achieving a gradually deeper comprehension akin to human reading. In this work, we propose a novel graph-based Document Structure Analysis (gDSA)...
Introduction: Alzheimer’s disease (AD) represents a degenerative condition affecting the nervous system, characterized by the absence of a definitive cause and the lack of a precise therapeutic intervention. Extensive research efforts are being conducted worldwide to enhance early detection methods for AD and to develop medications capable of effectively halting the initiation and progression of the disease during its early stages. Some current diagnosis methods are expensive and require invasive procedures. More and more evidence shows that gait is related...
Blind Face Super-Resolution (BFSR) has recently gained widespread attention, which aims to super-resolve Low-Resolution (LR) face images with complex unknown degradation to High-Resolution (HR) images. However, existing BFSR methods suffer from two major limitations. First, most of them are trained on synthetic data pairs with pre-defined degradation models, which leads to poor performance due to the mismatch between these pre-defined models and other degradations in real-world scenarios. Second, some methods rely on hand-crafted priors as constraints, such...
Medical image segmentation is a fundamental task in the community of medical image analysis. In this paper, a novel network architecture, referred to as Convolution, Transformer, and Operator (CTO), is proposed. CTO employs a combination of Convolutional Neural Networks (CNNs), Vision Transformer (ViT), and an explicit boundary detection operator to achieve high recognition accuracy while maintaining an optimal balance between accuracy and efficiency. The proposed CTO follows the standard encoder-decoder paradigm, where the encoder...
The ability to animate photo-realistic head avatars reconstructed from monocular portrait video sequences represents a crucial step in bridging the gap between the virtual and real worlds. Recent advancements in head avatar techniques, including explicit 3D morphable meshes (3DMM), point clouds, and neural implicit representation, have been exploited for this ongoing research. However, 3DMM-based methods are constrained by their fixed topologies, point-based approaches suffer from a heavy training burden due...
Integrating information from multiple modalities enhances the robustness of scene perception systems in autonomous vehicles, providing a more comprehensive and reliable sensory framework. However, the modality incompleteness in multi-modal segmentation remains under-explored. In this work, we establish a task called Modality-Incomplete Scene Segmentation (MISS), which encompasses both system-level modality absence and sensor-level modality errors. To avoid the predominant modality reliance in multi-modal fusion, we introduce a Missing-aware Modal Switch...
Self-supervised representation learning for human action recognition has developed rapidly in recent years. Most of the existing works are based on skeleton data while using a multi-modality setup. These works overlooked the differences in performance among modalities, which led to the propagation of erroneous knowledge between modalities, while only three fundamental modalities, i.e., joints, bones, and motions, are used, hence no additional modalities are explored. In this work, we first propose an Implicit Knowledge Exchange Module (IKEM)...
Understanding human actions from body poses is critical for assistive robots sharing space with humans in order to make informed and safe decisions about the next interaction. However, precise temporal localization and annotation of activity sequences is time-consuming and the resulting labels are often noisy. If not effectively addressed, label noise negatively affects the model's training, resulting in lower recognition quality. Despite its importance, addressing label noise for skeleton-based action recognition has been overlooked so far. In this...
Panoramic images, capturing a 360° field of view (FoV), encompass omnidirectional spatial information crucial for scene understanding. However, it is not only costly to obtain training-sufficient dense-annotated panoramas but also application-restricted when training models in a close-vocabulary setting. To tackle this problem, in this work, we define a new task termed Open Panoramic Segmentation (OPS), where models are trained with FoV-restricted pinhole images in the source domain in an open-vocabulary setting while...
In Open-Set Domain Generalization (OSDG), the model is exposed to both new variations of data appearance (domains) and open-set conditions, where both known and novel categories are present at test time. The challenges of this task arise from the dual need to generalize across diverse domains and accurately quantify category novelty, which is critical for applications in dynamic environments. Recently, meta-learning techniques have demonstrated superior results in OSDG, effectively orchestrating the meta-train and -test tasks...
Open-Set Domain Generalization (OSDG) is a challenging task requiring models to accurately predict familiar categories while minimizing confidence for unknown categories to effectively reject them in unseen domains. While the OSDG field has seen considerable advancements, the impact of label noise, a common issue in real-world datasets, has been largely overlooked. Label noise can mislead model optimization, thereby exacerbating the challenges of open-set recognition in novel domains. In this study, we take the first step towards...