Shihong Xia

ORCID: 0000-0002-7228-9646
Research Areas
  • Human Motion and Animation
  • Human Pose and Action Recognition
  • 3D Shape Modeling and Analysis
  • Advanced Vision and Imaging
  • Video Analysis and Summarization
  • Computer Graphics and Visualization Techniques
  • Face recognition and analysis
  • Advanced Numerical Analysis Techniques
  • Evacuation and Crowd Dynamics
  • Generative Adversarial Networks and Image Synthesis
  • Video Surveillance and Tracking Methods
  • Gait Recognition and Analysis
  • GaN-based semiconductor devices and materials
  • Simulation and Modeling Applications
  • Hand Gesture Recognition Systems
  • Ga2O3 and related materials
  • Image Processing and 3D Reconstruction
  • Optical measurement and interference techniques
  • Robotics and Sensor-Based Localization
  • Anomaly Detection Techniques and Applications
  • Optical Coatings and Gratings
  • Gaze Tracking and Assistive Technology
  • Speech and Audio Processing
  • Face and Expression Recognition
  • Advanced Image Processing Techniques

Institute of Computing Technology
2015-2024

Chinese Academy of Sciences
2014-2024

University of Chinese Academy of Sciences
2019-2024

Ningbo Institute of Industrial Technology
2022-2024

China National Petroleum Corporation (China)
2023-2024

Optica
2024

Ningbo University
2023

Minhang District Central Hospital
2021

Fudan University
2021

University of Cambridge
1994

3D geometric contents are becoming increasingly popular. In this paper, we study the problem of analyzing deforming 3D meshes using deep neural networks. Deforming 3D meshes are flexible enough to represent 3D animation sequences as well as collections of objects of the same category, allowing diverse shapes with large-scale non-linear deformations. We propose a novel framework, which we call mesh variational autoencoders (mesh VAE), to explore the probabilistic latent space of 3D surfaces. The framework is easy to train, and requires very few training...

10.1109/cvpr.2018.00612 article EN 2018-06-01
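The mesh VAE builds on the standard variational autoencoder. As a hedged, generic sketch (not the authors' architecture, which encodes deformation features of meshes), the two ingredients every VAE shares, the reparameterization trick and the closed-form KL divergence to a unit Gaussian prior, look like this in NumPy. The function names and toy encoder outputs are illustrative assumptions.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I); z stays differentiable in mu and log_var."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(sigma^2)) || N(0, I)), summed over the latent dimensions."""
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=-1)

# Hypothetical encoder outputs for a batch of 2 shapes with a 3-D latent space.
rng = np.random.default_rng(0)
mu = np.array([[0.0, 0.0, 0.0], [1.0, -1.0, 0.5]])
log_var = np.zeros_like(mu)
z = reparameterize(mu, log_var, rng)
kl = kl_to_standard_normal(mu, log_var)
print(z.shape)  # (2, 3)
```

With unit variance, the KL term reduces to half the squared norm of the mean, so the first (zero-mean) sample contributes no penalty.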

Predicting future motion based on a historical motion sequence is a fundamental problem in computer vision, and it has wide applications in autonomous driving and robotics. Some recent works have shown that Graph Convolutional Networks (GCN) are instrumental in modeling the relationship between different joints. However, considering the variants and diverse action types in human motion data, the cross-dependency of the spatio-temporal relationships will be difficult to depict due to the decoupled modeling strategy, which may also exacerbate...

10.1109/cvpr52688.2022.00634 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01
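For readers new to GCNs on skeletons, a minimal sketch of a standard graph convolution layer (the symmetric-normalized form popularized by Kipf and Welling) over a hypothetical four-joint chain follows. It is not the spatio-temporal design of the paper above; the joint graph, feature sizes and helper names are assumptions for illustration.

```python
import numpy as np

def normalized_adjacency(edges, num_joints):
    """Symmetrically normalized adjacency with self-loops: D^{-1/2} (A + I) D^{-1/2}."""
    a = np.eye(num_joints)
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(a_hat, h, w):
    """One graph convolution: mix neighboring joint features, project them, apply ReLU."""
    return np.maximum(a_hat @ h @ w, 0.0)

# Hypothetical 4-joint chain (e.g. hip-knee-ankle-toe) with 3-D positions per joint.
edges = [(0, 1), (1, 2), (2, 3)]
a_hat = normalized_adjacency(edges, 4)
h = np.random.default_rng(0).standard_normal((4, 3))
w = np.random.default_rng(1).standard_normal((3, 8))
out = gcn_layer(a_hat, h, w)
print(out.shape)  # (4, 8)
```

Because `a_hat` is fixed by the skeleton, each joint only sees its physical neighbors per layer; this is the limitation that learned or gated adjacency schemes aim to relax.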

This paper presents a novel solution for realtime generation of stylistic human motion that automatically transforms unlabeled, heterogeneous motion data into new styles. The key idea of our approach is an online learning algorithm that automatically constructs a series of local mixtures of autoregressive models (MAR) to capture the complex relationships between styles of motion. We construct local MAR models on the fly by searching for the closest examples of each input pose in the database. Once the model parameters are estimated from the training data, the model adapts the current...

10.1145/2766999 article EN ACM Transactions on Graphics 2015-07-27
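The idea of fitting autoregressive models on the fly from the nearest database examples can be illustrated with a deliberately simplified sketch. It fits a single local linear autoregressive model (not a mixture, and not the paper's style-transfer formulation) by regularized least squares on the k poses closest to the query. The pose dimensionality, `k`, the ridge term and the function name are assumptions.

```python
import numpy as np

def local_ar_predict(query, prev_db, next_db, k=5, ridge=1e-6):
    """Fit x_next ~ [x_prev, 1] @ W on the k database poses nearest to `query`, then predict."""
    dists = np.linalg.norm(prev_db - query, axis=1)
    idx = np.argsort(dists)[:k]                      # indices of the k nearest example poses
    x = np.hstack([prev_db[idx], np.ones((k, 1))])   # append a bias column
    y = next_db[idx]
    # Ridge-regularized normal equations: (X^T X + ridge I) W = X^T Y
    w = np.linalg.solve(x.T @ x + ridge * np.eye(x.shape[1]), x.T @ y)
    return np.append(query, 1.0) @ w

# Toy database in which the next pose is an exact affine function of the previous one.
rng = np.random.default_rng(0)
prev_db = rng.standard_normal((200, 3))
true_w = np.array([[0.9, 0.1, 0.0], [0.0, 0.8, 0.2], [0.1, 0.0, 0.7]])
next_db = prev_db @ true_w + 0.05
query = np.array([0.2, -0.1, 0.3])
pred = local_ar_predict(query, prev_db, next_db, k=20)
print(pred)
```

Refitting only on nearby examples lets a linear model track a globally non-linear mapping, at the cost of a nearest-neighbor search per frame.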

Hierarchical structure and different semantic roles of joints in the human skeleton convey important information for action recognition. Conventional graph convolution methods for modeling skeleton structure consider only the physically connected neighbors of each joint, and the joints of the same type, thus failing to capture high-order information. In this work, we propose a novel model with motif-based graph convolution to encode hierarchical spatial structure, and a variable temporal dense block to exploit local temporal information over different ranges of human skeleton sequences. Moreover, we employ a non-local block to capture global...

10.1609/aaai.v33i01.33018989 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2019-07-17

Recent deep image-to-image translation techniques allow fast generation of face images from freehand sketches. However, existing solutions tend to overfit to sketches, thus requiring professional sketches or even edge maps as input. To address this issue, our key idea is to implicitly model the shape space of plausible face images and synthesize a face image in this space to approximate an input sketch. We take a local-to-global approach. We first learn feature embeddings of key face components, and push corresponding parts of input sketches towards underlying...

10.1145/3386569.3392386 article EN ACM Transactions on Graphics 2020-08-12

Transferring deformation from a source shape to a target shape is a very useful technique in computer graphics. State-of-the-art deformation transfer methods require either point-wise correspondences between source and target shapes, or pairs of deformed source and target shapes with corresponding deformations. However, in most cases, such correspondences are not available and cannot be reliably established using an automatic algorithm. Therefore, substantial user effort is needed to label the correspondences or to obtain and specify such shape sets. In this work, we propose a novel approach to automatic deformation transfer between two unpaired shape sets...

10.1145/3272127.3275028 article EN ACM Transactions on Graphics 2018-11-28

Example-based mesh deformation methods are powerful tools for realistic shape editing. However, existing techniques typically combine all the example deformation modes, which can lead to overfitting, i.e., using an overly complicated model to explain the user-specified deformation. This leads to implausible or unstable deformation results, including unexpected global changes outside the region of interest. To address this fundamental limitation, we propose a sparse blending method that automatically selects a smaller number of deformation modes...

10.1109/tvcg.2019.2941200 article EN IEEE Transactions on Visualization and Computer Graphics 2019-09-17
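Sparse selection of example modes can be illustrated with L1-regularized least squares. The sketch below is an assumption-laden stand-in rather than the paper's actual optimization: it solves a Lasso problem with ISTA (iterative soft-thresholding), so that only a few columns of a hypothetical mode matrix receive nonzero blending weights.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||x||_1: shrink toward zero and clip small entries to exactly 0."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def sparse_blend_weights(modes, target, lam=0.1, iters=500):
    """ISTA for min_w 0.5 * ||modes @ w - target||^2 + lam * ||w||_1."""
    step = 1.0 / np.linalg.norm(modes, 2) ** 2  # 1 / Lipschitz constant of the gradient
    w = np.zeros(modes.shape[1])
    for _ in range(iters):
        grad = modes.T @ (modes @ w - target)
        w = soft_threshold(w - step * grad, step * lam)
    return w

# Hypothetical setup: 10 example modes; the target deformation uses only modes 2 and 7.
rng = np.random.default_rng(0)
modes = rng.standard_normal((60, 10))
target = 1.5 * modes[:, 2] - 0.8 * modes[:, 7]
w = sparse_blend_weights(modes, target, lam=0.5)
print(np.flatnonzero(np.abs(w) > 1e-3))
```

The L1 penalty drives unused weights to exactly zero, which is what keeps the blend from recruiting every example mode.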

Effectively characterizing the behavior of deformable objects has wide applicability but remains challenging. We present a new rotation-invariant deformation representation and a novel reconstruction algorithm to accurately reconstruct the positions and local rotations simultaneously. Meshes can be very efficiently reconstructed from our representation by matrix pre-decomposition, while, at the same time, hard or soft constraints can be flexibly specified with only handles needed. Our approach is thus particularly suitable for...

10.1145/2908736 article EN ACM Transactions on Graphics 2016-07-28

10.1007/s11390-017-1742-y article EN Journal of Computer Science and Technology 2017-05-01

Spatially localized deformation components are very useful for shape analysis and synthesis in 3D geometry processing. Several methods have recently been developed, with an aim to extract intuitive and interpretable deformation components. However, these techniques suffer from fundamental limitations, especially for meshes with noise or large-scale deformations, and may not always be able to identify important deformation components. In this paper, we propose a novel mesh-based autoencoder architecture that is able to cope with meshes with irregular topology. We...

10.1609/aaai.v32i1.11870 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2018-04-26

This paper introduces a new generative deep learning network for human motion synthesis and control. Our key idea is to combine recurrent neural networks (RNNs) and adversarial training for human motion modeling. We first describe an efficient method for training an RNN model from prerecorded motion data. We implement RNNs with long short-term memory (LSTM) cells because they are capable of addressing the nonlinear dynamics and long-term temporal dependencies present in human motions. Next, we train a refiner network using an adversarial loss, similar to Generative Adversarial Networks (GANs), such that refined...

10.1109/tvcg.2019.2938520 article EN IEEE Transactions on Visualization and Computer Graphics 2019-09-05
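As background for the LSTM-based motion model, one step of a standard LSTM cell fits in a few lines of NumPy. This is a generic textbook cell, not the paper's network; the gate ordering, sizes and random weights are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, w, u, b):
    """One LSTM step. w: (4H, D), u: (4H, H), b: (4H,); gate order i, f, g, o is a convention chosen here."""
    z = w @ x + u @ h + b
    i, f, g, o = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    g = np.tanh(g)
    c_new = f * c + i * g          # keep part of the old cell state, write gated new content
    h_new = o * np.tanh(c_new)     # expose a gated view of the cell state
    return h_new, c_new

# Hypothetical sizes: a 6-D pose feature per frame and a 16-D hidden state.
rng = np.random.default_rng(0)
D, H = 6, 16
w = 0.1 * rng.standard_normal((4 * H, D))
u = 0.1 * rng.standard_normal((4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x in rng.standard_normal((10, D)):   # roll the cell over a 10-frame sequence
    h, c = lstm_step(x, h, c, w, u, b)
print(h.shape)  # (16,)
```

The additive cell update `f * c + i * g` is what lets gradients flow across many frames, which is why LSTMs suit long-term temporal dependencies in motion.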

Recent works have achieved remarkable performance for action recognition with human skeletal data by utilizing graph convolutional models. Existing models mainly focus on developing graph convolutional operations to encode structural properties of a skeletal graph, whose topology is manually predefined and fixed over all action samples. Some recent works further take sample-dependent relationships among joints into consideration. However, the complex relationships between arbitrary pairwise joints are difficult to learn, and the temporal features between frames are not fully...

10.1109/tpami.2022.3170511 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2022-04-26

Generating 3D human motion based on textual descriptions has been a research focus in recent years. It requires the generated motion to be diverse, natural, and conform to the textual description. Due to the complex spatio-temporal nature of human motion and the difficulty of learning the cross-modal relationship between text and motion, text-driven motion generation is still a challenging problem. To address these issues, we propose AttT2M, a two-stage method with a multi-perspective attention mechanism: body-part attention and global-local motion-text attention. The former...

10.1109/iccv51070.2023.00053 article EN 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 2023-10-01
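Motion-text attention mechanisms are typically built from scaled dot-product attention. As a generic sketch (not AttT2M's body-part or global-local modules), the snippet below lets each motion token attend over all text tokens; all feature sizes, projection matrices and names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(motion_feats, text_feats, wq, wk, wv):
    """Scaled dot-product attention: motion tokens are queries, text tokens are keys/values."""
    q = motion_feats @ wq
    k = text_feats @ wk
    v = text_feats @ wv
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # (T_motion, T_text)
    return weights @ v, weights

rng = np.random.default_rng(0)
motion = rng.standard_normal((12, 32))  # 12 motion tokens, 32-D features (hypothetical)
text = rng.standard_normal((5, 64))     # 5 word tokens, 64-D features (hypothetical)
wq = rng.standard_normal((32, 16))
wk = rng.standard_normal((64, 16))
wv = rng.standard_normal((64, 16))
out, attn = cross_attention(motion, text, wq, wk, wv)
print(out.shape, attn.shape)  # (12, 16) (12, 5)
```

Each row of `attn` is a distribution over words, so every motion token receives a text summary weighted by relevance.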

Point cloud-based 3D human pose estimation, which aims to recover the 3D locations of human skeleton joints, plays an important role in many AR/VR applications. The success of existing methods is generally built upon large-scale data annotated with 3D human joints. However, it is a labor-intensive and error-prone process to annotate 3D human joints from input depth images or point clouds, due to the self-occlusion between body parts as well as the tedious annotation process on point clouds. Meanwhile, it is easier to construct human pose datasets with 2D human joint annotations on depth images. To...

10.1109/tvcg.2020.2973076 article EN IEEE Transactions on Visualization and Computer Graphics 2020-02-13

Vision-based regression tasks, such as hand pose estimation, have achieved higher accuracy and faster convergence through representation learning. However, existing representation learning methods often encounter the following issues: the high semantic level of features extracted from images is inadequate for regressing low-level information, and the extracted features include task-irrelevant information, reducing their compactness and interfering with regression tasks. To address these challenges, we propose TI-Net, a highly versatile visual Network backbone...

10.48550/arxiv.2502.12535 preprint EN arXiv (Cornell University) 2025-02-17

This paper presents the first realtime 3D eye gaze capture method that simultaneously captures the coordinated movement of 3D eye gaze, head poses and facial expression deformation using a single RGB camera. Our key idea is to complement a realtime 3D facial performance capture system with an efficient 3D eye gaze tracker. We start the process by automatically detecting important 2D facial features for each frame. The detected facial features are then used to reconstruct 3D head poses and large-scale facial deformation using multi-linear models. Next, we introduce a novel user-independent classification method for extracting...

10.1145/2897824.2925947 article EN ACM Transactions on Graphics 2016-07-11

This paper presents a realtime and accurate method for 3D eye gaze tracking with a monocular RGB camera. Our key idea is to train a deep convolutional neural network (DCNN) that automatically extracts the iris and pupil pixels of each eye from input images. To achieve this goal, we combine the power of U-Net (Ronneberger et al., 2015) and SqueezeNet (Iandola et al., 2017) to train an efficient convolutional neural network for pixel classification. In addition, we track the 3D eye gaze state in the Maximum A Posteriori (MAP) framework, which sequentially searches...

10.1109/tvcg.2019.2938165 article EN IEEE Transactions on Visualization and Computer Graphics 2019-08-28

For the convenient reuse of large-scale 3D motion capture data, browsing and searching methods for the data should be explored. In this paper, an efficient indexing and retrieval approach for human motion data is presented based on a novel similarity metric. We divide the character model into three partitions to reduce the spatial complexity, and measure the temporal similarity of each partition by a self-organizing map and the Smith-Waterman algorithm. The overall similarity between two motion clips can be achieved by integrating the similarities of the separate body partitions. Then...

10.1145/1643928.1643974 article EN 2009-11-18
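The Smith-Waterman step compares two clips as symbol sequences. A minimal sketch of the classic local-alignment score follows, assuming each frame of one body partition has already been mapped to a discrete self-organizing-map node label. The scoring values (match +2, mismatch -1, linear gap -1) and the labels are illustrative assumptions, not the paper's parameters.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between sequences a and b (linear gap penalty)."""
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]  # h[i][j]: best score of an alignment ending at a[i-1], b[j-1]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 lets an alignment restart anywhere, which makes it local.
            h[i][j] = max(0, diag, h[i - 1][j] + gap, h[i][j - 1] + gap)
            best = max(best, h[i][j])
    return best

# Hypothetical per-frame SOM node labels for two clips of the same body partition.
clip_a = [3, 3, 7, 7, 1, 4, 4, 2]
clip_b = [9, 7, 7, 1, 4, 8]
print(smith_waterman(clip_a, clip_b))  # 8
```

The shared run `7, 7, 1, 4` scores 4 x 2 = 8; local alignment ignores the unrelated frames at either end, which suits clips whose similar segments start at different times.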

This paper proposes a novel visualization approach that can depict the variations between different human motion data. This is achieved by representing the time dimension of each animation sequence with a sequential curve in a locality-preserving reference 2D space, called the motion track representation. The principal advantage of this representation over standard representations of motion capture data (generally either a keyframed timeline or a 2D map in its entirety) is that it maps the differences along the time dimension into parallel perceptible spatial...

10.1109/pacificvis.2010.5429596 article EN 2010-03-01

Inverse dynamics is an important and challenging problem in human motion modeling, synthesis and simulation, as well as in robotics and biomechanics. Previous solutions to inverse dynamics are often noisy and ambiguous, particularly when double stances occur. In this paper, we present a novel method that accurately reconstructs biomechanically valid contact information, including center of pressure, contact forces, torsional torques and internal joint torques, from input kinematic human motion data. Our key idea is to apply statistical modeling techniques...

10.1145/2980179.2982440 article EN ACM Transactions on Graphics 2016-11-11

Solid-state self-powered UV detection is strongly required in various application fields to enable long-term operation. However, this requirement is incompatible with conventionally used metal-semiconductor-metal (MSM) photodetectors (PDs) due to the symmetric design of their Schottky contacts. In this work, a self-powered MSM solar-blind UV-PD was realized using a lateral pn junction architecture. A large built-in electric field was obtained in the MSM-type PD without impurity doping, leading to efficient carrier separation and enhanced...

10.1364/ol.500391 article EN Optics Letters 2023-08-22

Style and variation are two vital components of human motion: style differentiates between examples of the same behavior (slow walk vs. fast walk), while variation differentiates between examples of the same style (vigorous vs. lackadaisical arm swing). This paper presents a novel method to simultaneously model both style and variation of motion data captured from different subjects performing the same behavior. An articulated skeleton is separated into several joint groups, and latent parameters are introduced to parameterize each partial motion. The relationships between user-defined style and latent parameters are represented by a Bayesian...

10.5555/1921427.1921431 article EN Symposium on Computer Animation 2010-07-02