Tenglong Ao

ORCID: 0000-0002-7418-1014
Research Areas
  • Human Pose and Action Recognition
  • Human Motion and Animation
  • Hand Gesture Recognition Systems
  • Multimodal Machine Learning Applications
  • Music and Audio Processing
  • Speech and Dialogue Systems
  • Robotics and Automated Systems
  • Video Analysis and Summarization
  • Hearing Impairment and Communication

Peking University
2022-2024

The automatic generation of stylized co-speech gestures has recently received increasing attention. Previous systems typically allow style control via predefined text labels or example motion clips, which are often not flexible enough to convey user intent accurately. In this work, we present GestureDiffuCLIP, a neural network framework for synthesizing realistic, stylized co-speech gestures with flexible style control. We leverage the power of the large-scale Contrastive-Language-Image-Pre-training (CLIP) model and present a novel CLIP-guided...

10.1145/3592097 article EN ACM Transactions on Graphics 2023-07-26
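The style-control idea described above, conditioning a gesture generator on an embedding of a free-form text prompt, can be pictured with a minimal sketch. Everything below is a hypothetical stand-in: StyleTextEncoder, GestureGenerator, and the dimensions are placeholders, not the GestureDiffuCLIP architecture, which instead builds on a pretrained CLIP text encoder and a latent diffusion model.

```python
# Minimal sketch of text-prompt style conditioning for gesture synthesis.
# StyleTextEncoder and GestureGenerator are hypothetical stand-ins; the
# actual system uses a pretrained CLIP text encoder and a latent
# diffusion model, neither of which is reproduced here.
import torch
import torch.nn as nn

class StyleTextEncoder(nn.Module):
    """Maps a tokenized style prompt to a fixed-size style embedding."""
    def __init__(self, vocab_size=10000, dim=512):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, dim)

    def forward(self, token_ids):
        return self.embed(token_ids)          # (batch, dim)

class GestureGenerator(nn.Module):
    """Predicts a window of poses from audio features plus a style code."""
    def __init__(self, audio_dim=128, style_dim=512, pose_dim=63, frames=60):
        super().__init__()
        self.frames, self.pose_dim = frames, pose_dim
        self.net = nn.Sequential(
            nn.Linear(audio_dim + style_dim, 1024), nn.ReLU(),
            nn.Linear(1024, frames * pose_dim),
        )

    def forward(self, audio_feat, style_code):
        x = torch.cat([audio_feat, style_code], dim=-1)
        return self.net(x).view(-1, self.frames, self.pose_dim)

# Usage with dummy inputs: one "style prompt" and one window of audio features.
encoder, generator = StyleTextEncoder(), GestureGenerator()
prompt_tokens = torch.randint(0, 10000, (1, 8))   # stand-in for CLIP tokenization
style = encoder(prompt_tokens)
audio = torch.randn(1, 128)                        # stand-in for per-window audio features
poses = generator(audio, style)
print(poses.shape)                                 # (1, 60, 63) joint-rotation window
```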

Automatic synthesis of realistic co-speech gestures is an increasingly important yet challenging task in artificial embodied agent creation. Previous systems mainly focus on generating gestures in an end-to-end manner, which leads to difficulties in mining the clear rhythm and semantics due to the complex yet subtle harmony between speech and gestures. We present a novel co-speech gesture synthesis method that achieves convincing results in both rhythm and semantics. For rhythm, our system contains a robust rhythm-based segmentation pipeline to ensure the temporal...

10.1145/3550454.3555435 article EN ACM Transactions on Graphics 2022-11-30
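The rhythm-based segmentation idea, cutting the speech audio into blocks at detected rhythmic anchors so that gestures can be aligned block by block, can be illustrated with a small sketch. The onset-based splitting, the helper beat_segments, and the file name speech.wav are assumptions for illustration only, not the paper's actual pipeline.

```python
# Illustrative rhythm-aligned segmentation of an audio track with librosa.
# This only sketches the general idea of splitting audio at detected onsets;
# the paper's rhythm-based segmentation pipeline is more involved.
import librosa

def beat_segments(audio_path, sr=16000):
    """Return (start, end) times, in seconds, between consecutive onsets."""
    y, sr = librosa.load(audio_path, sr=sr)
    onset_times = librosa.onset.onset_detect(y=y, sr=sr, units="time")
    # Pair consecutive onsets into segments; gestures would then be
    # generated (or aligned) per segment.
    return list(zip(onset_times[:-1], onset_times[1:]))

if __name__ == "__main__":
    for start, end in beat_segments("speech.wav")[:5]:   # hypothetical file
        print(f"segment {start:.2f}s - {end:.2f}s")
```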

In this work, we present MoConVQ, a novel unified framework for physics-based motion control leveraging scalable discrete representations. Building upon vector quantized variational autoencoders (VQ-VAE) and model-based reinforcement learning, our approach effectively learns motion embeddings from a large, unstructured dataset spanning tens of hours of motion examples. The resultant motion representation not only captures diverse motion skills but also offers a robust and intuitive interface for various applications. We demonstrate...

10.1145/3658137 article EN ACM Transactions on Graphics 2024-07-19
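The discrete representation at the heart of this framework rests on vector quantization: a continuous motion latent is snapped to its nearest entry in a learned codebook. A minimal sketch of that single step, with arbitrary codebook size and latent dimension and without MoConVQ's larger structure or the physics-based control layer, might look like this.

```python
# Minimal sketch of the vector-quantization step at the core of a VQ-VAE:
# each continuous latent vector is replaced by its nearest codebook entry.
# Codebook size and latent dimension are arbitrary choices for illustration.
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    def __init__(self, num_codes=512, dim=64):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)
        self.codebook.weight.data.uniform_(-1.0 / num_codes, 1.0 / num_codes)

    def forward(self, z):                      # z: (batch, time, dim)
        flat = z.reshape(-1, z.shape[-1])      # (batch*time, dim)
        # Squared distances to every codebook entry, then nearest-neighbor lookup.
        dist = (flat.pow(2).sum(1, keepdim=True)
                - 2 * flat @ self.codebook.weight.t()
                + self.codebook.weight.pow(2).sum(1))
        indices = dist.argmin(dim=1)
        quantized = self.codebook(indices).view_as(z)
        # Straight-through estimator so gradients still reach the encoder.
        quantized = z + (quantized - z).detach()
        return quantized, indices.view(z.shape[0], z.shape[1])

# Usage with a dummy batch of per-frame motion latents.
vq = VectorQuantizer()
z = torch.randn(2, 30, 64)                     # 2 clips, 30 frames, 64-d latents
q, codes = vq(z)
print(q.shape, codes.shape)                    # (2, 30, 64) and (2, 30)
```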

In this work, we present Semantic Gesticulator, a novel framework designed to synthesize realistic gestures accompanying speech with strong semantic correspondence. Semantically meaningful gestures are crucial for effective non-verbal communication, but such gestures often fall within the long tail of the distribution of natural human motion. The sparsity of these movements makes it challenging for deep learning-based systems, trained on moderately sized datasets, to capture the relationship between the movements and their corresponding speech semantics....

10.1145/3658134 article EN ACM Transactions on Graphics 2024-07-19
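One way to picture matching rare, semantically meaningful gestures to speech is a toy retrieval step: compare a speech-semantics embedding against a small library of labeled gesture exemplars by cosine similarity. The library, embeddings, and helper names below are purely hypothetical; Semantic Gesticulator's actual mechanism for grounding gestures in semantics is not reproduced here.

```python
# Illustrative retrieval of a semantically matching gesture exemplar by cosine
# similarity. The gesture library and all embeddings are made up for the sketch.
import numpy as np

def cosine_sim(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)

# Hypothetical library: gesture name -> semantic embedding (e.g., of its tag).
rng = np.random.default_rng(0)
gesture_library = {name: rng.normal(size=64) for name in
                   ["raise_both_hands", "point_forward", "shrug", "count_on_fingers"]}

def retrieve_gesture(speech_embedding, library):
    """Return the library gesture whose embedding best matches the speech span."""
    return max(library, key=lambda name: cosine_sim(speech_embedding, library[name]))

speech_embedding = rng.normal(size=64)          # stand-in for an utterance embedding
print(retrieve_gesture(speech_embedding, gesture_library))
```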

The automatic generation of stylized co-speech gestures has recently received increasing attention. Previous systems typically allow style control via predefined text labels or example motion clips, which are often not flexible enough to convey user intent accurately. In this work, we present GestureDiffuCLIP, a neural network framework for synthesizing realistic, stylized co-speech gestures with flexible style control. We leverage the power of the large-scale Contrastive-Language-Image-Pre-training (CLIP) model and present a novel CLIP-guided...

10.48550/arxiv.2303.14613 preprint EN other-oa arXiv (Cornell University) 2023-01-01

How to automatically synthesize natural-looking dance movements based on a piece of music is an increasingly popular yet challenging task. Most existing data-driven approaches require hard-to-get paired training data and fail to generate long sequences of motion due to error accumulation in the autoregressive structure. We present a novel 3D dance synthesis system that only needs unpaired data for training and can generate realistic long-term motions at the same time. For training, we explore the disentanglement of beat and style, and propose...

10.48550/arxiv.2303.16856 preprint EN other-oa arXiv (Cornell University) 2023-01-01

In this work, we present Semantic Gesticulator, a novel framework designed to synthesize realistic gestures accompanying speech with strong semantic correspondence. Semantically meaningful gestures are crucial for effective non-verbal communication, but such gestures often fall within the long tail of the distribution of natural human motion. The sparsity of these movements makes it challenging for deep learning-based systems, trained on moderately sized datasets, to capture the relationship between the movements and their corresponding speech semantics....

10.48550/arxiv.2405.09814 preprint EN arXiv (Cornell University) 2024-05-16

In this work, we present MoConVQ, a novel unified framework for physics-based motion control leveraging scalable discrete representations. Building upon vector quantized variational autoencoders (VQ-VAE) and model-based reinforcement learning, our approach effectively learns motion embeddings from a large, unstructured dataset spanning tens of hours of motion examples. The resultant motion representation not only captures diverse motion skills but also offers a robust and intuitive interface for various applications. We demonstrate...

10.48550/arxiv.2310.10198 preprint EN other-oa arXiv (Cornell University) 2023-01-01