- Human Pose and Action Recognition
- Face recognition and analysis
- Advanced Vision and Imaging
- Generative Adversarial Networks and Image Synthesis
- Video Surveillance and Tracking Methods
- 3D Shape Modeling and Analysis
- Anomaly Detection Techniques and Applications
- Facial Nerve Paralysis Treatment and Research
- Advanced Image Processing Techniques
- Human Motion and Animation
- Robotics and Sensor-Based Localization
- Computer Graphics and Visualization Techniques
- IoT Networks and Protocols
- Wireless Body Area Networks
- Advanced Wireless Network Optimization
- Facial Rejuvenation and Surgery Techniques
- Domain Adaptation and Few-Shot Learning
- Energy Efficient Wireless Sensor Networks
- Advanced MIMO Systems Optimization
- Hand Gesture Recognition Systems
- Cooperative Communication and Network Coding
- Biometric Identification and Security
- Advanced Computing and Algorithms
- IoT and Edge/Fog Computing
- Multimodal Machine Learning Applications
National Taiwan Normal University
2023-2024
Meta (United States)
2019-2022
META Health
2022
Meta (Israel)
2018-2021
Carnegie Mellon University
2014-2017
Institute of Information Science, Academia Sinica
2014
National Taiwan University
2012-2013
We present an approach to efficiently detect the 2D pose of multiple people in an image. The approach uses a nonparametric representation, which we refer to as Part Affinity Fields (PAFs), to learn to associate body parts with individuals in the image. The architecture encodes global context, allowing a greedy bottom-up parsing step that maintains high accuracy while achieving realtime performance, irrespective of the number of people in the image. The architecture is designed to jointly learn part locations and their association via two branches of the same sequential prediction process. Our...
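The association idea in this abstract can be illustrated with a minimal sketch. It assumes a hypothetical callable `paf(x, y)` that returns a 2D vector, scores a candidate limb by averaging the field's alignment with the segment between two part candidates, and then pairs candidates greedily. The names `paf_score` and `greedy_match`, the sample count, and the positive-score threshold are illustrative assumptions, not the authors' implementation.

```python
import math


def paf_score(paf, p1, p2, num_samples=10):
    """Average alignment between a 2D vector field and the segment p1 -> p2.

    `paf` is a hypothetical callable (x, y) -> (vx, vy); p1 and p2 are (x, y)
    tuples. num_samples is assumed to be at least 2.
    """
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    length = math.hypot(dx, dy)
    if length == 0:
        return 0.0
    ux, uy = dx / length, dy / length  # unit vector along the candidate limb
    total = 0.0
    for i in range(num_samples):
        t = i / (num_samples - 1)  # evenly spaced samples, endpoints included
        vx, vy = paf(p1[0] + t * dx, p1[1] + t * dy)
        total += vx * ux + vy * uy  # dot product of field and limb direction
    return total / num_samples


def greedy_match(paf, parts_a, parts_b):
    """Pair candidates by descending score; each candidate is used at most once."""
    scored = sorted(
        ((paf_score(paf, a, b), i, j)
         for i, a in enumerate(parts_a)
         for j, b in enumerate(parts_b)),
        reverse=True,
    )
    used_a, used_b, pairs = set(), set(), []
    for score, i, j in scored:
        if score > 0 and i not in used_a and j not in used_b:
            pairs.append((i, j))
            used_a.add(i)
            used_b.add(j)
    return pairs


def toy_paf(x, y):
    """Toy field: points in +x along two horizontal limbs at y = 0 and y = 10."""
    return (1.0, 0.0) if abs(y) < 1 or abs(y - 10) < 1 else (0.0, 0.0)


shoulders = [(0, 0), (0, 10)]
elbows = [(5, 10), (5, 0)]
pairs = greedy_match(toy_paf, shoulders, elbows)
```

In this toy setup, each shoulder is paired with the elbow at the same height, because only those segments stay inside the region where the field is aligned with them.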
Realtime multi-person 2D pose estimation is a key component in enabling machines to have an understanding of people in images and videos. In this work, we present a realtime approach to detect the 2D pose of multiple people in an image. The proposed method uses a nonparametric representation, which we refer to as Part Affinity Fields (PAFs), to learn to associate body parts with individuals in the image. This bottom-up system achieves high accuracy and realtime performance, regardless of the number of people in the image. In previous work, PAFs and body part location estimation were refined simultaneously across training...
Pose Machines provide a sequential prediction framework for learning rich implicit spatial models. In this work we show a systematic design for how convolutional networks can be incorporated into the pose machine framework for learning image features and image-dependent spatial models for the task of pose estimation. The contribution of this paper is to implicitly model long-range dependencies between variables in structured prediction tasks such as articulated pose estimation. We achieve this by designing a sequential architecture composed of convolutional networks that directly operate on belief maps from previous...
Accurate estimation of 3D human motion from monocular video requires modeling both kinematics (body motion without physical forces) and dynamics (motion with physical forces). To demonstrate this, we present SimPoE, a Simulation-based approach for 3D human Pose Estimation, which integrates image-based kinematic inference and physics-based dynamics modeling. SimPoE learns a policy that takes as input the current-frame pose estimate and the next image frame to control a physically-simulated character to output the next-frame pose estimate. The policy contains...
Creating photorealistic avatars of existing people currently requires extensive person-specific data capture, which is usually only accessible to the VFX industry and not the general public. Our work aims to address this drawback by relying only on a short mobile phone capture to obtain a drivable 3D head avatar that matches a person's likeness faithfully. In contrast to existing approaches, our architecture avoids the complex task of directly modeling the entire manifold of human appearance, aiming instead to generate an avatar model that can be...
A key promise of Virtual Reality (VR) is the possibility of remote social interaction that is more immersive than any prior telecommunication media. However, existing social VR experiences are mediated by inauthentic digital representations of the user (i.e., stylized avatars). These stylized representations have limited the adoption of social VR applications in precisely those cases where immersion is most necessary (e.g., professional interactions and intimate conversations). In this work, we present a bidirectional system that can animate avatar heads of both...
We present a method for building high-fidelity animatable 3D face models that can be posed and rendered with novel lighting environments in real-time. Our main insight is that relightable models trained to produce an image lit from a single light direction can generalize to natural illumination conditions but are computationally expensive to render. On the other hand, efficient models trained with point-light data do not generalize to novel lighting conditions. We leverage the strengths of each of these two approaches. We first train a generalizable model on point-light illuminations, and use...
We present a learning-based method for building driving-signal aware full-body avatars. Our model is a conditional variational autoencoder that can be animated with incomplete driving signals, such as human pose and facial keypoints, and produces a high-quality representation of human geometry and view-dependent appearance. The core intuition behind our method is that better drivability and generalization can be achieved by disentangling the driving signals and remaining generative factors, which are not available during animation. To this end,...
Photorealistic telepresence requires both high-fidelity body modeling and faithful driving to enable dynamically synthesized appearance that is indistinguishable from reality. In this work, we propose an end-to-end framework that addresses two core challenges in modeling and driving full-body avatars of real people. One challenge is driving an avatar while staying faithful to details and dynamics that cannot be captured by a global low-dimensional parameterization such as body pose. Our approach supports driving of clothed avatars with wrinkles and motion that a real driving performer exhibits...
Interacting with people across large distances is important for remote work, interpersonal relationships, and entertainment. While such face-to-face interactions can be achieved using 2D video conferencing or, more recently, virtual reality (VR), telepresence systems currently distort the communication of eye contact and social gaze signals. Although methods have been proposed to redirect gaze in 2D teleconferencing situations to enable eye contact, 2D video conferencing lacks the 3D immersion of real life. To address these problems, we...
We present supervision by registration and triangulation (SRT), an unsupervised approach that utilizes unlabeled multi-view video to improve the accuracy and precision of landmark detectors. Being able to utilize unlabeled data enables our detectors to learn from massive amounts of freely available unlabeled data and not be limited by the quality and quantity of manual human annotations. To utilize unlabeled data, there are two key observations: (I) The detections of the same landmark in adjacent frames should be coherent with registration, i.e., optical flow. (II) The detections of the same landmark in multiple...
In this paper, we present supervision-by-registration, an unsupervised approach to improve the precision of facial landmark detectors on both images and video. Our key observation is that the detections of the same landmark in adjacent frames should be coherent with registration, i.e., optical flow. Interestingly, the coherency of optical flow is a source of supervision that does not require manual labeling, and can be leveraged during detector training. For example, we can enforce in the training loss function that a detected landmark at frame$_{t-1}$ followed by optical flow tracking...
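The consistency constraint described in this abstract can be sketched as a simple penalty: a landmark detected in one frame, moved by optical flow, should land on the detection in the next frame. The function below is a minimal illustration under stated assumptions. The name `registration_loss`, the hypothetical `flow(x, y)` callable returning a displacement, and the use of a mean Euclidean distance are all choices made for this example, not the loss defined in the paper.

```python
import math


def registration_loss(prev_landmarks, next_landmarks, flow):
    """Mean distance between flow-tracked landmarks and next-frame detections.

    prev_landmarks, next_landmarks: equal-length, non-empty lists of (x, y)
    detections for the same landmarks in frames t-1 and t.
    flow: hypothetical callable (x, y) -> (dx, dy) giving the optical-flow
    displacement from frame t-1 to frame t at that location.
    """
    total = 0.0
    for (x0, y0), (x1, y1) in zip(prev_landmarks, next_landmarks):
        fx, fy = flow(x0, y0)
        # Tracked position in frame t is (x0 + fx, y0 + fy); penalize its
        # distance to the detector's own output in frame t.
        total += math.hypot(x0 + fx - x1, y0 + fy - y1)
    return total / len(prev_landmarks)


def constant_flow(x, y):
    """Toy flow: every pixel moves 2 px right and 1 px up."""
    return (2.0, -1.0)


prev = [(10.0, 10.0), (20.0, 5.0)]
coherent = registration_loss(prev, [(12.0, 9.0), (22.0, 4.0)], constant_flow)
incoherent = registration_loss(prev, [(12.0, 9.0), (25.0, 8.0)], constant_flow)
```

Detections that agree with the flow give a loss of zero, while a detection that drifts away from its tracked position raises the loss. No manual labels are needed to compute either value.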
Clustered communication has been considered as one key technology for supporting machine-to-machine (M2M) wireless networks with a large number of communicating devices. Unlike related work that focuses on clustering with a simple or no interference model at the physical layer, in this paper we investigate the optimization problem of cluster formation and power control for interference-limited M2M communications. We consider the scenario where machines that form clusters are allowed to reuse the spectrum occupied by human...
Femtocell technology has shifted beyond indoor residential applications to cover a wider range of scenarios including metropolitan and rural areas. The term "small cell" has hence been used to denote such low-power transmission points deployed for enhancing macrocell coverage and/or capacity. While the deployment of femto BSs has typically followed the bottom-up paradigm driven by the ad hoc demand of users, more studies have prompted the move toward a managed deployment model for a better tradeoff between performance and cost. In this paper,...
Social presence, the feeling of being there with a "real" person, will fuel the next generation of communication systems driven by digital humans in virtual reality (VR). The best 3D video-realistic VR avatars that minimize the uncanny effect rely on person-specific (PS) models. However, these PS models are time-consuming to build and are typically trained with limited data variability, which results in poor generalization and robustness. Major sources of variability that affect the accuracy of facial expression transfer algorithms...
While clustered communication has been considered as one key technology for supporting machine-to-machine (M2M) wireless networks, existing clustering techniques have predominantly been designed with the objective of maximizing the service quality of individual machines. Many M2M applications, however, are characterized by a large amount of correlated data to transport, and hence the "machine-centric" clustering techniques fail to effectively address the "big data" problem introduced by these applications. In this paper, we propose the concept...
Clustering of machines for better spatial reuse has been considered as one key technology for supporting machine-to-machine (M2M) communications with a large number of communicating devices. Unlike related work that focuses on greedy clustering algorithms without interference control, in this paper we consider the scenario where machines, through joint cluster formation and power control, are allowed to opportunistically use the spectrum occupied by human devices for interference-limited M2M communications. To...
The recent advances in imaging devices have opened the opportunity of better solving the tasks of video content analysis and understanding. Next-generation cameras, such as depth or binocular cameras, capture diverse information and complement the conventional 2D RGB cameras. Thus, investigating the yielded multimodal videos generally facilitates the accomplishment of related applications. However, the limitations of the emerging cameras, such as short effective distances, expensive costs, or long response time, degrade their applicability, and currently make...
Recently, there has been an increasing interest in the deployment and management of femto base stations (BSs) to optimize the overall system performance in macro-femto heterogeneous networks. While the deployment of femto BSs is typically not as planned as that of pico BSs, given a number of femto BSs to be distributed in candidate customer sites, questions regarding the optimal locations and transmission configurations of femto BSs still need to be answered. In this paper, we formulate the joint optimization problem involving femto BS location, cell selection, and power control to maximize...
With the aim of accurate action video retrieval, we first present an approach that can infer the implicit skeleton structure for a query action, an RGB video, and then propose to expand this query with the inferred skeleton for improving the performance of retrieval. It is inspired by the observation that skeleton structures can compactly and effectively represent human actions, and are helpful in bridging the semantic gap in action retrieval. The focal point is hence on the estimation of skeletons in RGB videos. Specifically, an iterative training procedure is developed to select relevant training data for inferring the skeleton of an input...