- Advanced Vision and Imaging
- Human Pose and Action Recognition
- Video Surveillance and Tracking Methods
- Face recognition and analysis
- 3D Shape Modeling and Analysis
- Computer Graphics and Visualization Techniques
- Robotics and Sensor-Based Localization
- Generative Adversarial Networks and Image Synthesis
- Human Motion and Animation
- Video Analysis and Summarization
- Optical measurement and interference techniques
- Advanced Image and Video Retrieval Techniques
- Hand Gesture Recognition Systems
- Anomaly Detection Techniques and Applications
- Speech and Audio Processing
- Advanced Image Processing Techniques
- Visual Attention and Saliency Detection
- Gait Recognition and Analysis
- 3D Surveying and Cultural Heritage
- Gaze Tracking and Assistive Technology
- Facial Nerve Paralysis Treatment and Research
- Advanced Neural Network Applications
- Image Enhancement Techniques
- Music and Audio Processing
- Virtual Reality Applications and Impacts
META Health
2022-2024
REALITY Publishing (United States)
2024
Carnegie Mellon University
2013-2023
Washington University in St. Louis
2023
University of Kentucky
2023
The University of Adelaide
2023
Meta (Israel)
2018-2021
Meta (United States)
2018-2021
University of Central Florida
2003-2006
We present an approach to efficiently detect the 2D pose of multiple people in an image. The approach uses a nonparametric representation, which we refer to as Part Affinity Fields (PAFs), to learn to associate body parts with individuals in the image. The architecture encodes global context, allowing a greedy bottom-up parsing step that maintains high accuracy while achieving realtime performance, irrespective of the number of people in the image. The architecture is designed to jointly learn part locations and their association via two branches of the same sequential prediction process. Our...
Realtime multi-person 2D pose estimation is a key component in enabling machines to have an understanding of people in images and videos. In this work, we present a realtime approach to detect the 2D pose of multiple people in an image. The proposed method uses a nonparametric representation, which we refer to as Part Affinity Fields (PAFs), to learn to associate body parts with individuals in the image. This bottom-up system achieves high accuracy and realtime performance, regardless of the number of people in the image. In previous work, PAFs and body part location estimation were refined simultaneously across training...
Pose Machines provide a sequential prediction framework for learning rich implicit spatial models. In this work we show a systematic design for how convolutional networks can be incorporated into the pose machine framework for learning image features and image-dependent spatial models for the task of pose estimation. The contribution of this paper is to implicitly model long-range dependencies between variables in structured prediction tasks such as articulated pose estimation. We achieve this by designing a sequential architecture composed of convolutional networks that directly operate on belief maps from previous...
We present an approach that uses a multi-camera system to train fine-grained detectors for keypoints that are prone to occlusion, such as the joints of a hand. We call this procedure multiview bootstrapping: first, an initial keypoint detector is used to produce noisy labels in multiple views of the hand. The detections are then triangulated in 3D using multiview geometry or marked as outliers. Finally, the reprojected triangulations are used as new labeled training data to improve the detector. We repeat this process, generating more labeled data in each iteration. We derive a result...
Accurate detection of moving objects is an important precursor to stable tracking or recognition. In this paper, we present an object detection scheme that has three innovations over existing approaches. First, the model of the intensities of image pixels as independent random variables is challenged, and it is asserted that useful correlation exists in the intensities of spatially proximal pixels. This correlation is exploited to sustain high levels of detection accuracy in the presence of dynamic backgrounds. By using a nonparametric density estimation method over a joint domain-range...
Modeling and rendering of dynamic scenes is challenging, as natural scenes often contain complex phenomena such as thin structures, evolving topology, translucency, scattering, occlusion, and biological motion. Mesh-based reconstruction and tracking often fail in these cases, and other approaches (e.g., light field video) typically rely on constrained viewing conditions, which limit interactivity. We circumvent these difficulties by presenting a learning-based approach to representing dynamic objects inspired by the integral projection...
The following topics are dealt with: learning (artificial intelligence); feature extraction; image classification; neural nets; representation; object detection; segmentation; convolution; feedforward video signal processing.
We present an approach to capture the 3D structure and motion of a group of people engaged in a social interaction. The core challenges in capturing social interactions are: (1) occlusion is functional and frequent, (2) subtle motion needs to be measured over a space large enough to host a social group, and (3) human appearance and configuration variation is immense. The Panoptic Studio is a system organized around the thesis that social interactions should be measured through the perceptual integration of a large variety of view points. We present a modularized system designed around this principle, consisting of integrated...
We present the first method to capture the 3D total motion of a target person from a monocular view input. Given an image or a monocular video, our method reconstructs the motion of the body, face, and fingers represented by a 3D deformable mesh model. We use an efficient representation called 3D Part Orientation Fields (POFs) to encode the 3D orientations of all body parts in the common 2D image space. POFs are predicted by a Fully Convolutional Network, along with the joint confidence maps. To train our network, we collect a new 3D human motion dataset capturing diverse total body motion of 40 subjects in a multiview...
We present an approach to capture the 3D motion of a group of people engaged in a social interaction. The core challenges in capturing social interactions are: (1) occlusion is functional and frequent; (2) subtle motion needs to be measured over a space large enough to host a social group; (3) human appearance and configuration variation is immense; and (4) attaching markers to the body may prime the nature of interactions. The Panoptic Studio is a system organized around the thesis that social interactions should be measured through the integration of perceptual analyses over a large variety of view points. We present a modularized...
Background subtraction algorithms define the background as parts of a scene that are at rest. Traditionally, these algorithms assume a stationary camera, and identify moving objects by detecting areas in a video that change over time. In this paper, we extend the concept of `subtracting' areas at rest to apply to video captured from a freely moving camera. We do not assume that the background is well-approximated by a plane or that the camera center remains stationary during motion. The method operates entirely using 2D image measurements without requiring an explicit 3D reconstruction of the scene. A...
We introduce a deep appearance model for rendering the human face. Inspired by Active Appearance Models, we develop a data-driven rendering pipeline that learns a joint representation of facial geometry and appearance from a multiview capture setup. Vertex positions and view-specific textures are modeled using a deep variational autoencoder that captures complex nonlinear effects while producing a smooth and compact latent representation. View-specific texture enables the modeling of view-dependent effects such as specularity. In addition, it can also...
In this paper, we present supervision-by-registration, an unsupervised approach to improve the precision of facial landmark detectors on both images and video. Our key observation is that the detections of the same landmark in adjacent frames should be coherent with registration, i.e., optical flow. Interestingly, the coherency of optical flow is a source of supervision that does not require manual labeling, and can be leveraged during detector training. For example, we can enforce in the training loss function that a detected landmark at frame...
Real-time rendering and animation of humans is a core function in games, movies, and telepresence applications. Existing methods have a number of drawbacks we aim to address with our work. Triangle meshes have difficulty modeling thin structures like hair, volumetric representations like Neural Volumes are too low-resolution given a reasonable memory budget, and high-resolution implicit representations like Neural Radiance Fields are too slow for use in real-time applications. We present Mixture of Volumetric Primitives (MVP), a representation for rendering dynamic 3D content that...
This paper presents a generic method for generating full facial 3D animation from speech. Existing approaches to audio-driven facial animation exhibit uncanny or static upper face animation, fail to produce accurate and plausible co-articulation, or rely on person-specific models that limit their scalability. To improve upon existing models, we propose a generic audio-driven facial animation approach that achieves highly realistic motion synthesis results for the entire face. At the core of our approach is a categorical latent space for facial animation that disentangles audio-correlated...
Creating photorealistic avatars of existing people currently requires extensive person-specific data capture, which is usually only accessible to the VFX industry and not the general public. Our work aims to address this drawback by relying on a short mobile phone capture to obtain a drivable 3D head avatar that matches a person's likeness faithfully. In contrast to existing approaches, our architecture avoids the complex task of directly modeling the entire manifold of human appearance, aiming instead to generate an avatar model that can be...
Existing approaches to nonrigid structure from motion assume that the instantaneous 3D shape of a deforming object is a linear combination of basis shapes. These bases are object dependent and therefore have to be estimated anew for each video sequence. In contrast, we propose a dual approach to describe the evolving 3D structure in trajectory space by a linear combination of basis trajectories. We describe the dual relationship between the two approaches, showing that they both have equal power for representing 3D structure. We further show that the temporal smoothness in 3D trajectories alone can be used...
Existing approaches to nonrigid structure from motion assume that the instantaneous 3D shape of a deforming object is a linear combination of basis shapes, which have to be estimated anew for each video sequence. In contrast, we propose that the evolving 3D structure be described by a linear combination of basis trajectories. The principal advantage of this approach is that we do not need to estimate any basis vectors during computation. We show that generic bases over trajectories, such as the Discrete Cosine Transform (DCT) basis, can be used to compactly describe most real motions. This...
One of the fundamental challenges of recognizing actions is accounting for the variability that arises when arbitrary cameras capture humans performing actions. In this paper, we explicitly identify three important sources of variability: (1) viewpoint, (2) execution rate, and (3) anthropometry of actors, and propose a model of human actions that allows us to investigate all three. Our hypothesis is that the variability associated with the execution of an action can be closely approximated by a linear combination of action bases in joint spatio-temporal space. We demonstrate...
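The trajectory-basis idea in the nonrigid structure-from-motion abstracts above (a point's 3D trajectory written as a linear combination of generic DCT basis trajectories) can be illustrated with a small sketch. This is not the authors' implementation: the function names `dct_basis` and `fit_trajectory`, the synthetic trajectory, and the choice of 10 basis vectors are all assumptions made for illustration. The sketch covers only the compact representation of an already-known trajectory, not the estimation of structure from 2D image measurements.

```python
import numpy as np


def dct_basis(num_frames: int, num_basis: int) -> np.ndarray:
    """Return a (num_frames, num_basis) matrix of orthonormal DCT-II vectors.

    Hypothetical helper: column k is the k-th DCT basis trajectory.
    """
    t = np.arange(num_frames)
    k = np.arange(num_basis)
    basis = np.cos(np.pi * (2 * t[:, None] + 1) * k[None, :] / (2 * num_frames))
    # Scale the columns so that they have unit norm.
    basis[:, 0] *= np.sqrt(1.0 / num_frames)
    basis[:, 1:] *= np.sqrt(2.0 / num_frames)
    return basis


def fit_trajectory(trajectory: np.ndarray, num_basis: int):
    """Project a (num_frames, 3) trajectory onto the first num_basis DCT vectors.

    Because the basis columns are orthonormal, the least-squares
    coefficients are simply basis.T @ trajectory.
    """
    basis = dct_basis(trajectory.shape[0], num_basis)
    coeffs = basis.T @ trajectory
    return coeffs, basis @ coeffs


def relative_error(trajectory: np.ndarray, reconstruction: np.ndarray) -> float:
    return float(np.linalg.norm(trajectory - reconstruction) / np.linalg.norm(trajectory))


# Synthetic smooth 3D trajectory over 100 frames (an assumption, not real data).
num_frames = 100
time = np.linspace(0.0, 1.0, num_frames)
trajectory = np.stack(
    [np.sin(2 * np.pi * time), np.cos(2 * np.pi * time), 0.5 * time], axis=1
)

coeffs_10, recon_10 = fit_trajectory(trajectory, 10)
coeffs_full, recon_full = fit_trajectory(trajectory, num_frames)

print("coefficients stored with 10 basis vectors:", coeffs_10.size)
print("relative error with 10 basis vectors:", relative_error(trajectory, recon_10))
print("relative error with the full basis:", relative_error(trajectory, recon_full))
```

In this sketch the 300 trajectory values are summarized by 30 coefficients, and because the basis is fixed in advance nothing about the basis itself has to be estimated from the data. How many basis vectors a real motion needs depends on how smooth it is.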