- Advanced Vision and Imaging
- 3D Shape Modeling and Analysis
- Human Pose and Action Recognition
- Computer Graphics and Visualization Techniques
- Face recognition and analysis
- Generative Adversarial Networks and Image Synthesis
- Human Motion and Animation
- Video Surveillance and Tracking Methods
- Anomaly Detection Techniques and Applications
- Hand Gesture Recognition Systems
- Optical measurement and interference techniques
- Virtual Reality Applications and Impacts
- Robotic Path Planning Algorithms
- Artificial Intelligence in Games
- Face and Expression Recognition
- Interactive and Immersive Displays
- 3D Surveying and Cultural Heritage
- Visual Attention and Saliency Detection
- Robotics and Sensor-Based Localization
- Video Analysis and Summarization
- Opportunistic and Delay-Tolerant Networks
- Infant Health and Development
- Voice and Speech Disorders
- Advanced Image Processing Techniques
- Augmented Reality Applications
REALITY Publishing (United States)
2024
META Health
2022-2024
Meta (Israel)
2018-2021
Carnegie Mellon University
2010-2019
Meta (United States)
2019
Walt Disney (United States)
2012
We present an approach to efficiently detect the 2D pose of multiple people in an image. The approach uses a nonparametric representation, which we refer to as Part Affinity Fields (PAFs), to learn to associate body parts with individuals in the image. The architecture encodes global context, allowing a greedy bottom-up parsing step that maintains high accuracy while achieving realtime performance, irrespective of the number of people in the image. The architecture is designed to jointly learn part locations and their association via two branches of the same sequential prediction process. Our...
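As a rough illustration of the greedy bottom-up parsing step mentioned in the abstract above, the sketch below matches candidates of two body-part types by descending association score. It is a minimal sketch, not the authors' implementation: the function name `greedy_match`, the `min_score` parameter, and the score matrix are hypothetical, and the scores are assumed to be given (in the paper they are derived from the PAFs).

```python
def greedy_match(scores, min_score=0.0):
    """Greedily pair candidates of part A with candidates of part B.

    scores[i][j] is an assumed, precomputed association score between
    candidate i of part A and candidate j of part B (hypothetical input).
    """
    # Collect every candidate pair above the threshold, best score first.
    pairs = sorted(
        ((s, i, j)
         for i, row in enumerate(scores)
         for j, s in enumerate(row)
         if s > min_score),
        reverse=True,
    )
    used_a, used_b, matches = set(), set(), []
    for _, i, j in pairs:
        # Accept a pair only if neither endpoint is already matched.
        if i not in used_a and j not in used_b:
            matches.append((i, j))
            used_a.add(i)
            used_b.add(j)
    return matches


# Two candidates per part type; each ends up assigned to one "person".
example_scores = [[0.9, 0.2],
                  [0.8, 0.7]]
print(greedy_match(example_scores))  # [(0, 0), (1, 1)]
```

Because each candidate is used at most once and pairs are visited in score order, the cost is dominated by the sort, which is one reason a greedy step can stay fast as the number of people grows.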
Realtime multi-person 2D pose estimation is a key component in enabling machines to have an understanding of people in images and videos. In this work, we present a realtime approach to detect the 2D pose of multiple people in an image. The proposed method uses a nonparametric representation, which we refer to as Part Affinity Fields (PAFs), to learn to associate body parts with individuals in the image. This bottom-up system achieves high accuracy and realtime performance, regardless of the number of people in the image. In previous work, PAFs and body part location estimation were refined simultaneously across training...
We present an approach that uses a multi-camera system to train fine-grained detectors for keypoints that are prone to occlusion, such as the joints of a hand. We call this procedure multiview bootstrapping: first, an initial keypoint detector is used to produce noisy labels in multiple views of the hand. The noisy detections are then triangulated in 3D using multiview geometry or marked as outliers. Finally, the reprojected triangulations are used as new labeled training data to improve the detector. We repeat this process, generating more labeled data in each iteration. We derive a result...
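The triangulate-then-reproject step described in the abstract above can be sketched for a single keypoint as follows. This is a simplified sketch under stated assumptions, not the paper's procedure: cameras are ideal pinholes with unit focal length given as `(R, C)` pairs (world-to-camera rotation and camera center), triangulation is a least-squares intersection of viewing rays, and a frame is rejected as a whole when any reprojection error exceeds `max_err`. All function names and the threshold are hypothetical.

```python
import math


def matvec(M, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]


def project(cam, X):
    """Project world point X into a camera (R, C) with unit focal length."""
    R, C = cam
    x, y, z = matvec(R, [X[i] - C[i] for i in range(3)])
    return (x / z, y / z)


def backproject(cam, uv):
    """Unit world-frame direction of the viewing ray through image point uv."""
    R, _ = cam
    Rt = [[R[j][i] for j in range(3)] for i in range(3)]
    d = matvec(Rt, [uv[0], uv[1], 1.0])
    n = math.sqrt(sum(c * c for c in d))
    return [c / n for c in d]


def det3(M):
    """Determinant of a 3x3 matrix."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))


def triangulate(cams, uvs):
    """Point minimizing squared distance to all viewing rays.

    Solves (sum_k P_k) X = sum_k P_k C_k with P_k = I - d_k d_k^T,
    using Cramer's rule for the 3x3 system.
    """
    A = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for cam, uv in zip(cams, uvs):
        d, C = backproject(cam, uv), cam[1]
        for i in range(3):
            for j in range(3):
                p = (1.0 if i == j else 0.0) - d[i] * d[j]
                A[i][j] += p
                b[i] += p * C[j]
    D = det3(A)
    return [det3([[b[i] if j == k else A[i][j] for j in range(3)]
                  for i in range(3)]) / D for k in range(3)]


def bootstrap_labels(cams, detections, max_err=0.01):
    """Return reprojected 2D labels per view, or None if marked an outlier."""
    X = triangulate(cams, detections)
    labels = [project(cam, X) for cam in cams]
    errs = [math.dist(p, q) for p, q in zip(labels, detections)]
    return labels if max(errs) <= max_err else None


# Three hypothetical cameras with identity rotation and different centers.
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
cams = [(I3, [0.0, 0.0, 0.0]), (I3, [1.0, 0.0, 0.0]), (I3, [0.0, 1.0, 0.0])]
point = [0.2, 0.1, 4.0]
clean = [project(cam, point) for cam in cams]
noisy = clean[:2] + [(0.5, 0.5)]  # one grossly wrong detection
```

With consistent detections, `bootstrap_labels(cams, clean)` returns one 2D label per view that could be added to the training set; with the corrupted third view, it returns `None`. A per-view inlier test (for example, RANSAC over view subsets) would be a natural refinement of this all-or-nothing check.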
Recent advances in image-based 3D human shape estimation have been driven by the significant improvement in representation power afforded by deep neural networks. Although current approaches have demonstrated the potential in real world settings, they still fail to produce reconstructions with the level of detail often present in the input images. We argue that this limitation stems primarily from two conflicting requirements: accurate predictions require large context, but precise predictions require high resolution. Due to memory...
Modeling and rendering of dynamic scenes is challenging, as natural scenes often contain complex phenomena such as thin structures, evolving topology, translucency, scattering, occlusion, and biological motion. Mesh-based reconstruction and tracking often fail in these cases, and other approaches (e.g., light field video) typically rely on constrained viewing conditions, which limit interactivity. We circumvent these difficulties by presenting a learning-based approach to representing dynamic objects inspired by the integral projection...
The following topics are dealt with: learning (artificial intelligence); feature extraction; image classification; neural nets; representation; object detection; segmentation; convolution; feedforward neural nets; video signal processing.
Efficient rendering of photo-realistic virtual worlds is a long standing effort of computer graphics. Modern graphics techniques have succeeded in synthesizing photo-realistic images from hand-crafted scene representations. However, the automatic generation of shape, materials, lighting, and other aspects of scenes remains a challenging problem that, if solved, would make photo-realistic computer graphics more widely accessible. Concurrently, progress in computer vision and machine learning has given rise to a new approach to image synthesis and editing, namely deep...
We present an approach to capture the 3D motion of a group of people engaged in a social interaction. The core challenges in capturing social interactions are: (1) occlusion is functional and frequent; (2) subtle motion needs to be measured over a space large enough to host a social group; (3) human appearance and configuration variation is immense; and (4) attaching markers to the body may prime the nature of interactions. The Panoptic Studio is a system organized around the thesis that social interactions should be measured through the integration of perceptual analyses over a large variety of view points. We present a modularized...
We introduce a deep appearance model for rendering the human face. Inspired by Active Appearance Models, we develop a data-driven rendering pipeline that learns a joint representation of facial geometry and appearance from a multiview capture setup. Vertex positions and view-specific textures are modeled using a deep variational autoencoder that captures complex nonlinear effects while producing a smooth and compact latent representation. View-specific texture enables the modeling of view-dependent effects such as specularity. In addition, it can also...
Synthesizing photo-realistic images and videos is at the heart of computer graphics and has been the focus of decades of research. Traditionally, synthetic images of a scene are generated using rendering algorithms such as rasterization or ray tracing, which take specifically defined representations of geometry and material properties as input. Collectively, these inputs define the actual scene and what is rendered, and are referred to as the scene representation (where a scene consists of one or more objects). Example scene representations are triangle meshes with accompanied textures (e.g.,...
Real-time rendering and animation of humans is a core function in games, movies, and telepresence applications. Existing methods have a number of drawbacks we aim to address with our work. Triangle meshes have difficulty modeling thin structures like hair, volumetric representations like Neural Volumes are too low-resolution given a reasonable memory budget, and high-resolution implicit representations like Neural Radiance Fields are too slow for use in real-time applications. We present Mixture of Volumetric Primitives (MVP), a representation for rendering dynamic 3D content that...
Accurate estimation of 3D human motion from monocular video requires modeling both kinematics (body motion without physical forces) and dynamics (motion with physical forces). To demonstrate this, we present SimPoE, a Simulation-based approach for 3D human Pose Estimation, which integrates image-based kinematic inference and physics-based dynamics modeling. SimPoE learns a policy that takes as input the current-frame pose estimate and the next image frame to control a physically-simulated character to output the next-frame pose estimate. The policy contains...
Creating photorealistic avatars of existing people currently requires extensive person-specific data capture, which is usually only accessible to the VFX industry and not the general public. Our work aims to address this drawback by relying only on a short mobile phone capture to obtain a drivable 3D head avatar that matches a person's likeness faithfully. In contrast to existing approaches, our architecture avoids the complex task of directly modeling the entire manifold of human appearance, aiming instead to generate an avatar model that can be...
Photo-editing software restricts the control of objects in a photograph to the 2D image plane. We present a method that enables users to perform the full range of 3D manipulations, including scaling, rotation, translation, and nonrigid deformations, to an object in a photograph. As 3D manipulations often reveal parts of the object that are hidden in the original photograph, our approach uses publicly available 3D models to guide the completion of the geometry and appearance of the revealed areas of the object. The completion process leverages the structure and symmetry in the stock 3D model to factor out...
A key promise of Virtual Reality (VR) is the possibility of remote social interaction that is more immersive than any prior telecommunication media. However, existing social VR experiences are mediated by inauthentic digital representations of the user (i.e., stylized avatars). These stylized representations have limited the adoption of social VR applications in precisely those cases where immersion is most necessary (e.g., professional interactions and intimate conversations). In this work, we present a bidirectional system that can animate avatar heads of both...
A variety of dynamic objects, such as faces, bodies, and cloth, are represented in computer graphics as a collection of moving spatial landmarks. Spatiotemporal data is inherent in a number of graphics applications including animation, simulation, and object and camera tracking. The principal modes of variation in the spatial geometry of objects are typically modeled using dimensionality reduction techniques, while concurrently, trajectory representations like splines and autoregressive models are widely used to exploit the temporal regularity...
Photorealistic rendering of dynamic humans is an important capability for telepresence systems, virtual shopping, special effects in movies, and interactive experiences such as games. Recently, neural rendering methods have been developed to create high-fidelity models of humans and objects. Some of these methods do not produce results with high-enough fidelity for driveable human models (Neural Volumes) whereas others have extremely long rendering times (NeRF). We propose a novel compositional 3D representation that combines the best of previous methods to produce both...
Telecommunication with photorealistic avatars in virtual or augmented reality is a promising path for achieving authentic face-to-face communication in 3D over remote physical distances. In this work, we present the Pixel Codec Avatars (PiCA): a deep generative model of 3D human faces that achieves state of the art reconstruction performance while being computationally efficient and adaptive to the rendering conditions during execution. Our model combines two core ideas: (1) a fully convolutional architecture for decoding...
We present a method for building high-fidelity animatable 3D face models that can be posed and rendered with novel lighting environments in real-time. Our main insight is that relightable models trained to produce an image lit from a single light direction can generalize to natural illumination conditions but are computationally expensive to render. On the other hand, efficient, high-fidelity face models trained with point-light data do not generalize to novel lighting conditions. We leverage the strengths of each of these two approaches. We first train an expensive but generalizable model on point-light illuminations, and use...
We present a learning-based method for building driving-signal aware full-body avatars. Our model is a conditional variational autoencoder that can be animated with incomplete driving signals, such as human pose and facial keypoints, and produces a high-quality representation of human geometry and view-dependent appearance. The core intuition behind our method is that better drivability and generalization can be achieved by disentangling the driving signals and remaining generative factors, which are not available during animation. To this end,...