Srinath Sridhar

ORCID: 0000-0003-4663-3324
Research Areas
  • 3D Shape Modeling and Analysis
  • Human Pose and Action Recognition
  • Advanced Vision and Imaging
  • Computer Graphics and Visualization Techniques
  • Hand Gesture Recognition Systems
  • 3D Surveying and Cultural Heritage
  • Robot Manipulation and Learning
  • Human Motion and Animation
  • Image Processing and 3D Reconstruction
  • Video Surveillance and Tracking Methods
  • Robotics and Sensor-Based Localization
  • Domain Adaptation and Few-Shot Learning
  • Image and Object Detection Techniques
  • Advanced Numerical Analysis Techniques
  • Teleoperation and Haptic Systems
  • Multimodal Machine Learning Applications
  • Augmented Reality Applications
  • Generative Adversarial Networks and Image Synthesis
  • Advanced Neural Network Applications
  • Interactive and Immersive Displays
  • 3D Modeling in Geospatial Applications
  • Music Technology and Sound Studies
  • Muscle activation and electromyography studies
  • Neural Networks and Applications
  • Multimedia Communication and Technology

John Brown University
2020-2025

Brown University
2022

Stanford University
2017-2020

Max Planck Institute for Informatics
2014-2017

R.V. College of Engineering
2017

Max Planck Society
2012-2015

Irvine University
2003

The goal of this paper is to estimate the 6D pose and dimensions of unseen object instances in an RGB-D image. Contrary to "instance-level" 6D pose estimation tasks, our problem assumes that no exact object CAD models are available during either training or testing time. To handle different and unseen object instances in a given category, we introduce Normalized Object Coordinate Space (NOCS), a shared canonical representation for all possible object instances within a category. Our region-based neural network is then trained to directly infer the correspondence from...

10.1109/cvpr.2019.00275 article EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01
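The core idea of NOCS is that every instance of a category is mapped into a shared unit-cube coordinate space. As a rough illustration (not the paper's network, just the canonical normalization it builds on), a point cloud can be zero-centered, scaled by its bounding-box diagonal, and shifted into [0, 1]^3:

```python
import numpy as np

def to_nocs(points):
    """Map a point cloud into a normalized object coordinate space:
    zero-centered, uniformly scaled by the bounding-box diagonal,
    then shifted so all coordinates lie in [0, 1]^3."""
    points = np.asarray(points, dtype=np.float64)
    lo, hi = points.min(axis=0), points.max(axis=0)
    centered = points - (lo + hi) / 2.0
    diag = np.linalg.norm(hi - lo)  # length of the bounding-box diagonal
    return centered / diag + 0.5

cloud = np.array([[0.0, 0.0, 0.0], [2.0, 1.0, 1.0]])
nocs = to_nocs(cloud)
assert nocs.min() >= 0.0 and nocs.max() <= 1.0
```

Because the scaling is uniform, the mapping preserves shape while discarding absolute scale and position, which is what lets a single network handle unseen instances of a category.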

We address the highly challenging problem of real-time 3D hand tracking based on a monocular RGB-only sequence. Our method combines a convolutional neural network with a kinematic 3D hand model, such that it generalizes well to unseen data, is robust to occlusions and varying camera viewpoints, and leads to anatomically plausible as well as temporally smooth hand motions. For training our CNN we propose a novel approach for the synthetic generation of training data based on a geometrically consistent image-to-image translation network. To be more...

10.1109/cvpr.2018.00013 article EN 2018-06-01

We propose a new single-shot method for multi-person 3D pose estimation in general scenes from a monocular RGB camera. Our approach uses novel occlusion-robust pose-maps (ORPM) which enable full body pose inference even under strong partial occlusions by other people and objects in the scene. ORPM outputs a fixed number of maps which encode the 3D joint locations of all people in the scene. Body part associations [8] allow us to infer 3D pose for an arbitrary number of people without explicit bounding box prediction. To train our approach we introduce MuCo-3DHP, the first large...

10.1109/3dv.2018.00024 article EN 2018 International Conference on 3D Vision (3DV) 2018-09-01

Recent advances in machine learning have led to increased interest in solving visual computing problems using methods that employ coordinate-based neural networks. These methods, which we call neural fields, parameterize physical properties of scenes or objects across space and time. They have seen widespread success in problems such as 3D shape and image synthesis, animation of human bodies, 3D reconstruction, and pose estimation. Rapid progress has led to numerous papers, but a consolidation of the discovered knowledge has not yet...

10.1111/cgf.14505 article EN publisher-specific-oa Computer Graphics Forum 2022-05-01
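A neural field is simply a network queried at spatial coordinates, usually after a Fourier-feature ("positional") encoding. A minimal sketch with randomly initialized weights standing in for a trained model (layer sizes are illustrative, not from the survey):

```python
import numpy as np

def positional_encoding(x, num_freqs=4):
    """Fourier-feature encoding used by many coordinate networks:
    each coordinate becomes sin/cos at exponentially spaced frequencies."""
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi
    angles = x[..., None] * freqs                    # (..., dim, num_freqs)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return enc.reshape(*x.shape[:-1], -1)

def field_mlp(coords, weights):
    """Tiny coordinate-based MLP: encoded (x, y, z) in, scalar field out.
    The scalar stands in for any property (occupancy, density, a color channel)."""
    h = positional_encoding(coords)
    for w, b in weights[:-1]:
        h = np.maximum(h @ w + b, 0.0)               # ReLU hidden layers
    w, b = weights[-1]
    return h @ w + b

rng = np.random.default_rng(0)
dims = [3 * 2 * 4, 32, 1]                            # encoding -> hidden -> scalar
weights = [(rng.normal(0, 0.1, (i, o)), np.zeros(o))
           for i, o in zip(dims[:-1], dims[1:])]
values = field_mlp(rng.uniform(-1, 1, (5, 3)), weights)
assert values.shape == (5, 1)
```

The key property is that the representation is continuous in the coordinates: the field can be sampled at any resolution without storing a grid.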

We present an approach for real-time, robust and accurate hand pose estimation from moving egocentric RGB-D cameras in cluttered real environments. Existing methods typically fail for hand-object interactions in cluttered scenes imaged from egocentric viewpoints, which are common in virtual or augmented reality applications. Our approach uses two subsequently applied Convolutional Neural Networks (CNNs) to localize the hand and regress 3D joint locations. Hand localization is achieved by using a CNN to estimate the 2D position of the hand center in the input, even in the presence...

10.1109/iccv.2017.131 preprint EN 2017-10-01

Markerless tracking of hands and fingers is a promising enabler for human-computer interaction. However, adoption has been limited because of tracking inaccuracies, incomplete coverage of motions, low framerate, complex camera setups, and high computational requirements. In this paper, we present a fast method for accurately tracking rapid and complex articulations of the hand using a single depth camera. Our algorithm uses a novel detection-guided optimization strategy that increases the robustness and speed of pose estimation. In the detection step, a randomized...

10.1109/cvpr.2015.7298941 preprint EN 2015-06-01

We introduce HuMoR: a 3D Human Motion Model for Robust Estimation of temporal pose and shape. Though substantial progress has been made in estimating 3D human motion and shape from dynamic observations, recovering plausible pose sequences in the presence of noise and occlusions remains a challenge. For this purpose, we propose an expressive generative model in the form of a conditional variational autoencoder, which learns a distribution of the change in pose at each step of a motion sequence. Furthermore, we introduce a flexible optimization-based approach that...

10.1109/iccv48922.2021.01129 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021-10-01
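The generative structure described in the abstract is autoregressive: at each step a latent is sampled from a prior conditioned on the previous state, then decoded into a state change. A toy sketch of that rollout loop, with random linear maps as placeholders for the learned CVAE networks (sizes and noise scale are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
STATE, LATENT = 6, 2  # toy dimensions, not the paper's state/latent sizes

# Placeholder linear maps standing in for the learned prior and decoder.
W_prior = rng.normal(0, 0.1, (STATE, LATENT))
W_dec = rng.normal(0, 0.1, (STATE + LATENT, STATE))

def rollout(x0, steps):
    """Autoregressive rollout in the style of a conditional VAE motion
    model: sample a latent from a prior conditioned on the previous state,
    decode (state, latent) into a change, and accumulate."""
    states = [np.asarray(x0, dtype=np.float64)]
    for _ in range(steps):
        x = states[-1]
        mu = x @ W_prior                       # conditional prior mean
        z = mu + 0.01 * rng.normal(size=LATENT)
        delta = np.concatenate([x, z]) @ W_dec
        states.append(x + delta)               # next state = previous + change
    return np.stack(states)

traj = rollout(np.zeros(STATE), steps=10)
assert traj.shape == (11, STATE)
```

Modeling the per-step change, rather than absolute poses, is what makes such a model usable as a motion prior inside a test-time optimization.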

This paper investigates an emerging input method enabled by progress in hand tracking: input by free motion of the fingers. The method is expressive, potentially fast, and usable across many settings as it does not insist on physical contact or visual feedback. Our goal is to inform the design of high-performance input methods by providing a detailed analysis of the performance and anatomical characteristics of finger motion. We conducted an experiment using a commercially available sensor and report on speed, accuracy, individuation, movement ranges,...

10.1145/2702123.2702136 article EN 2015-04-17

We present an approach for real-time, robust and accurate hand pose estimation from moving egocentric RGB-D cameras in cluttered real environments. Existing methods typically fail for hand-object interactions in cluttered scenes imaged from egocentric viewpoints, which are common in virtual or augmented reality applications. Our approach uses two subsequently applied Convolutional Neural Networks (CNNs) to localize the hand and regress 3D joint locations. Hand localization is achieved by using a CNN to estimate the 2D position of the hand center in the input, even in the presence...

10.1109/iccvw.2017.82 article EN 2017-10-01

This paper contributes a novel sensing approach to support on- and above-skin finger input for interaction on the move. WatchSense uses a depth sensor embedded in a wearable device to expand the input space to neighboring areas of skin and the space above it. Our approach addresses challenging camera-based tracking conditions, such as oblique viewing angles and occlusions. It can accurately detect fingertips, their locations, and whether they are touching or hovering, and extends previous work that supported either mid-air or multitouch input by...

10.1145/3025453.3026005 article EN 2017-05-02

Humans universally dislike the task of cleaning up a messy room. If machines were to help us with this task, they must understand human criteria for regular arrangements, such as several types of symmetry, co-linearity or co-circularity, spacing uniformity in linear or circular patterns, and further inter-object relationships that relate to style and functionality. Previous approaches relied on human input to explicitly specify the goal state, or synthesized scenes from scratch, but such methods do not address the rearrangement...

10.1109/cvpr52729.2023.01825 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

Recent works have demonstrated that natural language can be used to generate and edit 3D shapes. However, these methods generate shapes with limited fidelity and diversity. We introduce CLIP-Sculptor, a method to address these constraints by producing high-fidelity and diverse 3D shapes without the need for (text, shape) pairs during training. CLIP-Sculptor achieves this in a multi-resolution approach that first generates in a low-dimensional latent space and then upscales to a higher resolution for improved shape fidelity. For diversity, we use...

10.1109/cvpr52729.2023.01759 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

Augmented reality (AR) in automobiles has the potential to significantly alter the driver's user experience. Prototypes developed in academia and industry demonstrate a range of applications from advanced driver assist systems to location-based information services. A user-centered process for creating and evaluating designs for AR displays helps explore what collaborative role AR should serve between the technologies of the automobile and the driver. In particular, we consider the nature of this role along three important perspectives:...

10.1109/ismar-amh.2013.6671262 article EN 2013-10-01

Real-time marker-less hand tracking is of increasing importance in human-computer interaction. Robust and accurate tracking of arbitrary hand motion is a challenging problem due to the many degrees of freedom, frequent self-occlusions, fast motions, and uniform skin color. In this paper, we propose a new approach that tracks the full skeleton motion of the hand from multiple RGB cameras in real-time. The main contributions include a new generative tracking method which employs an implicit hand shape representation based on a Sum of Anisotropic Gaussians (SAG), and a pose...

10.1109/3dv.2014.37 preprint EN 2014-12-01
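The Sum of Anisotropic Gaussians representation models the hand as a mixture of Gaussians with full covariances, so each component can stretch along a bone direction. A sketch of just the representation (evaluating the implicit density, not the tracker built on it); the example component is hypothetical:

```python
import numpy as np

def sag_density(points, means, covs, weights):
    """Evaluate a Sum of Anisotropic Gaussians at query points: each
    component has its own mean, full covariance, and weight, so the
    implicit shape can be elongated along arbitrary directions."""
    points = np.asarray(points, dtype=np.float64)
    total = np.zeros(len(points))
    for mu, cov, w in zip(means, covs, weights):
        diff = points - mu
        inv = np.linalg.inv(cov)
        # Squared Mahalanobis distance of every point to this component.
        mahal = np.einsum("ni,ij,nj->n", diff, inv, diff)
        total += w * np.exp(-0.5 * mahal)
    return total

# One elongated component, e.g. a finger segment along the x-axis.
means = [np.zeros(3)]
covs = [np.diag([0.25, 0.01, 0.01])]
weights = [1.0]
vals = sag_density([[0, 0, 0], [0.5, 0, 0], [0, 0.5, 0]], means, covs, weights)
assert vals[0] > vals[1] > vals[2]  # density falls off faster off-axis
```

Because the density is smooth and differentiable in the component parameters, it lends itself to gradient-based pose optimization against observed images.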

We present ShapeCrafter, a neural network for recursive text-conditioned 3D shape generation. Existing methods to generate text-conditioned 3D shapes consume an entire text prompt in a single step. However, humans tend to describe shapes recursively: we may start with an initial description and progressively add details based on intermediate results. To capture this recursive process, we introduce a method to generate a 3D shape distribution, conditioned on an initial phrase, that gradually evolves as more phrases are added. Since existing datasets are insufficient for training...

10.48550/arxiv.2207.09446 preprint EN other-oa arXiv (Cornell University) 2022-01-01

We propose CaSPR, a method to learn object-centric Canonical Spatiotemporal Point Cloud Representations of dynamically moving or evolving objects. Our goal is to enable information aggregation over time and the interrogation of object state at any spatiotemporal neighborhood in the past, observed or not. Different from previous work, CaSPR learns representations that support spacetime continuity, are robust to variable and irregularly spacetime-sampled point clouds, and generalize to unseen object instances. Our approach divides...

10.48550/arxiv.2008.02792 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Progress in 3D object understanding has relied on manually "canonicalized" shape datasets that contain instances with consistent position and orientation (3D pose). This has made it hard to generalize these methods to in-the-wild shapes, e.g., from internet model collections or depth sensors. ConDor is a self-supervised method that learns to canonicalize the 3D orientation and position of full and partial 3D point clouds. We build on top of Tensor Field Networks (TFNs), a class of permutation- and rotation-equivariant, translation-invariant point networks...

10.1109/cvpr52688.2022.01646 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01
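ConDor learns canonicalization with equivariant networks; a much simpler classical stand-in that illustrates the goal (not the paper's TFN method) is PCA alignment: center the cloud and rotate its principal axes onto the coordinate axes, so rotated copies of a shape land in the same pose up to axis sign flips:

```python
import numpy as np

def canonicalize_pca(points):
    """Toy canonicalization (NOT ConDor's approach): center the cloud and
    rotate its principal axes onto the coordinate axes."""
    pts = np.asarray(points, dtype=np.float64)
    centered = pts - pts.mean(axis=0)
    # Eigenvectors of the covariance give the principal directions.
    _, vecs = np.linalg.eigh(np.cov(centered.T))
    return centered @ vecs

rng = np.random.default_rng(2)
cloud = rng.normal(size=(100, 3)) * [3.0, 1.0, 0.2]  # anisotropic "shape"

# Rotate the cloud by an arbitrary rotation around z.
t = 0.7
rot = np.array([[np.cos(t), -np.sin(t), 0.0],
                [np.sin(t),  np.cos(t), 0.0],
                [0.0,        0.0,       1.0]])
a = canonicalize_pca(cloud)
b = canonicalize_pca(cloud @ rot.T)
# Up to per-axis sign flips, both copies land in the same canonical pose.
assert np.allclose(np.abs(a).mean(axis=0), np.abs(b).mean(axis=0), atol=1e-6)
```

The sign ambiguity and sensitivity to partial observations are exactly the weaknesses that motivate learning canonicalization instead.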

The majority of descriptor-based methods for geometric processing of non-rigid shapes rely on hand-crafted descriptors. Recently, learning-based techniques have been shown effective, achieving state-of-the-art results in a variety of tasks. Yet, even though these techniques can in principle work directly on raw data, most still rely on handcrafted descriptors at the input layer. In this work, we wish to challenge this practice and use a neural network to learn descriptors directly from the raw mesh. To this end, we introduce two modules into our architecture. The first is...

10.1109/wacv48630.2021.00018 article EN 2021-01-01

Recent advancement in 2D image diffusion models has driven significant progress in text-guided texture synthesis, enabling realistic, high-quality texture generation from arbitrary text prompts. However, current methods usually focus on synthesizing textures for single static 3D objects, and struggle to handle entire families of shapes, such as those produced by procedural programs. Applying existing methods naively to each shape is too slow to support exploring different parameter settings at interactive rates, and also results...

10.48550/arxiv.2501.17895 preprint EN arXiv (Cornell University) 2025-01-28

3D Gaussian Splatting (3DGS) has demonstrated superior quality in modeling 3D objects and scenes. However, generating 3DGS remains challenging due to their discrete, unstructured, permutation-invariant nature. In this work, we present a simple yet effective method to overcome these challenges. We utilize spherical mapping to transform 3DGS into a structured 2D representation, termed UVGS. UVGS can be viewed as multi-channel images, with the feature dimensions a concatenation of Gaussian attributes such as position, scale,...

10.48550/arxiv.2502.01846 preprint EN arXiv (Cornell University) 2025-02-03
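The structural trick in the abstract is turning an unordered 3D point set into a regular 2D grid via a spherical mapping. A rough sketch of the direction-to-UV part (grid size and binning scheme are illustrative assumptions, not the paper's exact mapping): project each point's direction from the centroid to azimuth and elevation, then bin into an H x W image:

```python
import numpy as np

def spherical_uv(points, H=32, W=64):
    """Map 3D points to pixel coordinates of an H x W 'UV' grid via
    spherical coordinates of their direction from the centroid, giving
    an unordered set a structured, image-like layout."""
    pts = np.asarray(points, dtype=np.float64)
    d = pts - pts.mean(axis=0)
    az = np.arctan2(d[:, 1], d[:, 0])                            # [-pi, pi]
    el = np.arctan2(d[:, 2], np.linalg.norm(d[:, :2], axis=1))   # [-pi/2, pi/2]
    u = ((az + np.pi) / (2 * np.pi) * (W - 1)).round().astype(int)
    v = ((el + np.pi / 2) / np.pi * (H - 1)).round().astype(int)
    return np.stack([v, u], axis=1)

rng = np.random.default_rng(3)
uv = spherical_uv(rng.normal(size=(200, 3)))
assert uv[:, 0].min() >= 0 and uv[:, 0].max() <= 31
assert uv[:, 1].min() >= 0 and uv[:, 1].max() <= 63
```

Once points occupy grid cells, per-point attributes can be stacked as image channels, which is what lets standard 2D generative architectures operate on them.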

10.1109/wacv61041.2025.00056 article EN 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2025-02-26