- Advanced Vision and Imaging
- Robotics and Sensor-Based Localization
- Advanced Image and Video Retrieval Techniques
- Optical measurement and interference techniques
- Advanced Neural Network Applications
- Computer Graphics and Visualization Techniques
- Advanced Image Processing Techniques
- 3D Surveying and Cultural Heritage
- Image Processing Techniques and Applications
- Video Surveillance and Tracking Methods
- Image and Object Detection Techniques
- Face recognition and analysis
- Remote Sensing and LiDAR Applications
- Image Enhancement Techniques
- Human Pose and Action Recognition
- Robotic Path Planning Algorithms
- Biometric Identification and Security
- Multimodal Machine Learning Applications
- 3D Shape Modeling and Analysis
- Infrared Target Detection Methodologies
- Robot Manipulation and Learning
- Indoor and Outdoor Localization Technologies
- Medical Image Segmentation Techniques
- Domain Adaptation and Few-Shot Learning
- Anatomy and Medical Technology
Indian Institute of Technology Kanpur, 2025
Microsoft Research (United Kingdom), 2012-2024
Microsoft (United States), 2012-2023
Central Forensic Science Laboratory, 2023
University of North Carolina at Chapel Hill, 2003-2009
University of North Carolina Health Care, 2007-2009
We propose a single-shot approach for simultaneously detecting an object in an RGB image and predicting its 6D pose without requiring multiple stages or having to examine multiple hypotheses. Unlike a recently proposed technique for this task [10] that only predicts an approximate 6D pose that must then be refined, ours is accurate enough not to require additional post-processing. As a result, it is much faster, at 50 fps on a Titan X (Pascal) GPU, and more suitable for real-time processing. The key component of our method is a new CNN architecture...
Video analytics will drive a wide range of applications with great potential to impact society. A geographically distributed architecture of public clouds and edges that extend down to the cameras is the only feasible approach to meeting the strict real-time requirements of large-scale live video analytics.
We present a novel multi-view stereo method designed for image-based rendering that generates piecewise planar depth maps from an unordered collection of photographs.
We present an interactive system for generating photorealistic, textured, piecewise-planar 3D models of architectural structures and urban scenes from unordered sets of photographs. To reconstruct 3D geometry in our system, the user draws outlines overlaid on 2D photographs. The 3D structure is then automatically computed by combining the 2D interaction with the multi-view geometric information recovered by performing structure from motion analysis on the input photographs. We utilize vanishing point constraints at multiple stages during the reconstruction, which...
The paper introduces a data collection system and a processing pipeline for automatic geo-registered 3D reconstruction of urban scenes from video. The system collects multiple video streams, as well as GPS and INS measurements, in order to place the reconstructed models in geo-registered coordinates. Besides high quality in terms of both geometry and appearance, we aim at real-time performance. Even though our pipeline is currently far from being real-time, we select techniques and design modules that can achieve fast performance on CPUs and GPUs...
This paper presents a method for joint stereo matching and object segmentation. In our approach, a 3D scene is represented as a collection of visually distinct and spatially coherent objects. Each object is characterized by three different aspects: a color model, a 3D plane that approximates the object's disparity distribution, and a novel 3D connectivity property. Inspired by Markov Random Field models of image segmentation, we employ object-level color models as a soft constraint, which can aid depth estimation in powerful ways. In particular, our method is able...
Drones equipped with cameras are emerging as a powerful tool for large-scale aerial 3D scanning, but existing automatic flight planners do not exploit all available information about the scene, and can therefore produce inaccurate and incomplete 3D models. We present an automatic method to generate drone trajectories, such that the imagery acquired during the flight will later produce a high-fidelity 3D model. Our method uses a coarse estimate of the scene geometry to plan camera trajectories that: (1) cover the scene as thoroughly as possible; (2) encourage...
We present an approach to synthesize highly photorealistic images of 3D object models, which we use to train a convolutional neural network for detecting the objects in real images. The proposed approach has three key ingredients: (1) 3D object models are rendered in 3D models of complete scenes with realistic materials and lighting, (2) a plausible geometric configuration of objects and cameras in a scene is generated using physics simulation, and (3) high photorealism of the synthesized images is achieved by physically based rendering. When trained on images synthesized by the proposed approach, Faster...
We propose a new technique to jointly recover cosegmentation and dense per-pixel correspondence in two images. Our method parameterizes the correspondence field using piecewise similarity transformations and recovers a mapping between the estimated common "foreground" regions in the two images, allowing them to be precisely aligned. Our formulation is based on a hierarchical Markov random field model with segmentation and transformation labels. The hierarchical structure uses nested image regions to constrain inference across multiple scales. Unlike prior hierarchical methods which...
We present a system for image-based modeling and rendering of real-world scenes containing reflective and glossy surfaces. Previous approaches to image-based rendering assume that the scene can be approximated by 3D proxies that enable view interpolation using traditional back-to-front or z-buffer compositing. In this work, we show how these can be generalized to multiple layers that are combined in an additive fashion to model the reflection and transmission of light that occurs at specular surfaces such as glass and glossy materials. To simplify the analysis and rendering stages,...
Many 3D vision systems localize cameras within a scene using 3D point clouds. Such point clouds are often obtained using structure from motion (SfM), after which the images are discarded to preserve privacy. In this paper, we show, for the first time, that such point clouds retain enough information to reveal scene appearance and compromise privacy. We present a privacy attack that reconstructs color images of the scene from the point cloud. Our method is based on a cascaded U-Net that takes as input a 2D multichannel image of the points rendered from a specific viewpoint, containing point depth and optionally SIFT...
We present a stereo algorithm designed for speed and efficiency that uses local slanted plane sweeps to propose disparity hypotheses for a semi-global matching algorithm. Our local plane hypotheses are derived from initial sparse feature correspondences followed by an iterative clustering step. Local plane sweeps are then performed around each slanted plane to produce out-of-plane parallax and matching-cost estimates. A final global optimization stage, implemented using semi-global matching, assigns each pixel to one of the local plane hypotheses. By only exploring a small fraction of the whole...
We formulate multi-view 3D shape reconstruction as the computation of a minimum cut on the dual graph of a semi-regular, multi-resolution, tetrahedral mesh. Our method does not assume that the surface lies within a finite band around the visual hull or any other base surface. Instead, it uses photo-consistency to guide the adaptive subdivision of a coarse mesh of the bounding volume. This generates a multi-resolution volumetric mesh that is densely tessellated in the parts likely to contain the unknown surface. The graph-cut on the dual graph of this mesh produces a minimum cut corresponding...
This paper describes a novel approach for reconstructing a closed continuous surface of an object from multiple calibrated color images and silhouettes. Any accurate reconstruction must satisfy (1) photo-consistency and (2) silhouette consistency constraints. Most existing techniques treat these cues identically in optimization frameworks where silhouette constraints are traded off against photo-consistency and smoothness priors. Our approach strictly enforces silhouette constraints, while optimizing photo-consistency and smoothness in a global graph-cut framework. We transform the...
Most existing structure from motion (SFM) approaches for unordered images cannot handle multiple instances of the same structure in the scene. When image pairs containing different instances are matched based on visual similarity, the pairwise geometric relations as well as the correspondences inferred from such pairs are erroneous, which can lead to catastrophic failures in the reconstruction. In this paper, we investigate the geometric ambiguities caused by the presence of repeated or duplicate structures and show that to disambiguate between multiple hypotheses requires more...
Digitally unwrapping images of paper sheets is crucial for accurate document scanning and text recognition. This paper presents a method for automatically rectifying curved or folded paper sheets from a few images captured from multiple viewpoints. Prior methods either need expensive 3D scanners or model deformable surfaces using over-simplified parametric representations. In contrast, our method uses regular images and is based on general developable surface models that can represent a wide variety of paper deformations. Our main contribution is a new robust...
Image-based localization is a core component of many augmented/mixed reality (AR/MR) and autonomous robotic systems. Current localization systems rely on the persistent storage of 3D point clouds of the scene to enable camera pose estimation, but such data reveals potentially sensitive scene information. This gives rise to significant privacy risks, especially as for many applications 3D mapping is a background process that the user might not be fully aware of. We pose the following question: How can we avoid disclosing confidential information...
In this paper, we propose a novel method to recover the 3D trajectory of a moving person from a monocular camera mounted on a quadrotor micro aerial vehicle (MAV). The key contribution is an integrated approach that simultaneously performs visual odometry (VO) and persistent tracking of a person automatically detected in the scene. All computation pertaining to VO, detection and tracking runs onboard the MAV from a front-facing monocular RGB camera. Given the gravity direction from an inertial sensor and knowledge of the individual's height, a complete 3D trajectory of the person within the reconstructed...
The risk of unauthorized remote access of streaming video from networked cameras underlines the need for stronger privacy safeguards. We propose a lens-free coded aperture camera system for human action recognition that is privacy-preserving. While coded aperture systems exist, we believe ours is the first system designed for action recognition without the need for image restoration as an intermediate step. Action recognition is done using a deep network that takes as input non-invertible motion features between pairs of frames, computed using phase correlation and log-polar transformation....
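As a rough illustration of the phase correlation step named in the abstract above, the following minimal sketch estimates an integer translation between two equally sized 2D arrays. It is a generic textbook formulation in NumPy, not the authors' implementation: the function name `phase_correlation` and the small stabilizing constant are assumptions made for this example, and the log-polar stage and the coded aperture imaging model are not covered.

```python
import numpy as np

def phase_correlation(a, b):
    """Estimate the integer shift (dy, dx) such that b is approximately
    np.roll(a, (dy, dx), axis=(0, 1)). Illustrative helper only."""
    A = np.fft.fft2(a)
    B = np.fft.fft2(b)
    # Normalized cross-power spectrum: keep phase, discard magnitude.
    cross = B * np.conj(A)
    cross /= np.abs(cross) + 1e-12  # small constant avoids division by zero
    # The inverse transform peaks at the (circular) displacement.
    corr = np.fft.ifft2(cross).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Map peak indices in [0, size) to signed shifts around zero.
    return tuple(int(p) if p <= s // 2 else int(p) - s
                 for p, s in zip(peak, corr.shape))

# Usage: recover a known circular shift of a random test image.
rng = np.random.default_rng(0)
frame_a = rng.random((64, 64))
frame_b = np.roll(frame_a, shift=(3, -5), axis=(0, 1))
shift = phase_correlation(frame_a, frame_b)
```

Because only the phase of the cross-power spectrum is kept, the estimate depends on displacement rather than on image brightness; real frames with non-circular motion would typically need windowing and sub-pixel peak refinement, which this sketch omits.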
In this paper we present an automatic method for calibrating a network of cameras from only silhouettes. This is particularly useful for shape-from-silhouette or visual-hull systems, as no additional data is needed for calibration. The key novel contribution of this work is an algorithm to robustly compute the epipolar geometry from dynamic silhouettes. We use the fundamental matrices computed by this method to determine the projective reconstruction of the complete camera configuration. This is refined into a metric reconstruction using self-calibration. We validate our approach four...