- Robotics and Sensor-Based Localization
- Advanced Vision and Imaging
- Advanced Image and Video Retrieval Techniques
- Optical measurement and interference techniques
- 3D Shape Modeling and Analysis
- Multimodal Machine Learning Applications
- Domain Adaptation and Few-Shot Learning
- Computer Graphics and Visualization Techniques
- Advanced Neural Network Applications
- Sparse and Compressive Sensing Techniques
- Image and Object Detection Techniques
- Advanced Image Processing Techniques
- Retinal and Macular Surgery
- Image Retrieval and Classification Techniques
- Image Processing Techniques and Applications
- Indoor and Outdoor Localization Technologies
- Generative Adversarial Networks and Image Synthesis
- Augmented Reality Applications
- Industrial Vision Systems and Defect Detection
- Advanced Memory and Neural Computing
- Retinal Imaging and Analysis
- Topic Modeling
- Video Analysis and Summarization
- Visual Attention and Saliency Detection
- Human Motion and Animation
ETH Zurich
2017-2024
Institute for Social and Environmental Research-Nepal
2024
Institut Pascal
2016-2017
Université Clermont Auvergne
2014-2016
Universidad Blas Pascal
2016
Centre National de la Recherche Scientifique
2014
HES-SO University of Applied Sciences and Arts Western Switzerland
2013
Université de Bourgogne
2013
This paper tackles the high computational/space complexity associated with multi-head self-attention (MHSA) in vanilla vision transformers. To this end, we propose hierarchical MHSA (H-MHSA), a novel approach that computes self-attention in a hierarchical fashion. Specifically, we first divide the input image into patches as commonly done, and each patch is viewed as a token. Then, the proposed H-MHSA learns token relationships within local patches, serving as local relationship modeling. Then, the small patches are merged into larger ones, and H-MHSA models...
This paper proposes a general framework to solve Non-Rigid Shape-from-Motion (NRSfM) with the perspective camera under isometric deformations. Contrary to the usual low-rank linear shape basis, isometry allows us to recover complex deformations from a sparse set of images. Existing methods suffer from ambiguities and may be very expensive to solve. We bring four main contributions. First, we formulate NRSfM as a system of first-order Partial Differential Equations (PDE) involving the shape’s depth and normal field an...
Computer vision and robotics are being increasingly applied in medical interventions. Especially in interventions where extreme precision is required, they could make a difference. One such application is robot-assisted retinal microsurgery. In recent works, such interventions are conducted under a stereo-microscope, with a robot-controlled surgical tool. The complementarity of computer vision and robotics has, however, not yet been fully exploited. In order to improve the robot control, we are interested in three-dimensional (3-D) reconstruction...
Shape-from-Template (SfT) reconstructs the shape of a deforming surface from a single image, a 3D template and a deformation prior. For isometric deformations, this is a well-posed problem. However, previous methods which require no initialization break down when the perspective effects are small, which happens when the object is small or viewed from larger distances. That is, they do not handle all projection geometries. We propose stable SfT methods that accurately reconstruct the 3D shape for all projection geometries. We follow the existing approach of using first-order...
Open compound domain adaptation (OCDA) is a domain adaptation setting, where the target domain is modeled as a compound of multiple unknown homogeneous domains, which brings the advantage of improved generalization to unseen domains. In this work, we propose a principled meta-learning based approach to OCDA for semantic segmentation, MOCDA, by modeling the unlabeled target domain continuously. Our approach consists of four key steps. First, we cluster the target domain into multiple sub-target domains by image styles, extracted in an unsupervised manner. Then, different sub-target domains are split into independent branches,...
Building on progress in feature representations for image retrieval, image-based localization has seen a surge of research interest. Image-based localization has the advantage of being inexpensive and efficient, often avoiding the use of 3D metric maps altogether. That said, the need to maintain a large amount of reference images as an effective support of localization in a scene, nonetheless calls for them to be organized in a map structure of some kind. The problem of localization often arises as part of a navigation process. We are, therefore, interested in summarizing the reference images as a set of landmarks, which...
It has been recently shown that reconstructing an isometric surface from a single 2D input image matched to a 3D template was a well-posed problem. This however does not tell us how reconstruction algorithms will behave in practical conditions, where the amount of perspective is generally small and the projection thus behaves like weak-perspective or orthography. We here bring answers to what is theoretically recoverable in such imaging conditions, and explain why existing convex numerical solutions and analytical solutions may be...
We present a global and convex formulation for the template-less 3D reconstruction of a deforming object with the perspective camera. We show for the first time how to construct a Second-Order Cone Programming (SOCP) problem for Non-Rigid Structure-from-Motion (NRSfM) using the Maximum-Depth Heuristic (MDH). In this regard, we deviate strongly from the general trend of using affine cameras and factorization-based methods to solve NRSfM, which do not perform well with complex nonlinear deformations. In MDH, the points' depths are maximized so that...
We present a global and convex formulation for template-less 3D reconstruction of a deforming object with the perspective camera. We show for the first time how to construct a Second-Order Cone Programming (SOCP) problem for Non-Rigid Shape-from-Motion (NRSfM) using the Maximum-Depth Heuristic (MDH). In this regard, we deviate strongly from the general trend of using affine cameras and factorization-based methods to solve NRSfM. In MDH, the points' depths are maximized so that the distance between neighbouring points in camera space is upper...
Consensus maximization is a key strategy in 3D vision for robust geometric model estimation from measurements with outliers. Generic methods for consensus maximization, such as Random Sampling and Consensus (RANSAC), have played a tremendous role in the success of 3D vision, in spite of the ubiquity of outliers. However, replicating the same generic behaviour in a deeply learned architecture, using supervised approaches, has proven to be difficult. In that context, unsupervised methods have a huge potential to adapt to any unseen data distribution, and therefore are...
Journal images represent an important part of the knowledge stored in the medical literature. Figure classification has received much attention as the information of the image types can be used in a variety of contexts to focus image search and filter out unwanted information or "noise", for example non-clinical images. A major problem in figure classification is the fact that many figures in the biomedical literature are compound figures and do often contain more than a single figure type. Some journals do separate compound figures into several parts but many do not, thus requiring currently manual...
Efficient detection and description of geometric regions in images is a prerequisite in visual systems for localization and mapping. Such systems still rely on traditional handcrafted methods for efficient generation of lightweight descriptors, a common limitation of the more powerful neural network models that come with high compute and specific hardware requirements. In this paper, we focus on the adaptations required by neural networks to enable their use in computationally limited platforms such as robots, mobile, and augmented reality...
Modeling Neural Radiance Fields for fast-moving deformable objects from visual data alone is a challenging problem. A major issue arises due to the high deformation and low acquisition rates. To address this problem, we propose to use event cameras that offer very fast acquisition of visual change in an asynchronous manner. In this work, we develop a novel method to model the deformable neural radiance fields using RGB and event cameras. The proposed method uses the asynchronous stream of events and calibrated sparse RGB frames. In our setup, the camera pose at the individual events, required to integrate...
Generative adversarial networks (GANs) have shown impressive results in both unconditional and conditional image generation. In recent literature, it is shown that pre-trained GANs, on a different dataset, can be transferred to improve the image generation from a small target data. The same, however, has not been well-studied in the case of conditional GANs (cGANs), which provides new opportunities for knowledge transfer compared to the unconditional setup. In particular, the new classes may borrow knowledge from the related old classes, or share knowledge among themselves to improve the training....
This paper tackles the high computational/space complexity associated with Multi-Head Self-Attention (MHSA) in vanilla vision transformers. To this end, we propose Hierarchical MHSA (H-MHSA), a novel approach that computes self-attention in a hierarchical fashion. Specifically, we first divide the input image into patches as commonly done, and each patch is viewed as a token. Then, the proposed H-MHSA learns token relationships within local patches, serving as local relationship modeling. Then, the small patches are merged into larger ones,...
Augmented Reality (AR) can improve the information delivery to surgeons. In laparosurgery, the primary goal of AR is to provide multimodal information overlaid in live laparoscopic videos. For gynecologic laparoscopy, the 3D reconstruction of the uterus and its deformable registration to preoperative data form the major problems in AR. Shape-from-Shading (SfS) and inter-frame registration require an accurate identification of the uterus region, the occlusions due to surgical tools, specularities, and other tissues. We propose a cascaded patient-specific real-time...
In this paper, we formulate a generic non-minimal solver using the existing tools of Polynomials Optimization Problems (POP) from computational algebraic geometry. The proposed method exploits the well-known Shor's or Lasserre's relaxations, whose theoretical aspects are also discussed. Notably, we further exploit the POP formulation for consensus maximization problems in 3D vision. Our framework is simple and straightforward to implement, which is supported by three diverse applications in 3D vision, namely...
Unsupervised template discovery via implicit representation in a category of shapes has recently shown strong performance. At the core, such methods deform input shapes to a common template space, which allows establishing correspondences as well as an implicit representation of the shapes. In this work we investigate the inherent assumption that the implicit neural field optimization naturally leads to consistently warped shapes, thus providing both good shape reconstruction and correspondences. Contrary to this convenient assumption, in practice we observe that this is not the case,...