- Robotics and Sensor-Based Localization
- Advanced Vision and Imaging
- Advanced Image and Video Retrieval Techniques
- Optical measurement and interference techniques
- Advanced Neural Network Applications
- 3D Shape Modeling and Analysis
- Multimodal Machine Learning Applications
- 3D Surveying and Cultural Heritage
- Domain Adaptation and Few-Shot Learning
- Computer Graphics and Visualization Techniques
- Advanced Image Processing Techniques
- Human Pose and Action Recognition
- Face recognition and analysis
- Generative Adversarial Networks and Image Synthesis
- Digital Media Forensic Detection
- Infrastructure Maintenance and Monitoring
- 3D Modeling in Geospatial Applications
- Video Surveillance and Tracking Methods
- Image Processing and 3D Reconstruction
- Image and Object Detection Techniques
- Image Processing Techniques and Applications
- Image Retrieval and Classification Techniques
- Remote Sensing and LiDAR Applications
- Adversarial Robustness in Machine Learning
- Geological Modeling and Analysis
META Health
2022-2023
Meta (Israel)
2020-2022
University of Hong Kong
1999-2021
Hong Kong University of Science and Technology
1999-2021
Deep learning has recently demonstrated excellent performance for multi-view stereo (MVS). However, one major limitation of current learned MVS approaches is scalability: memory-consuming cost volume regularization makes learned MVS hard to apply to high-resolution scenes. In this paper, we introduce a scalable framework based on the recurrent neural network. Instead of regularizing the entire 3D cost volume in one go, the proposed Recurrent Multi-view Stereo Network (R-MVSNet) sequentially regularizes the 2D cost maps along the depth...
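The memory argument can be illustrated with a toy sketch: cost slices are consumed one depth at a time, only one H×W recurrent state is kept, and a running per-pixel argmin picks the depth index. The fixed-gate blend below is an assumption standing in for a learned recurrent unit; it is not R-MVSNet's actual regularizer, and the function name is hypothetical.

```python
from typing import Iterable, List

Map = List[List[float]]  # one H x W cost map for a single depth hypothesis


def recurrent_depth_argmin(cost_maps: Iterable[Map], gate: float = 0.5) -> List[List[int]]:
    """Regularize 2D cost maps one depth slice at a time (hypothetical toy).

    Only the current slice, one recurrent state map, and the running best
    cost/index maps are held in memory, never the full D x H x W volume.
    """
    hidden = None     # per-pixel recurrent state carried along the depth direction
    best_cost = None  # lowest regularized cost seen so far, per pixel
    best_idx = None   # depth index that produced best_cost, per pixel
    for d, cost in enumerate(cost_maps):
        if hidden is None:
            hidden = [row[:] for row in cost]
            best_cost = [[float("inf")] * len(row) for row in cost]
            best_idx = [[0] * len(row) for row in cost]
        else:
            # Toy gated update with a constant gate (a learned GRU would compute it).
            hidden = [[(1.0 - gate) * h + gate * c for h, c in zip(hrow, crow)]
                      for hrow, crow in zip(hidden, cost)]
        # Streaming winner-take-all over depth.
        for i, row in enumerate(hidden):
            for j, value in enumerate(row):
                if value < best_cost[i][j]:
                    best_cost[i][j] = value
                    best_idx[i][j] = d
    return best_idx if best_idx is not None else []
```

Because the input is any iterable, cost slices can be produced lazily (for example by a generator), which is where the memory saving over a dense 3D regularizer comes from.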
Establishing correspondences between two images requires both local and global spatial context. Given putative correspondences of feature points in two views, in this paper we propose the Order-Aware Network, which infers the probabilities of correspondences being inliers and regresses the relative pose encoded by the essential matrix. Specifically, the proposed network is built hierarchically and comprises three novel operations. First, to capture the context of sparse correspondences, the network clusters unordered input correspondences by learning a soft assignment. These clusters are in a canonical...
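The soft-assignment clustering step can be sketched as a differentiable pooling: each of N unordered correspondence features receives a weight for each of M clusters, and each cluster is the weighted average of the inputs. Permuting the inputs permutes the assignment rows identically, so the pooled clusters come out in the same order. The projection `weights`, the choice to normalize over the N points, and the function names are assumptions for illustration, not the network's published layers.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax_over_points(logits: Matrix) -> Matrix:
    """Normalize each column (cluster) of an N x M logit matrix over the N rows."""
    n, m = len(logits), len(logits[0])
    out = [[0.0] * m for _ in range(n)]
    for k in range(m):
        col_max = max(logits[i][k] for i in range(n))  # subtract max for numerical stability
        exps = [math.exp(logits[i][k] - col_max) for i in range(n)]
        total = sum(exps)
        for i in range(n):
            out[i][k] = exps[i] / total
    return out


def soft_assign_pool(features: Matrix, weights: Matrix) -> Matrix:
    """Pool N unordered feature rows (N x C) into M ordered clusters (M x C).

    `weights` (C x M) is a hypothetical stand-in for a learned projection.
    """
    n, ch, m = len(features), len(features[0]), len(weights[0])
    logits = [[sum(f[c] * weights[c][k] for c in range(ch)) for k in range(m)]
              for f in features]
    s = softmax_over_points(logits)  # N x M soft assignment matrix
    # S^T F: each cluster is a convex combination of the input rows.
    return [[sum(s[i][k] * features[i][c] for i in range(n)) for c in range(ch)]
            for k in range(m)]
```

Since each cluster's weights sum to one over the inputs, every pooled value lies within the range of the corresponding input channel.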
Most existing studies on learning local features focus on the patch-based descriptions of individual keypoints, while neglecting the spatial relations established from their keypoint locations. In this paper, we go beyond local detail representation by introducing context awareness to augment off-the-shelf feature descriptors. Specifically, we propose a unified framework that leverages and aggregates cross-modality contextual information, including (i) visual context from high-level image representation, (ii)...
Global Structure-from-Motion (SfM) techniques have demonstrated higher efficiency and accuracy than the conventional incremental approach in many recent studies. This work proposes a divide-and-conquer framework to solve very large global SfM at the scale of millions of images. Specifically, we first divide all images into multiple partitions that preserve strong data association for well-posed and parallel local motion averaging. Then, motion averaging determines the cameras at the partition boundaries and a similarity...
In this paper, we present a joint multi-task learning framework for semantic segmentation and boundary detection. The critical component in the framework is the iterative pyramid context module (PCM), which couples the two tasks and stores the shared latent semantics through which they interact. For boundary detection, we propose a novel spatial gradient fusion to suppress non-semantic edges. As boundary detection is the dual task of segmentation, we introduce a loss function with a consistency constraint to improve the pixel accuracy of segmentation. Our extensive...
Accurate relative pose is one of the key components in visual odometry (VO) and simultaneous localization and mapping (SLAM). Recently, the self-supervised learning framework that jointly optimizes the relative pose and target image depth has attracted the attention of the community. Previous works rely on the photometric error generated from depths and poses between adjacent frames, which contains large systematic error under realistic scenes due to reflective surfaces and occlusions. In this paper, we bridge the gap between geometric loss and photometric loss by introducing...
Temporal camera relocalization estimates the pose with respect to each video frame in sequence, as opposed to one-shot relocalization, which focuses on a still image. Even though time dependency has been taken into account, current temporal methods generally underperform state-of-the-art one-shot approaches in terms of accuracy. In this work, we improve temporal relocalization by using a network architecture that incorporates Kalman filtering (KFNet) for online camera relocalization. In particular, KFNet extends the scene coordinate regression problem...
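For readers unfamiliar with the filtering component, the predict/update cycle of a textbook scalar Kalman filter is sketched below, as if a single scene coordinate were observed noisily in each frame. This is the classical filter with a constant-state model and hand-set noise values `q` and `r` (assumptions), not KFNet's learned formulation, which the excerpt does not describe.

```python
from typing import Iterable, List, Tuple


def kalman_1d(measurements: Iterable[float], x0: float, p0: float,
              q: float, r: float) -> List[Tuple[float, float]]:
    """Scalar Kalman filter with a constant-state (random-walk) process model.

    x0, p0: initial estimate and its variance
    q: process-noise variance added at each prediction
    r: measurement-noise variance
    Returns (estimate, variance) after each measurement.
    """
    x, p = x0, p0
    history = []
    for z in measurements:
        # Predict: the state is assumed unchanged, but uncertainty grows by q.
        p = p + q
        # Update: weight prediction and measurement by the Kalman gain.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        history.append((x, p))
    return history
```

With `q = 0`, the variance shrinks monotonically as frames accumulate, which is the intuition behind exploiting time dependency rather than treating each frame as a one-shot estimate.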
The power of modern image matching approaches is still fundamentally limited by abrupt scale changes in images. In this paper, we propose a scale-invariant approach to tackling very large scale variation of views. Drawing inspiration from scale space theory, we start by encoding the image's scale space into a compact multi-scale representation. Then, rather than trying to find the exact feature matches all in one step, we propose a progressive two-stage approach. First, we determine the related scale levels in scale space, enclosing the inlier correspondences, based on...
In this paper, we tackle the accurate and consistent Structure from Motion (SfM) problem, in particular camera registration, far exceeding the memory of a single computer, in parallel. Unlike previous methods, which drastically simplify the parameters of SfM and sacrifice the accuracy of the final reconstruction, we try to preserve the connectivities among cameras by proposing a camera clustering algorithm that divides a large SfM problem into smaller sub-problems in terms of camera clusters with overlap. We then exploit a hybrid formulation that applies...
In light of recent analyses of privacy-concerning scene revelation from visual descriptors, we develop descriptors that conceal the input image content. In particular, we propose an adversarial learning framework for training descriptors to prevent image reconstruction while maintaining matching accuracy. We let a feature encoding network and an image reconstruction network compete with each other, such that the encoder tries to impede reconstruction with its generated descriptors, while the reconstructor tries to recover the image from the descriptors. The experimental results demonstrate that the descriptors obtained with our method...
Establishing correct correspondences between two images should consider both local and global spatial context. Given putative correspondences of feature points in two views, in this paper we propose the Order-Aware Network, which infers the probabilities of correspondences being inliers and regresses the relative pose encoded by the essential or fundamental matrix. Specifically, the proposed network is built hierarchically and comprises three operations. First, to capture the context of sparse correspondences, the network clusters unordered input correspondences by learning a soft assignment...
The increasing scale of Structure-from-Motion is fundamentally limited by the conventional optimization framework for all-in-one global bundle adjustment. In this paper, we propose a distributed approach to coping with global bundle adjustment for very large-scale computation. First, we derive the distributed formulation from the classical optimization algorithm ADMM (Alternating Direction Method of Multipliers), based on global camera consensus. Then, we analyze the conditions under which convergence would be guaranteed. In particular, we adopt over-relaxation and...
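The consensus-ADMM pattern with over-relaxation can be sketched on a toy problem: several workers each hold a private copy of one shared scalar (standing in for a shared camera parameter) and a quadratic cost pulling it toward their own target. The quadratic costs, `rho`, `alpha`, and the function name are assumptions; real bundle adjustment has nonlinear reprojection costs, and the paper's convergence conditions are not reproduced here.

```python
from typing import List, Sequence


def consensus_admm(weights: Sequence[float], targets: Sequence[float],
                   rho: float = 1.0, alpha: float = 1.6, iters: int = 500) -> float:
    """Minimize sum_i 0.5 * w_i * (z - t_i)^2 by consensus ADMM with over-relaxation.

    Each index i acts as one worker with a private copy x_i that must agree
    with the consensus variable z. alpha in (1, 2) is over-relaxation.
    """
    n = len(weights)
    z = 0.0
    u: List[float] = [0.0] * n  # scaled dual variables, one per worker
    for _ in range(iters):
        # Local step (parallelizable): closed-form minimizer of each worker's
        # cost plus the augmented-Lagrangian penalty (rho / 2) * (x_i - z + u_i)^2.
        x = [(w * t + rho * (z - ui)) / (w + rho)
             for w, t, ui in zip(weights, targets, u)]
        # Over-relaxation: mix the new local copies with the previous consensus.
        x_hat = [alpha * xi + (1.0 - alpha) * z for xi in x]
        # Consensus step: average the relaxed copies plus their duals.
        z = sum(xh + ui for xh, ui in zip(x_hat, u)) / n
        # Dual step: accumulate each worker's disagreement with the consensus.
        u = [ui + xh - z for ui, xh in zip(u, x_hat)]
    return z
```

For this toy objective the consensus value converges to the weighted mean of the targets, e.g. `consensus_admm([1, 2, 3], [0, 3, 6])` approaches `(0*1 + 3*2 + 6*3) / 6 = 4`.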
The self-supervised learning of depth and pose from monocular sequences provides an attractive solution by using the photometric consistency of nearby frames, as it depends much less on ground-truth data. In this paper, we address the issue that arises when the assumptions of previous approaches are violated due to the dynamic nature of real-world scenes. Rather than handling the noise as uncertainty, our key idea is to incorporate more robust geometric quantities and enforce internal consistency in the temporal image sequence. As demonstrated on commonly...
We present a convolutional network architecture for direct feature learning on mesh surfaces through their atlases of texture maps. The texture map encodes the parameterization from the 3D to the 2D domain, rendering not only RGB values but also rasterized geometric features if necessary. Since the parameterization is not pre-determined and depends on the surface topologies, we introduce a novel cross-atlas convolution to recover the original geodesic neighborhood, so as to achieve invariance to arbitrary parameterization. The proposed...