Shuzhe Wang

ORCID: 0000-0003-1281-4370
Research Areas
  • Robotics and Sensor-Based Localization
  • Advanced Vision and Imaging
  • Advanced Image and Video Retrieval Techniques
  • Image Processing Techniques and Applications
  • Domain Adaptation and Few-Shot Learning
  • 3D Surveying and Cultural Heritage
  • Concrete and Cement Materials Research
  • Human Pose and Action Recognition
  • Multimodal Machine Learning Applications
  • Neural Networks and Applications
  • Computer Graphics and Visualization Techniques
  • Grouting, Rheology, and Soil Mechanics
  • Advanced Neural Network Applications
  • Innovation Diffusion and Forecasting
  • Smart Agriculture and AI
  • Generative Adversarial Networks and Image Synthesis
  • Digital Platforms and Economics
  • Machine Learning and ELM
  • Fault Detection and Control Systems
  • Industrial Vision Systems and Defect Detection
  • Food Supply Chain Traceability
  • Machine Learning in Materials Science
  • Service and Product Innovation
  • Image Retrieval and Classification Techniques
  • Spectroscopy and Chemometric Analyses

Aalto University
2020-2024

Exscien (United States)
2024

Henan University of Technology
2023-2024

Shenyang University of Chemical Technology
2023

Shanghai University
2022

ETH Zurich
2022

10.1109/cvpr52733.2024.01956 article EN 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024-06-16

Visual localization is critical to many applications in computer vision and robotics. To address single-image RGB localization, state-of-the-art feature-based methods match local descriptors between a query image and a pre-built 3D model. Recently, deep neural networks have been exploited to regress the mapping between raw pixels and 3D coordinates in the scene, so that the matching is implicitly performed by the forward pass through the network. However, in a large and ambiguous environment, learning such a regression task directly can be difficult...

10.1109/cvpr42600.2020.01200 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01

Visual localization is critical to many applications in computer vision and robotics. To address single-image RGB localization, state-of-the-art feature-based methods match local descriptors between a query image and a pre-built 3D model. Recently, deep neural networks have been exploited to regress the mapping between raw pixels and 3D coordinates in the scene, so that the matching is implicitly performed by the forward pass through the network. However, in a large and ambiguous environment, learning such a regression task directly can be...

10.1007/s11263-023-01982-9 article EN cc-by International Journal of Computer Vision 2024-02-06

Visual (re)localization addresses the problem of estimating the 6-DoF (Degree of Freedom) camera pose of a query image captured in a known scene, which is a key building block of many computer vision and robotics applications. Recent advances in structure-based localization solve this problem by memorizing the mapping from pixels to scene coordinates with neural networks to build 2D-3D correspondences for pose optimization. However, such memorization requires training on large amounts of posed images for each scene, which is heavy and inefficient. On the contrary,...

10.1109/3dv57658.2022.00051 article EN 2022 International Conference on 3D Vision (3DV) 2022-09-01

For several emerging technologies such as augmented reality, autonomous driving and robotics, visual localization is a critical component. Directly regressing camera pose/3D scene coordinates from the input image using deep neural networks has shown great potential. However, such methods assume a stationary data distribution with all scenes simultaneously available during training. In this paper, we approach the problem of visual localization in a continual learning setup, whereby the model is trained on scenes in an incremental manner....

10.1109/iccv48922.2021.00324 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021-10-01

Nowadays, CNNs are widely used in practical applications for image classification tasks. However, designing a CNN model is highly specialized work that is difficult for ordinary users. Moreover, even for CNN experts, selecting an optimal model for a specific task may still take a lot of time (to train many different models). To solve this problem, we propose an automated CNN recommendation system. Our system is able to evaluate the complexity of the task and the ability of the models precisely. Using the evaluation results, it can recommend a model that matches the task. The process is fast...

10.1109/icme.2017.8019347 article EN 2017 IEEE International Conference on Multimedia and Expo (ICME) 2017-07-01

Matching 2D keypoints in an image to a sparse 3D point cloud of the scene without requiring visual descriptors has garnered increased interest due to its low memory requirements, inherent privacy preservation, and reduced need for expensive model maintenance compared to descriptor-based methods. However, existing algorithms often compromise on performance, resulting in significant deterioration compared to their descriptor-based counterparts. In this paper, we introduce DGC-GNN, a novel algorithm that employs a global-to-local Graph...

10.48550/arxiv.2306.12547 preprint EN cc-by arXiv (Cornell University) 2023-01-01

10.1109/cvpr52733.2024.01973 article EN 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024-06-16

Fully-supervised CNN-based approaches for learning local image descriptors have shown remarkable results in a wide range of geometric tasks. However, most of them require per-pixel ground-truth keypoint correspondence data, which is difficult to acquire at scale. To address this challenge, recent weakly- and self-supervised methods can learn features from relative camera poses or by using only synthetic rigid transformations such as homographies. In this work, we focus on understanding the limitations...

10.1109/3dv53792.2021.00122 article EN 2021 International Conference on 3D Vision (3DV) 2021-12-01

Multi-view stereo reconstruction (MVS) in the wild requires first estimating the camera parameters, e.g. intrinsic and extrinsic parameters. These are usually tedious and cumbersome to obtain, yet they are mandatory to triangulate corresponding pixels in 3D space, which is the core of all best-performing MVS algorithms. In this work, we take the opposite stance and introduce DUSt3R, a radically novel paradigm for Dense and Unconstrained Stereo 3D Reconstruction of arbitrary image collections, i.e. operating without prior...

10.48550/arxiv.2312.14132 preprint EN other-oa arXiv (Cornell University) 2023-01-01

We propose a new method, called curvature similarity extractor (CSE), for improving local feature matching across images. CSE calculates the curvature of the 3D surface patch around each detected point in a viewpoint-invariant manner by fitting quadrics to predicted monocular depth maps. This curvature is then leveraged as an additional signal with off-the-shelf matchers like SuperGlue and LoFTR. Additionally, CSE enables end-to-end joint training by connecting the matcher and depth predictor networks. Our experiments demonstrate on...

10.1109/iccv51070.2023.01648 article EN 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 2023-10-01

We present a comprehensive study investigating the potential gain in accuracy for calculating absolute solvation free energies (ASFE) using a neural network to describe the intramolecular energy of the solute. We calculated the ASFE for most compounds from the FreeSolv database using the Open Force Field (OpenFF) and compared them with earlier results obtained with the CHARMM General Force Field (CGenFF). By applying a nonequilibrium (NEQ) switching approach between the molecular mechanics (MM) description (either OpenFF or CGenFF) and the neural network potential (NNP)/MM level...

10.26434/chemrxiv-2023-8jgjq-v2 preprint EN cc-by-nc-nd 2024-03-01

Camera relocalization relies on 3D models of the scene with a large memory footprint that is incompatible with the memory budget of several applications. One solution to reduce the map size is compression by removing certain points and quantizing descriptors. This achieves a high compression rate but leads to a performance drop due to information loss. To address this trade-off, we train a light-weight scene-specific auto-encoder network that performs quantization-dequantization in an end-to-end differentiable manner, updating both product quantization...

10.48550/arxiv.2407.15540 preprint EN arXiv (Cornell University) 2024-07-22

Recent advancements in 3D Gaussian Splatting (3D-GS) have revolutionized novel view synthesis, facilitating real-time, high-quality image rendering. However, in scenarios involving reflective surfaces, particularly mirrors, 3D-GS often misinterprets reflections as virtual spaces, resulting in blurred and inconsistent multi-view rendering within mirrors. Our paper presents a method aimed at obtaining consistent reflections by modelling physically-based cameras. We estimate mirror planes with depth...

10.48550/arxiv.2410.01614 preprint EN arXiv (Cornell University) 2024-10-02

Gaussian splatting enables fast novel view synthesis in static 3D environments. However, reconstructing real-world environments remains challenging, as distractors or occluders break the multi-view consistency assumption required for accurate reconstruction. Most existing methods rely on external semantic information from pre-trained models, introducing additional computational overhead as pre-processing steps or during optimization. In this work, we propose a method, DeSplat, that directly...

10.48550/arxiv.2411.19756 preprint EN arXiv (Cornell University) 2024-11-29

Visual localization aims to determine the camera pose of a query image relative to a database of posed images. In recent years, deep neural networks that directly regress poses have gained popularity due to their fast inference capabilities. However, existing methods struggle either to generalize well to new scenes or to provide accurate estimates. To address these issues, we present Reloc3r, a simple yet effective visual localization framework. It consists of an elegantly designed regression network and a minimalist...

10.48550/arxiv.2412.08376 preprint EN arXiv (Cornell University) 2024-12-11

In this paper, we introduce SLAM3R, a novel and effective monocular RGB SLAM system for real-time, high-quality dense 3D reconstruction. SLAM3R provides an end-to-end solution by seamlessly integrating local reconstruction and global coordinate registration through feed-forward neural networks. Given an input video, the system first converts it into overlapping clips using a sliding window mechanism. Unlike traditional pose optimization-based methods, SLAM3R directly regresses pointmaps from images in each...

10.48550/arxiv.2412.09401 preprint EN arXiv (Cornell University) 2024-12-12

Visual localization is critical to many applications in computer vision and robotics. To address single-image RGB localization, state-of-the-art feature-based methods match local descriptors between a query image and a pre-built 3D model. Recently, deep neural networks have been exploited to regress the mapping between raw pixels and 3D coordinates in the scene, so that the matching is implicitly performed by the forward pass through the network. However, in a large and ambiguous environment, learning such a regression task directly can be difficult...

10.48550/arxiv.2305.03595 preprint EN cc-by arXiv (Cornell University) 2023-01-01

It is difficult to build accurate prediction models for complex chemical processes with significant nonlinearity and dynamics based on traditional shallow static models. A model based on an improved Transformer, called CNN-Trans, is proposed to address these challenges. To improve the efficiency of the model and enhance its local feature extraction capability, a CNN architecture is applied to the Transformer's architecture: (1) dilated causal convolution is used in the embedding layer to obtain multi-scale information over a larger sensory...

10.1109/iccea58433.2023.10135471 article EN 2023-04-07

In this paper, the effect of metakaolin (MK) on the durability of NHL2-SAC-WER-based grouting materials in 5% sodium sulfate solution and sulfuric acid (pH = 1) was investigated. A comprehensive study was carried out on the attacked samples by measuring compressive strength and by X-ray diffraction (XRD), scanning electron microscopy (SEM), etc. The results showed that the addition of MK could effectively enhance acid resistance. For samples immersed in the sulfate solution, larger ettringite crystals were formed inside, cross-supported with...

10.2139/ssrn.4116221 article EN SSRN Electronic Journal 2022-01-01

Visual (re)localization addresses the problem of estimating the 6-DoF (Degree of Freedom) camera pose of a query image captured in a known scene, which is a key building block of many computer vision and robotics applications. Recent advances in structure-based localization solve this problem by memorizing the mapping from pixels to scene coordinates with neural networks to build 2D-3D correspondences for pose optimization. However, such memorization requires training on large amounts of posed images for each scene, which is heavy and inefficient. On the contrary,...

10.48550/arxiv.2208.06933 preprint EN other-oa arXiv (Cornell University) 2022-01-01