Johan Edstedt

ORCID: 0000-0002-1019-8634
Research Areas
  • Robotics and Sensor-Based Localization
  • Human Pose and Action Recognition
  • Advanced Image and Video Retrieval Techniques
  • Video Analysis and Summarization
  • Advanced Vision and Imaging
  • Anomaly Detection Techniques and Applications
  • Gaussian Processes and Bayesian Inference
  • Digital Media Forensic Detection
  • Domain Adaptation and Few-Shot Learning
  • Machine Learning and Data Classification
  • Video Surveillance and Tracking Methods
  • Impact of Light on Environment and Health
  • Inertial Sensor and Navigation
  • Image Processing Techniques and Applications
  • 3D Shape Modeling and Analysis
  • Image and Object Detection Techniques
  • Image Retrieval and Classification Techniques
  • 3D Surveying and Cultural Heritage
  • Smart Parking Systems Research
  • Opportunistic and Delay-Tolerant Networks
  • Augmented Reality Applications
  • Multimodal Machine Learning Applications
  • Human Mobility and Location-Based Analysis
  • Blind Source Separation Techniques
  • Cognitive Computing and Networks

Linköping University
2022-2024

Feature matching is a challenging computer vision task that involves finding correspondences between two images of a 3D scene. In this paper we consider the dense approach instead of the more common sparse paradigm, thus striving to find all correspondences. Perhaps counter-intuitively, dense methods have previously shown inferior performance to their sparse and semi-sparse counterparts for estimation of two-view geometry. This changes with our novel dense method, which outperforms both on geometry estimation. The novelty...

10.1109/cvpr52729.2023.01704 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01
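
The dense paradigm described above can be illustrated with a toy correlation-plus-soft-argmax matcher. The sketch below is not the paper's method; it is a minimal illustration of estimating a per-pixel warp from feature correlation, with random features standing in for a real backbone.

```python
# Minimal sketch of dense matching via feature correlation + soft-argmax.
# Not the paper's method; a toy illustration with placeholder features.
import torch
import torch.nn.functional as F

def dense_warp(feats_a, feats_b, temperature=0.05):
    """Estimate, for every pixel in A, a matching location in B.

    feats_a, feats_b: (C, H, W) feature maps of the two images.
    Returns a (H, W, 2) warp with normalized coordinates in [-1, 1].
    """
    C, H, W = feats_a.shape
    fa = F.normalize(feats_a.reshape(C, -1), dim=0)        # (C, H*W)
    fb = F.normalize(feats_b.reshape(C, -1), dim=0)        # (C, H*W)
    corr = fa.t() @ fb                                      # (H*W, H*W) similarities
    prob = torch.softmax(corr / temperature, dim=1)         # match distribution over B

    # Grid of normalized target coordinates, one per pixel of B.
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, H), torch.linspace(-1, 1, W), indexing="ij")
    grid_b = torch.stack((xs, ys), dim=-1).reshape(-1, 2)   # (H*W, 2)

    warp = prob @ grid_b                                    # expected match location
    return warp.reshape(H, W, 2)

# Toy usage with random features standing in for a real feature extractor.
fa, fb = torch.randn(128, 32, 32), torch.randn(128, 32, 32)
print(dense_warp(fa, fb).shape)  # torch.Size([32, 32, 2])
```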

10.1109/cvpr52733.2024.01871 article EN 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024-06-16

Keypoint detection is a pivotal step in 3D reconstruction, whereby sets of (up to) K points are detected in each view of a scene. Crucially, the detected points need to be consistent between views, i.e., correspond to the same 3D point in the scene. One of the main challenges with keypoint detection is the formulation of the learning objective. Previous learning-based methods typically jointly learn descriptors with keypoints, and treat the keypoint detection as a binary classification task on mutual nearest neighbours. However, basing keypoint detection on descriptor nearest neighbours is a proxy task, which is not guaranteed to produce...

10.1109/3dv62453.2024.00035 article EN 2024 International Conference on 3D Vision (3DV) 2024-03-18
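
For context, the mutual-nearest-neighbour criterion referred to in the abstract above can be sketched as follows; descriptor dimensions and the L2 metric are assumptions for illustration, not the paper's implementation.

```python
# Sketch of mutual nearest neighbour (MNN) matching between two descriptor sets.
# Illustrates the proxy criterion discussed above, not the DeDoDe pipeline itself.
import torch

def mutual_nearest_neighbours(desc_a, desc_b):
    """desc_a: (N, D), desc_b: (M, D). Returns index pairs (i, j) that are MNN."""
    dists = torch.cdist(desc_a, desc_b)          # (N, M) pairwise L2 distances
    nn_ab = dists.argmin(dim=1)                  # best match in B for each A
    nn_ba = dists.argmin(dim=0)                  # best match in A for each B
    idx_a = torch.arange(desc_a.shape[0])
    mutual = nn_ba[nn_ab] == idx_a               # A -> B -> A returns to the start
    return idx_a[mutual], nn_ab[mutual]

ia, ib = mutual_nearest_neighbours(torch.randn(500, 256), torch.randn(600, 256))
print(ia.shape, ib.shape)
```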

Due to long-distance correlation and powerful pretrained models, transformer-based methods have initiated a breakthrough in visual object tracking performance. Previous works focus on designing effective architectures suited for tracking, but ignore that data augmentation is equally crucial for training a well-performing model. In this paper, we first explore the impact of general data augmentations on transformer-based trackers via systematic experiments, and reveal the limited effectiveness of these common strategies. Motivated by...

10.1109/wacv57701.2024.00634 article EN 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2024-01-03
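
As a generic example of the kind of augmentation mentioned above, a template/search pair can be jointly jittered; the transforms below are ordinary torchvision operations used for illustration, not the strategies proposed in the paper.

```python
# Sketch: apply the same random photometric/geometric jitter to a tracking pair.
# Generic torchvision augmentations, not the paper's proposed strategy.
import torch
import torchvision.transforms.functional as TF

def augment_pair(template, search, p_flip=0.5, max_jitter=0.2):
    """template, search: (3, H, W) float tensors in [0, 1]."""
    if torch.rand(1).item() < p_flip:            # same flip for both crops
        template, search = TF.hflip(template), TF.hflip(search)
    factor = 1.0 + (torch.rand(1).item() * 2 - 1) * max_jitter
    template = TF.adjust_brightness(template, factor)
    search = TF.adjust_brightness(search, factor)
    return template, search

t, s = torch.rand(3, 128, 128), torch.rand(3, 256, 256)
t_aug, s_aug = augment_pair(t, s)
```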

10.1109/cvpr52733.2024.00467 article EN 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024-06-16

Feature matching is an important computer vision task that involves estimating correspondences between two images of a 3D scene, and dense methods estimate all such correspondences. The aim is to learn a robust model, i.e., a model able to match under challenging real-world changes. In this work, we propose such a model, leveraging frozen pretrained features from the foundation model DINOv2. Although these features are significantly more robust than local features trained from scratch, they are inherently coarse. We therefore combine them with...

10.48550/arxiv.2305.15404 preprint EN cc-by arXiv (Cornell University) 2023-01-01
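
The coarse/fine combination described above can be sketched generically: a frozen backbone provides coarse features that are upsampled and concatenated with fine features from a small trainable CNN. The backbone below is a random stand-in, not DINOv2 itself, and the shapes are assumptions.

```python
# Sketch: fuse frozen coarse features with trainable fine features.
# The frozen backbone is a stand-in; the paper uses DINOv2 features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoarseFineEncoder(nn.Module):
    def __init__(self, coarse_backbone, coarse_dim, fine_dim=64):
        super().__init__()
        self.coarse = coarse_backbone.eval()
        for p in self.coarse.parameters():       # keep the coarse features frozen
            p.requires_grad_(False)
        self.fine = nn.Sequential(               # small trainable fine branch
            nn.Conv2d(3, fine_dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(fine_dim, fine_dim, 3, padding=1))
        self.out_dim = coarse_dim + fine_dim

    def forward(self, img):                      # img: (B, 3, H, W)
        with torch.no_grad():
            coarse = self.coarse(img)            # low-resolution coarse features
        coarse = F.interpolate(coarse, size=img.shape[-2:], mode="bilinear",
                               align_corners=False)
        fine = self.fine(img)
        return torch.cat([coarse, fine], dim=1)  # (B, coarse_dim + fine_dim, H, W)

# Toy usage with a random conv net standing in for the frozen foundation model.
backbone = nn.Sequential(nn.Conv2d(3, 32, 8, stride=8), nn.ReLU())
enc = CoarseFineEncoder(backbone, coarse_dim=32)
print(enc(torch.rand(1, 3, 64, 64)).shape)  # torch.Size([1, 96, 64, 64])
```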

We tackle the task of scene flow estimation from point clouds. Given a source and a target point cloud, the objective is to estimate a translation for each point in the source cloud to the target, resulting in a 3D motion vector field. Previous dominant methods require complicated coarse-to-fine or recurrent architectures as a multi-stage refinement. In contrast, we propose a significantly simpler single-scale one-shot global matching to address the problem. Our key finding is that reliable feature similarity between point pairs is essential and sufficient...

10.48550/arxiv.2305.17432 preprint EN cc-by-sa arXiv (Cornell University) 2023-01-01
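
A minimal version of the global-matching idea above: compute per-point feature similarity between source and target clouds and take the similarity-weighted target position as the match. The features here are random placeholders, not the paper's learned features.

```python
# Sketch of one-shot global matching for scene flow: flow = soft-matched
# target position minus source position. Features are placeholders.
import torch
import torch.nn.functional as F

def global_matching_flow(src_xyz, tgt_xyz, src_feat, tgt_feat, temperature=0.1):
    """src_xyz: (N, 3), tgt_xyz: (M, 3), src_feat: (N, D), tgt_feat: (M, D)."""
    src_feat = F.normalize(src_feat, dim=1)
    tgt_feat = F.normalize(tgt_feat, dim=1)
    sim = src_feat @ tgt_feat.t() / temperature    # (N, M) similarity matrix
    weights = torch.softmax(sim, dim=1)            # soft match over target points
    matched = weights @ tgt_xyz                    # expected corresponding position
    return matched - src_xyz                       # 3D motion vector per source point

flow = global_matching_flow(torch.rand(1024, 3), torch.rand(1024, 3),
                            torch.randn(1024, 64), torch.randn(1024, 64))
print(flow.shape)  # torch.Size([1024, 3])
```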

In this paper, we analyze and improve the recently proposed DeDoDe keypoint detector. We focus our analysis on some key issues. First, we find that the keypoints tend to cluster together, which we fix by performing non-max suppression on the target distribution of the detector during training. Second, we address issues related to data augmentation. In particular, the detector is sensitive to large rotations, which we fix by including 90-degree rotations as well as horizontal flips. Finally, the decoupled nature of the detector makes evaluation of downstream usefulness...

10.48550/arxiv.2404.08928 preprint EN arXiv (Cornell University) 2024-04-13
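
The non-max suppression mentioned above can be illustrated with the standard max-pooling trick on a keypoint score map; this is a generic NMS sketch, not the exact target-distribution procedure used in the paper.

```python
# Sketch: grid non-max suppression of a keypoint score map via max pooling.
# Generic NMS, used here only to illustrate the idea discussed above.
import torch
import torch.nn.functional as F

def nms_scoremap(scores, window=5):
    """scores: (H, W) keypoint scores. Keeps only local maxima in each window."""
    s = scores[None, None]                                   # (1, 1, H, W)
    local_max = F.max_pool2d(s, window, stride=1, padding=window // 2)
    keep = (s == local_max).float()
    return (s * keep)[0, 0]                                  # suppressed map

suppressed = nms_scoremap(torch.rand(64, 64))
print((suppressed > 0).sum())  # number of surviving local maxima
```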

10.1109/cvprw63382.2024.00428 article 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2024-06-17

Research in 3D mapping is crucial for smart city applications, yet the cost of acquiring 3D data often hinders progress. Visual localization, particularly monocular camera position estimation, offers a solution by determining the camera's pose solely through visual cues. However, this task is challenging due to the limited data available from a single camera. To tackle these challenges, we organized the AISG–SLA Visual Localization Challenge (VLC) at IJCAI 2023 to explore how AI can accurately extract camera pose from 2D images in 3D space. The challenge...

10.24963/ijcai.2024/1003 article EN 2024-07-26

We propose a way to train deep learning based keypoint descriptors that makes them approximately equivariant for locally affine transformations of the image plane. The main idea is to use the representation theory of GL(2) to generalize the recently introduced concept of steerers from rotations to affine transformations. Affine steerers give high control over how descriptions transform under viewpoint changes, and we demonstrate the potential of using this control for image matching. Finally, we finetune with a set of steerers on upright images and obtain state-of-the-art results on several...

10.48550/arxiv.2408.14186 preprint EN arXiv (Cornell University) 2024-08-26
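
In rough equation form, the property targeted above is that warping the image by a local affine map and acting on description space with a representation of GL(2) should agree. The notation below is mine, not the paper's: f is the descriptor, T_A warps the image by A, and rho is a representation of GL(2) acting on description space.

```latex
% Approximate equivariance targeted by an affine steerer (sketch, own notation).
f\bigl(T_A\, I\bigr) \;\approx\; \rho(A)\, f(I), \qquad A \in \mathrm{GL}(2).
```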

Keypoint detection is a pivotal step in 3D reconstruction, whereby sets of (up to) K points are detected in each view of a scene. Crucially, the detected points need to be consistent between views, i.e., correspond to the same 3D point in the scene. One of the main challenges with keypoint detection is the formulation of the learning objective. Previous learning-based methods typically jointly learn descriptors with keypoints, and treat the keypoint detection as a binary classification task on mutual nearest neighbours. However, basing keypoint detection on descriptor nearest neighbours is a proxy task, which is not guaranteed to produce...

10.48550/arxiv.2308.08479 preprint EN cc-by arXiv (Cornell University) 2023-01-01

Few-shot segmentation is a challenging task, requiring the extraction of a generalizable representation from only a few annotated samples, in order to segment novel query images. A common approach is to model each class with a single prototype. While conceptually simple, these methods suffer when the target appearance distribution is multi-modal or not linearly separable in feature space. To tackle this issue, we propose a few-shot learner formulation based on Gaussian process (GP) regression. Through the expressivity...

10.48550/arxiv.2103.16549 preprint EN cc-by-nc-sa arXiv (Cornell University) 2021-01-01
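
The GP regression step above can be sketched as a posterior mean from support features to support mask values, evaluated at query features; the RBF kernel, noise level, and feature shapes below are assumptions for illustration only.

```python
# Sketch: GP posterior mean from support (features -> mask labels) evaluated at
# query features. RBF kernel and hyperparameters are illustrative assumptions.
import torch

def rbf_kernel(x, y, lengthscale=1.0):
    return torch.exp(-torch.cdist(x, y) ** 2 / (2 * lengthscale ** 2))

def gp_predict(support_feat, support_mask, query_feat, noise=1e-2):
    """support_feat: (S, D), support_mask: (S, 1), query_feat: (Q, D)."""
    K_ss = rbf_kernel(support_feat, support_feat)
    K_qs = rbf_kernel(query_feat, support_feat)
    S = support_feat.shape[0]
    alpha = torch.linalg.solve(K_ss + noise * torch.eye(S), support_mask)
    return K_qs @ alpha                            # (Q, 1) soft mask prediction

pred = gp_predict(torch.randn(300, 64), torch.rand(300, 1), torch.randn(500, 64))
print(pred.shape)  # torch.Size([500, 1])
```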

A challenge in image based metrology and forensics is intrinsic camera calibration when the camera used is unavailable. The unavailability raises two questions. The first question is how to find the projection model that describes the camera, and the second is how to detect incorrect models. In this work, we use off-the-shelf extended PnP-methods to estimate the model from 2D-3D correspondences, and propose a method for model validation. The most common strategy for evaluating projection models is comparing different models' residual variances - however, this naive strategy cannot distinguish whether...

10.48550/arxiv.2302.06949 preprint EN cc-by arXiv (Cornell University) 2023-01-01
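
A simplified version of the naive residual-variance comparison mentioned above, using OpenCV's solvePnP and reprojection error for two candidate projection models (pinhole vs. pinhole with a radial distortion term); the data and the chosen models are illustrative only, not the paper's validation method.

```python
# Sketch: fit a pose with solvePnP under two candidate projection models and
# compare reprojection residual variances. Data and models are illustrative.
import cv2
import numpy as np

def residual_variance(obj_pts, img_pts, K, dist):
    ok, rvec, tvec = cv2.solvePnP(obj_pts, img_pts, K, dist)
    proj, _ = cv2.projectPoints(obj_pts, rvec, tvec, K, dist)
    residuals = proj.reshape(-1, 2) - img_pts.reshape(-1, 2)
    return float(np.var(residuals))

# Toy 2D-3D correspondences (random; a real use case would measure these).
obj_pts = np.random.rand(50, 3).astype(np.float32)
K = np.array([[800, 0, 320], [0, 800, 240], [0, 0, 1]], dtype=np.float32)
img_pts, _ = cv2.projectPoints(obj_pts, np.zeros(3), np.array([0, 0, 5.0]), K, None)
img_pts = img_pts.astype(np.float32)

no_dist = np.zeros(5, dtype=np.float32)               # plain pinhole model
with_dist = np.array([0.1, 0, 0, 0, 0], np.float32)   # model with radial distortion
print(residual_variance(obj_pts, img_pts, K, no_dist),
      residual_variance(obj_pts, img_pts, K, with_dist))
```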

Due to long-distance correlation and powerful pretrained models, transformer-based methods have initiated a breakthrough in visual object tracking performance. Previous works focus on designing effective architectures suited for tracking, but ignore that data augmentation is equally crucial for training a well-performing model. In this paper, we first explore the impact of general data augmentations on transformer-based trackers via systematic experiments, and reveal the limited effectiveness of these common strategies. Motivated by...

10.48550/arxiv.2309.08264 preprint EN other-oa arXiv (Cornell University) 2023-01-01

We present the top ranked solution for the AISG-SLA Visual Localisation Challenge benchmark (IJCAI 2023), where the task is to estimate relative motion between images taken in sequence by a camera mounted on a car driving through an urban scene. For matching between images we use our recent deep learning based matcher RoMa. Matching image pairs sequentially and estimating relative motion from point correspondences sampled by RoMa already gives very competitive results -- third rank on the challenge benchmark. To improve the estimations we extract...

10.48550/arxiv.2310.01092 preprint EN cc-by arXiv (Cornell University) 2023-01-01
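
The "estimate relative motion from point correspondences" step above can be sketched with standard OpenCV two-view geometry; the synthetic correspondences below stand in for the matches RoMa would sample, and the intrinsics are placeholder values.

```python
# Sketch: relative pose (R, t up to scale) from sampled point correspondences,
# using standard OpenCV two-view geometry. Correspondences are synthetic.
import cv2
import numpy as np

def relative_pose(pts1, pts2, K):
    """pts1, pts2: (N, 2) matched pixel coordinates in consecutive frames."""
    E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                      prob=0.999, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
    return R, t                                    # rotation + unit translation

K = np.array([[1200, 0, 960], [0, 1200, 540], [0, 0, 1]], dtype=np.float64)
# Synthesize a toy two-view setup so the correspondences are geometrically valid.
X = np.random.rand(200, 3) * [4, 4, 4] + [0, 0, 6]          # 3D points in front
rvec, tvec = np.array([0.0, 0.1, 0.0]), np.array([0.5, 0.0, 0.0])
pts1, _ = cv2.projectPoints(X, np.zeros(3), np.zeros(3), K, None)
pts2, _ = cv2.projectPoints(X, rvec, tvec, K, None)
R, t = relative_pose(pts1.reshape(-1, 2), pts2.reshape(-1, 2), K)
print(R.shape, t.shape)  # (3, 3) (3, 1)
```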

Image keypoint descriptions that are discriminative and matchable over large changes in viewpoint are vital for 3D reconstruction. However, descriptions output by learned descriptors are typically not robust to camera rotation. While they can be made more robust by, e.g., data augmentation, this degrades performance on upright images. Another approach is test-time augmentation, which incurs a significant increase in runtime. Instead, we learn a linear transform in description space that encodes rotations of the input image. We call this transform a steerer since...

10.48550/arxiv.2312.02152 preprint EN cc-by arXiv (Cornell University) 2023-01-01
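
The steerer idea above amounts to a single matrix multiply in description space: a learned matrix W maps descriptions of an image to the descriptions of its rotated copy. The matrix, descriptions, and dimensions below are placeholders, not a trained model.

```python
# Sketch: a "steerer" is a linear map W on description space such that
# descriptions of a rotated image are (approximately) W applied to the
# descriptions of the upright image. All quantities below are placeholders.
import torch

D = 256
W = torch.randn(D, D)                  # learned steerer for a 90-degree rotation
desc_upright = torch.randn(100, D)     # descriptions of keypoints in image I
desc_rotated = desc_upright @ W.T      # emulates describe(rot90(I)) without
                                       # re-running the descriptor on rotated input
# A descriptor trained with a steerer satisfies
#   describe(rot90(I)) ~= describe(I) @ W.T,
# so rotation-robust matching costs only a matrix multiply at test time.
```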

Feature matching is a challenging computer vision task that involves finding correspondences between two images of a 3D scene. In this paper we consider the dense approach instead of the more common sparse paradigm, thus striving to find all correspondences. Perhaps counter-intuitively, dense methods have previously shown inferior performance to their sparse and semi-sparse counterparts for estimation of two-view geometry. This changes with our novel dense method, which outperforms both on geometry estimation. The novelty...

10.48550/arxiv.2202.00667 preprint EN cc-by-nc-sa arXiv (Cornell University) 2022-01-01

Automatically identifying harmful content in video is an important task with a wide range of applications. However, there is a lack of professionally labeled open datasets available. In this work VidHarm, an open dataset of 3589 video clips from film trailers annotated by professionals, is presented. An analysis of the dataset is performed, revealing among other things the relation between clip and trailer level annotations. Audiovisual models are trained on the dataset and an in-depth study of modeling choices is conducted. The results show that performance...

10.1109/icpr56361.2022.9956148 article EN 2022 26th International Conference on Pattern Recognition (ICPR) 2022-08-21