Esa Rahtu

ORCID: 0000-0001-8767-0864
Research Areas
  • Advanced Vision and Imaging
  • Advanced Image and Video Retrieval Techniques
  • Robotics and Sensor-Based Localization
  • Advanced Image Processing Techniques
  • Advanced Neural Network Applications
  • Image Processing Techniques and Applications
  • Image Retrieval and Classification Techniques
  • Video Analysis and Summarization
  • Visual Attention and Saliency Detection
  • Human Pose and Action Recognition
  • Music and Audio Processing
  • Indoor and Outdoor Localization Technologies
  • Face Recognition and Analysis
  • Multimodal Machine Learning Applications
  • Domain Adaptation and Few-Shot Learning
  • Generative Adversarial Networks and Image Synthesis
  • Speech and Audio Processing
  • Anomaly Detection Techniques and Applications
  • 3D Surveying and Cultural Heritage
  • Optical Measurement and Interference Techniques
  • Image Enhancement Techniques
  • Video Surveillance and Tracking Methods
  • Computer Graphics and Visualization Techniques
  • Image and Signal Denoising Methods
  • Osteoarthritis Treatment and Mechanisms

Tampere University
2017-2025

Nokia (Finland)
2023

Tampere University of Applied Sciences
2017-2020

Czech Technical University in Prague
2020

Tampere University
2017-2019

ETH Zurich
2019

Signal Processing (United States)
2018-2019

University of Oulu
2008-2017

Lund University
2011

Statistics Finland
2010

This paper introduces FGVC-Aircraft, a new dataset containing 10,000 images of aircraft spanning 100 models, organised in a three-level hierarchy. At the finer level, differences between models are often subtle but always visually measurable, making visual recognition challenging but possible. A benchmark is obtained by defining corresponding classification tasks and evaluation protocols, and baseline results are presented. The construction of this dataset was made possible by the work of aircraft enthusiasts, a strategy that can extend...

10.48550/arxiv.1306.5151 preprint EN other-oa arXiv (Cornell University) 2013-01-01

Knee osteoarthritis (OA) is the most common musculoskeletal disorder. OA diagnosis is currently conducted by assessing symptoms and evaluating plain radiographs, but this process suffers from subjectivity. In this study, we present a new transparent computer-aided method based on the Deep Siamese Convolutional Neural Network to automatically score knee OA severity according to the Kellgren-Lawrence grading scale. We trained our method using data solely from the Multicenter Osteoarthritis Study and validated it on randomly selected...

10.1038/s41598-018-20132-7 article EN cc-by Scientific Reports 2018-01-23

Finding matching images across large datasets plays a key role in many computer vision applications such as structure-from-motion (SfM), multi-view 3D reconstruction, image retrieval, and image-based localisation. In this paper, we propose finding matching and non-matching pairs of images by representing them with neural network based feature vectors, whose similarity is measured by Euclidean distance. The feature vectors are obtained with convolutional neural networks, which are learnt from labeled examples using a contrastive loss...

10.1109/icpr.2016.7899663 article EN 2016-12-01

In this paper, recognition of blurred faces using the recently introduced Local Phase Quantization (LPQ) operator is proposed. LPQ is based on quantizing the Fourier transform phase in local neighborhoods. The phase can be shown to be a blur invariant property under certain commonly fulfilled conditions. In face image analysis, histograms of LPQ labels computed within local regions are used as a face descriptor, similarly to the widely used Local Binary Pattern (LBP) methodology for face image description. The experimental results on the CMU PIE and FRGC 1.0.4 datasets...

10.1109/icpr.2008.4761847 article EN Proceedings - International Conference on Pattern Recognition 2008-12-01

Knee osteoarthritis (OA) is the most common musculoskeletal disease without a cure, and current treatment options are limited to symptomatic relief. Prediction of OA progression is a very challenging and timely issue, and it could, if resolved, accelerate disease modifying drug development and ultimately help to prevent millions of total joint replacement surgeries performed annually. Here, we present a multi-modal machine learning-based OA progression prediction model that utilises raw radiographic data, clinical examination...

10.1038/s41598-019-56527-3 article EN cc-by Scientific Reports 2019-12-27

In this paper, we propose an encoder-decoder convolutional neural network (CNN) architecture for estimating camera pose (orientation and location) from a single RGB image. The architecture has an hourglass shape consisting of a chain of convolution and up-convolution layers followed by a regression part. The up-convolution layers are introduced to preserve the fine-grained information of the input image. Following common practice, we train our model in an end-to-end manner utilizing transfer learning from large scale classification data. The experiments demonstrate...

10.1109/iccvw.2017.107 article EN 2017-10-01

Dense video captioning is the task of localizing interesting events in an untrimmed video and producing a textual description (caption) for each localized event. Most of the previous works in dense video captioning are based solely on visual information and completely ignore the audio track. However, audio, and speech in particular, are vital cues for a human observer in understanding an environment. In this paper, we present a new dense video captioning approach that is able to utilize any number of modalities for event description. Specifically, we show how audio and speech modalities may improve a dense video captioning model....

10.1109/cvprw50498.2020.00487 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2020-06-01

10.1109/wacv61041.2025.00241 article EN 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2025-02-26

Cascades are a popular framework to speed up object detection systems. Here we focus on the first layers of a category independent object detection cascade, in which we sample a large number of windows from an objectness prior and then discriminatively learn to filter these candidate windows by an order of magnitude. We make a number of contributions to cascade design that substantially improve over the state of the art: (i) our novel objectness prior gives much higher recall than competing methods, (ii) we propose objectness features that give high performance with very low computational cost,...

10.1109/iccv.2011.6126351 article EN International Conference on Computer Vision 2011-11-01

We present a method for generating object segmentation proposals from groups of superpixels. The goal is to propose accurate segmentations for all objects of an image. The proposed object hypotheses can be used as input to object detection systems and thereby improve efficiency by replacing exhaustive search. The segmentations are generated in a class-independent manner, and therefore the computational cost of the approach is independent of the number of object classes. Our approach combines both global and local search in the space of sets of superpixels. The local search is implemented by greedily merging adjacent pairs...

10.1109/cvpr.2014.310 article EN 2014 IEEE Conference on Computer Vision and Pattern Recognition 2014-06-01

The aim of the study was to assess whether texture analysis is feasible for automated identification of epithelium and stroma in digitized tumor tissue microarrays (TMAs). Texture analysis based on local binary patterns (LBP) has previously been used successfully in applications such as face recognition and industrial machine vision. TMAs with tissue samples from 643 patients with colorectal cancer were digitized using a whole slide scanner, and areas representing epithelium and stroma were annotated in the images. Well-defined images of epithelium (n = 41) and stroma (n = 39) were used for training a support...

10.1186/1746-1596-7-22 article EN cc-by Diagnostic Pathology 2012-03-02

Video summarization is a technique to create a short skim of the original video while preserving the main stories/content. There exists substantial interest in automatizing this process due to the rapid growth of the available material. The recent progress has been facilitated by public benchmark datasets, which enable easy and fair comparison of methods. Currently, the established evaluation protocol is to compare the generated summary with respect to a set of reference summaries provided by the dataset. In this paper, we will provide in-depth...

10.1109/cvpr.2019.00778 article EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01

Introduction: Microscopy is the gold standard for diagnosis of malaria; however, manual evaluation of blood films is highly dependent on skilled personnel in a time-consuming, error-prone and repetitive process. In this study we propose a method using computer vision for detection and visualization of only the diagnostically most relevant sample regions in digitized blood smears. Methods: Giemsa-stained thin blood films with P. falciparum ring-stage trophozoites (n = 27) and uninfected controls (n = 20) were digitally scanned with an oil immersion...

10.1371/journal.pone.0104855 article EN cc-by PLoS ONE 2014-08-21

This paper addresses the challenge of dense pixel correspondence estimation between two images. This problem is closely related to the optical flow estimation task, where ConvNets (CNNs) have recently achieved significant progress. While optical flow methods produce very accurate results for small pixel translation and limited appearance variation scenarios, they hardly deal with the strong geometric transformations that we consider in this work. In this paper, we propose a coarse-to-fine CNN-based framework that can leverage the advantages of optical flow approaches...

10.1109/wacv.2019.00115 article EN 2019-01-01

Automatically generating a summary of a sports video poses the challenge of detecting interesting moments, or highlights, of a game. Traditional sports video summarization methods leverage editing conventions of broadcast sports video that facilitate the extraction of high-level semantics. However, user-generated videos are not edited and, thus, traditional methods are not suitable to generate a summary. In order to solve this problem, this paper proposes a novel video summarization method that uses players' actions as a cue to determine the highlights of the original video. A deep neural-network-based...

10.1109/tmm.2018.2794265 article EN IEEE Transactions on Multimedia 2018-01-15

Over recent years, deep learning-based computer vision systems have been applied to images at an ever-increasing pace, oftentimes representing the only type of consumption for those images. Given the dramatic explosion in the number of images generated per day, a question arises: how much better would an image codec targeting machine-consumption perform against state-of-the-art codecs targeting human-consumption? In this paper, we propose an image codec for machines which is neural network (NN) based and end-to-end learned. In particular, we propose a set...

10.1109/icassp39728.2021.9414465 article EN ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021-05-13

This paper presents a generic face animator that is able to control the pose and expressions of a given face image. The animation is driven by human interpretable control signals consisting of head pose angles and Action Unit (AU) values. The control information can be obtained from multiple sources, including external driving videos and manual controls. Due to the interpretable nature of the driving signal, one can easily mix the information between multiple sources (e.g. pose from one image and expression from another) and apply selective postproduction editing. The proposed face animator is implemented as a two stage neural network model that is learned in...

10.1109/wacv45572.2020.9093474 article EN 2020-03-01