- Advanced Vision and Imaging
- Advanced Image and Video Retrieval Techniques
- Robotics and Sensor-Based Localization
- Advanced Image Processing Techniques
- Advanced Neural Network Applications
- Image Processing Techniques and Applications
- Image Retrieval and Classification Techniques
- Video Analysis and Summarization
- Visual Attention and Saliency Detection
- Human Pose and Action Recognition
- Music and Audio Processing
- Indoor and Outdoor Localization Technologies
- Face Recognition and Analysis
- Multimodal Machine Learning Applications
- Domain Adaptation and Few-Shot Learning
- Generative Adversarial Networks and Image Synthesis
- Speech and Audio Processing
- Anomaly Detection Techniques and Applications
- 3D Surveying and Cultural Heritage
- Optical Measurement and Interference Techniques
- Image Enhancement Techniques
- Video Surveillance and Tracking Methods
- Computer Graphics and Visualization Techniques
- Image and Signal Denoising Methods
- Osteoarthritis Treatment and Mechanisms
Tampere University
2017-2025
Nokia (Finland)
2023
Tampere University of Applied Sciences
2017-2020
Czech Technical University in Prague
2020
Tampere University
2017-2019
ETH Zurich
2019
Signal Processing (United States)
2018-2019
University of Oulu
2008-2017
Lund University
2011
Statistics Finland
2010
This paper introduces FGVC-Aircraft, a new dataset containing 10,000 images of aircraft spanning 100 models, organised in a three-level hierarchy. At the finer level, differences between models are often subtle but always visually measurable, making visual recognition challenging but possible. A benchmark is obtained by defining corresponding classification tasks and evaluation protocols, and baseline results are presented. The construction of this dataset was made possible by the work of aircraft enthusiasts, a strategy that can extend...
Knee osteoarthritis (OA) is the most common musculoskeletal disorder. OA diagnosis is currently conducted by assessing symptoms and evaluating plain radiographs, but this process suffers from subjectivity. In this study, we present a new transparent computer-aided method based on a Deep Siamese Convolutional Neural Network to automatically score knee OA severity according to the Kellgren-Lawrence grading scale. We trained our method using data solely from the Multicenter Osteoarthritis Study and validated it on randomly selected...
Finding matching images across large datasets plays a key role in many computer vision applications such as structure-from-motion (SfM), multi-view 3D reconstruction, image retrieval, and image-based localisation. In this paper, we propose finding non-matching pairs of images by representing them with neural network based feature vectors, whose similarity is measured by the Euclidean distance. The vectors are obtained with convolutional networks which are learnt from labeled examples using a contrastive loss...
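The core idea of this abstract — descriptor vectors compared by Euclidean distance and trained with a contrastive objective — can be illustrated with a minimal sketch. The function name and the batch layout below are illustrative, not from the paper; only the loss form (squared distance for matching pairs, squared hinge on a margin for non-matching pairs) is the standard contrastive loss the abstract refers to.

```python
import numpy as np

def contrastive_loss(f1, f2, y, margin=1.0):
    """Contrastive loss over a batch of feature-vector pairs.

    f1, f2 : (N, D) arrays of descriptor vectors.
    y      : (N,) labels, 1 for matching pairs, 0 for non-matching.
    margin : non-matching pairs are pushed at least this far apart.
    """
    d = np.linalg.norm(f1 - f2, axis=1)            # Euclidean distances
    pos = y * d ** 2                               # pull matching pairs together
    neg = (1 - y) * np.maximum(margin - d, 0) ** 2 # push non-matching pairs apart
    return float(np.mean(pos + neg))
```

A matching pair at zero distance and a non-matching pair beyond the margin both contribute zero loss, which is the training optimum the network is pushed towards.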
In this paper, recognition of blurred faces using the recently introduced Local Phase Quantization (LPQ) operator is proposed. LPQ is based on quantizing the Fourier transform phase in local neighborhoods. The phase can be shown to be a blur invariant property under certain commonly fulfilled conditions. In face image analysis, histograms of LPQ labels computed within local regions are used as a face descriptor, similarly to the widely used Local Binary Pattern (LBP) methodology for face description. The experimental results on the CMU PIE and FRGC 1.0.4 datasets...
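A simplified sketch of the LPQ idea: at every pixel, local Fourier coefficients of a small neighborhood are evaluated at four low frequencies, and the signs of their real and imaginary parts are quantized into an 8-bit code, whose histogram serves as the descriptor. This is an assumption-laden toy version (uniform window, fixed 3x3 neighborhood, no decorrelation step); the original operator includes refinements not shown here.

```python
import numpy as np

def lpq_descriptor(img, win=3):
    """Simplified LPQ: 256-bin histogram of local phase codes.

    For every win x win neighborhood, local Fourier coefficients at four
    low frequencies are computed; the signs of their real and imaginary
    parts give an 8-bit code per pixel.
    """
    a = 1.0 / win
    x = np.arange(win) - win // 2
    w0 = np.ones(win)                     # constant window
    w1 = np.exp(-2j * np.pi * a * x)      # complex exponential at frequency a
    patches = np.lib.stride_tricks.sliding_window_view(img.astype(float), (win, win))
    # separable evaluation at frequencies (a,0), (0,a), (a,a), (a,-a)
    F = [
        np.einsum('ijkl,k,l->ij', patches, w1, w0),
        np.einsum('ijkl,k,l->ij', patches, w0, w1),
        np.einsum('ijkl,k,l->ij', patches, w1, w1),
        np.einsum('ijkl,k,l->ij', patches, w1, np.conj(w1)),
    ]
    code = np.zeros(patches.shape[:2], dtype=np.uint8)
    b = 0
    for c in F:
        # one bit from the sign of the real part, one from the imaginary part
        code |= ((np.real(c) >= 0).astype(np.uint8) << b)
        code |= ((np.imag(c) >= 0).astype(np.uint8) << (b + 1))
        b += 2
    return np.bincount(code.ravel(), minlength=256)
```

In the paper's pipeline such histograms are computed per image region and concatenated, exactly as in the LBP face-description methodology the abstract mentions.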
Knee osteoarthritis (OA) is the most common musculoskeletal disease without a cure, and current treatment options are limited to symptomatic relief. Prediction of OA progression is a very challenging and timely issue; if resolved, it could accelerate disease-modifying drug development and ultimately help prevent millions of total joint replacement surgeries performed annually. Here, we present a multi-modal machine learning-based prediction model that utilises raw radiographic data, clinical examination...
In this paper, we propose an encoder-decoder convolutional neural network (CNN) architecture for estimating camera pose (orientation and location) from a single RGB image. The network has an hourglass shape consisting of a chain of convolution and up-convolution layers followed by a regression part. The up-convolution layers are introduced to preserve the fine-grained information of the input image. Following common practice, we train our model in an end-to-end manner utilizing transfer learning from large scale classification data. The experiments demonstrate...
Dense video captioning is a task of localizing interesting events from an untrimmed video and producing a textual description (caption) for each localized event. Most of the previous works in dense video captioning are solely based on visual information and completely ignore the audio track. However, audio, and speech in particular, are vital cues for a human observer in understanding an environment. In this paper, we present a new approach that is able to utilize any number of modalities for event description. Specifically, we show how speech may improve the model...
Cascades are a popular framework to speed up object detection systems. Here we focus on the first layers of a category independent cascade, in which we sample a large number of windows from an objectness prior and then discriminatively learn to filter these candidates by an order of magnitude. We make several contributions to cascade design that substantially improve over the state of the art: (i) our novel objectness prior gives much higher recall than competing methods, (ii) we propose features that give high performance with very low computational cost,...
We present a method for generating object segmentation proposals from groups of superpixels. The goal is to propose accurate segmentations for all objects of an image. The proposed hypotheses can be used as input for object detection systems and thereby improve efficiency by replacing exhaustive search. The hypotheses are generated in a class-independent manner, and therefore the computational cost of the approach is independent of the number of object classes. Our approach combines both global and local search in the space of sets of superpixels; the local search is implemented by greedily merging adjacent pairs...
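The greedy local search over sets of superpixels can be sketched as follows. The function name, the histogram-intersection similarity, and the feature summation on merge are illustrative choices for this toy version, not the paper's exact formulation; the structure (repeatedly merge the most similar adjacent pair, record each merged region as a proposal) is the part the abstract describes.

```python
def greedy_merge_proposals(features, adjacency):
    """Greedily merge adjacent superpixels into segmentation proposals.

    features  : dict, superpixel id (int) -> feature histogram (list of floats)
    adjacency : set of frozensets {i, j} for adjacent superpixels
    Returns a list of merged regions (sets of original superpixel ids),
    each of which is one class-independent segmentation proposal.
    """
    def similarity(a, b):
        # histogram intersection between region features (illustrative)
        return sum(min(x, y) for x, y in zip(a, b))

    regions = {i: {i} for i in features}
    feats = dict(features)
    adj = set(adjacency)
    proposals = []
    while adj:
        # merge the most similar adjacent pair of regions
        i, j = max(adj, key=lambda e: similarity(feats[min(e)], feats[max(e)]))
        new = max(regions) + 1
        regions[new] = regions.pop(i) | regions.pop(j)
        feats[new] = [x + y for x, y in zip(feats.pop(i), feats.pop(j))]
        proposals.append(set(regions[new]))
        # rewire adjacency so neighbours of i or j now touch the merged region
        new_adj = set()
        for e in adj:
            if i in e or j in e:
                rest = e - {i, j}
                if rest:
                    new_adj.add(frozenset(rest | {new}))
            else:
                new_adj.add(e)
        adj = new_adj
    return proposals
```

With n initial superpixels this emits at most n - 1 proposals, so the proposal count (and cost) is independent of the number of object classes, as the abstract notes.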
The aim of the study was to assess whether texture analysis is feasible for automated identification of epithelium and stroma in digitized tumor tissue microarrays (TMAs). Texture analysis based on local binary patterns (LBP) has previously been used successfully in applications such as face recognition and industrial machine vision. TMAs with tissue samples from 643 patients with colorectal cancer were digitized using a whole slide scanner, and areas representing epithelium and stroma were annotated in the images. Well-defined images of epithelium (n = 41) and stroma (n = 39) were used for training a support...
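For readers unfamiliar with the LBP texture features the study builds on, here is a minimal sketch of the basic 3x3 operator: each pixel is compared with its 8 neighbours, neighbours greater than or equal to the centre contribute one bit to an 8-bit code, and the histogram of codes is the texture descriptor. This is the textbook variant; the study's exact LBP configuration (radius, sampling, uniform patterns) may differ.

```python
import numpy as np

def lbp_histogram(img):
    """Basic 3x3 local binary pattern histogram (256 bins)."""
    img = img.astype(float)
    c = img[1:-1, 1:-1]  # centre pixels (border excluded)
    # top-left offsets of the 8 neighbours, in a fixed circular order
    shifts = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = np.zeros(c.shape, dtype=np.uint8)
    for b, (dy, dx) in enumerate(shifts):
        n = img[dy:dy + c.shape[0], dx:dx + c.shape[1]]
        code |= ((n >= c).astype(np.uint8) << b)  # one bit per neighbour
    return np.bincount(code.ravel(), minlength=256)
```

Such per-region histograms would then be fed to a classifier (the abstract is truncated at "support...", presumably a support vector machine) to label epithelium versus stroma.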
Video summarization is a technique to create a short skim of the original video while preserving the main stories/content. There exists substantial interest in automatizing this process due to the rapid growth of available material. The recent progress has been facilitated by public benchmark datasets, which enable easy and fair comparison of methods. The currently established evaluation protocol is to compare the generated summary with respect to a set of reference summaries provided by the dataset. In this paper, we will provide an in-depth...
Introduction: Microscopy is the gold standard for diagnosis of malaria; however, manual evaluation of blood films is highly dependent on skilled personnel in a time-consuming, error-prone and repetitive process. In this study we propose a method using computer vision for detection and visualization of only the diagnostically most relevant sample regions in digitized blood smears. Methods: Giemsa-stained thin blood films with P. falciparum ring-stage trophozoites (n = 27) and uninfected controls (n = 20) were digitally scanned with an oil immersion...
This paper addresses the challenge of dense pixel correspondence estimation between two images. This problem is closely related to the optical flow task, where ConvNets (CNNs) have recently achieved significant progress. While optical flow methods produce very accurate results for small translation and limited appearance variation scenarios, they hardly deal with the strong geometric transformations that we consider in this work. In this paper, we propose a coarse-to-fine CNN-based framework that can leverage the advantages of optical flow approaches...
Automatically generating a summary of a sports video poses the challenge of detecting interesting moments, or highlights, of a game. Traditional summarization methods leverage editing conventions of broadcast video that facilitate the extraction of high-level semantics. However, user-generated videos are not edited and, thus, traditional methods are not suitable to generate a summary. In order to solve this problem, this paper proposes a novel method that uses players' actions as a cue to determine the highlights of the original video. A deep neural-network-based...
Over recent years, deep learning-based computer vision systems have been applied to images at an ever-increasing pace, oftentimes representing the only type of consumption for those images. Given the dramatic explosion in the number of images generated per day, a question arises: how much better would an image codec targeting machine-consumption perform against state-of-the-art codecs targeting human-consumption? In this paper, we propose an image codec for machines which is neural network (NN) based and end-to-end learned. In particular, we propose a set...
This paper presents a generic face animator that is able to control the pose and expressions of a given face image. The animation is driven by human interpretable control signals consisting of head pose angles and Action Unit (AU) values. The control information can be obtained from multiple sources including external driving videos and manual controls. Due to the interpretable nature of the signal, one can easily mix information between sources (e.g. pose from one image and expression from another) and apply selective post-production editing. The proposed animator is implemented as a two stage neural network model learned in...