- Advanced Image Processing Techniques
- Advanced Vision and Imaging
- Image and Signal Denoising Methods
- Image Enhancement Techniques
- Generative Adversarial Networks and Image Synthesis
- Advanced Data Compression Techniques
- Video Coding and Compression Technologies
- Image Processing Techniques and Applications
- Advanced Neural Network Applications
- Domain Adaptation and Few-Shot Learning
- Adversarial Robustness in Machine Learning
- Optical Measurement and Interference Techniques
- Video Analysis and Summarization
- Computer Graphics and Visualization Techniques
- Color Science and Applications
- Robotics and Sensor-Based Localization
- Face Recognition and Analysis
- Digital Holography and Microscopy
- Advanced Image and Video Retrieval Techniques
- Image and Video Quality Assessment
- 3D Shape Modeling and Analysis
- Face and Expression Recognition
- 3D Surveying and Cultural Heritage
- Advanced Numerical Methods in Computational Mathematics
- Digital Media Forensic Detection
Walt Disney (United States)
2014-2025
Walt Disney (Switzerland)
2018-2024
Board of the Swiss Federal Institutes of Technology
2018
ETH Zurich
2018
Saarland University
2012-2016
This paper reviews the 2nd NTIRE challenge on single image super-resolution (restoration of rich details in a low resolution image) with a focus on the proposed solutions and results. The challenge had 4 tracks. Track 1 employed the standard bicubic downscaling setup, while Tracks 2, 3 and 4 used realistic unknown downgrading operators simulating the camera acquisition pipeline. The operators were learnable through provided pairs of low and high resolution train images. The tracks had 145, 114, 101, and 113 registered participants, resp., and 31 teams competed in the final testing...
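For reference, the Track 1 bicubic degradation can be reproduced in a few lines. The sketch below is only an illustrative pairing script, assuming a x4 scale factor and a placeholder file name, not code from the challenge.

```python
# Minimal sketch of the bicubic downscaling setup (x4 scale assumed;
# the file name is a placeholder, not challenge data).
from PIL import Image

def make_lr_hr_pair(path, scale=4):
    hr = Image.open(path).convert("RGB")
    # Crop so the HR size is divisible by the scale factor.
    w, h = hr.size
    hr = hr.crop((0, 0, w - w % scale, h - h % scale))
    lr = hr.resize((hr.width // scale, hr.height // scale), Image.BICUBIC)
    return lr, hr

lr, hr = make_lr_hr_pair("example_hr.png")  # hypothetical input image
```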
Most recent semantic segmentation methods train deep convolutional neural networks with fully annotated masks requiring pixel-accuracy for good quality training. Common weakly-supervised approaches generate full masks from partial input (e.g. scribbles or seeds) using standard interactive segmentation methods as preprocessing. But, errors in such masks result in poorer training since standard loss functions (e.g. cross-entropy) do not distinguish seeds from potentially mislabeled other pixels. Inspired by the general ideas in semi-supervised...
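A common building block for training directly on such partial input is a cross-entropy evaluated only on the seed pixels. The following is a minimal PyTorch sketch of that idea, not the exact loss proposed in the paper.

```python
import torch
import torch.nn.functional as F

def partial_cross_entropy(logits, seeds, ignore_index=255):
    """Cross-entropy evaluated only on seed pixels.

    logits: (N, C, H, W) network outputs.
    seeds:  (N, H, W) integer labels, with `ignore_index` marking
            unlabeled pixels that do not contribute to the loss.
    """
    return F.cross_entropy(logits, seeds, ignore_index=ignore_index)

# Example: 2 classes, one labeled seed pixel per image, rest ignored.
logits = torch.randn(1, 2, 4, 4, requires_grad=True)
seeds = torch.full((1, 4, 4), 255, dtype=torch.long)
seeds[0, 1, 1] = 1
loss = partial_cross_entropy(logits, seeds)
loss.backward()
```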
Recent deep learning approaches to single image super-resolution have achieved impressive results in terms of traditional error measures and perceptual quality. However, in each case it remains challenging to achieve high quality results for large upsampling factors. To this end, we propose a method (ProSR) that is progressive both in architecture and training: the network upsamples an image in intermediate steps, while the learning process is organized from easy to hard, as is done in curriculum learning. To obtain more photorealistic results,...
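A minimal sketch of the progressive idea follows; the layer widths and pixel-shuffle blocks are illustrative placeholders rather than the actual ProSR architecture. The network emits an intermediate x2 output and a final x4 output, so training can proceed from the easier x2 stage to the harder x4 stage.

```python
import torch
import torch.nn as nn

class UpStage(nn.Module):
    """One x2 upsampling stage (illustrative, not the ProSR block)."""
    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels * 4, 3, padding=1), nn.PixelShuffle(2),
        )
        self.to_rgb = nn.Conv2d(channels, 3, 3, padding=1)

    def forward(self, x):
        x = self.body(x)
        return x, self.to_rgb(x)

class ProgressiveSR(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.head = nn.Conv2d(3, channels, 3, padding=1)
        self.stage_x2 = UpStage(channels)
        self.stage_x4 = UpStage(channels)

    def forward(self, lr):
        feat = self.head(lr)
        feat, sr_x2 = self.stage_x2(feat)   # intermediate x2 prediction
        _, sr_x4 = self.stage_x4(feat)      # final x4 prediction
        return sr_x2, sr_x4                 # curriculum: supervise x2 first

sr_x2, sr_x4 = ProgressiveSR()(torch.randn(1, 3, 32, 32))
```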
Most approaches for video frame interpolation require accurate dense correspondences to synthesize an in-between frame. Therefore, they do not perform well in challenging scenarios with e.g. lighting changes or motion blur. Recent deep learning approaches that rely on kernels to represent motion can only alleviate these problems to some extent. In those cases, methods that use a per-pixel phase-based motion representation have been shown to work well. However, they are only applicable to a limited amount of motion. We propose a new approach,...
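The phase-based intuition can be illustrated on a toy 1D signal: a small shift corresponds to a linear change of Fourier phase per frequency, so an in-between sample can be obtained by interpolating phase. The numpy snippet below is only a simplification of the idea (phase-based interpolation methods operate on multi-scale, oriented decompositions of images, not a global FFT).

```python
import numpy as np

# Frame 0 and a slightly shifted frame 1 of a toy 1D "video".
x = np.linspace(0, 2 * np.pi, 128, endpoint=False)
f0, f1 = np.sin(x), np.sin(x - 0.3)

F0, F1 = np.fft.rfft(f0), np.fft.rfft(f1)
mag = 0.5 * (np.abs(F0) + np.abs(F1))                      # average magnitude
phase = np.angle(F0) + 0.5 * np.angle(F1 * np.conj(F0))    # halfway phase
mid = np.fft.irfft(mag * np.exp(1j * phase), n=128)

# Close to the true in-between frame sin(x - 0.15); only valid for small motion.
print(np.abs(mid - np.sin(x - 0.15)).max())
```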
While there are many deep learning based approaches for single image compression, the field of end-to-end learned video coding has remained much less explored. Therefore, in this work we present an inter-frame compression approach for neural video coding that can seamlessly build up on different existing neural codecs. Our solution performs temporal prediction by optical flow based motion compensation in pixel space. The key insight is that, to increase both decoding efficiency and reconstruction quality, the encoding required...
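The pixel-space temporal prediction step can be sketched as follows: backward-warp the previously decoded frame with an estimated optical flow field, so that only a residual remains to be encoded. The flow estimator and the residual codec are placeholders here; only the warping itself is shown.

```python
import torch
import torch.nn.functional as F

def warp(frame, flow):
    """Backward-warp `frame` (N,3,H,W) with a flow field (N,2,H,W) in pixels."""
    n, _, h, w = frame.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().to(frame)    # (2,H,W) pixel grid
    coords = grid.unsqueeze(0) + flow                         # sampling positions
    # Normalize coordinates to [-1, 1] as expected by grid_sample.
    coords_x = 2.0 * coords[:, 0] / (w - 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid_norm = torch.stack((coords_x, coords_y), dim=-1)     # (N,H,W,2)
    return F.grid_sample(frame, grid_norm, align_corners=True)

# Hypothetical usage inside an inter-frame codec:
# prediction = warp(previous_decoded, estimated_flow)
# residual   = current_frame - prediction   # only this is entropy coded
```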
Existing deep learning approaches to single image super-resolution have achieved impressive results but mostly assume a setting with fixed pairs of high resolution and low resolution images. However, to robustly address realistic upscaling scenarios where the relation between the two images is unknown, blind super-resolution is required. To this end, we propose a solution that relies on three components: First, we use a degradation aware SR network to synthesize the HR image given a low resolution image and the corresponding blur kernel. Second, we train a kernel discriminator to analyze...
In this paper, we propose an algorithm for fully automatic neural face swapping in images and videos. To the best of our knowledge, this is the first method capable of rendering photo-realistic and temporally coherent results at megapixel resolution. To this end, we introduce a progressively trained multi-way comb network and a light- and contrast-preserving blending method. We also show that while progressive training enables generation of high-resolution images, extending the architecture and training data beyond two people allows us to...
Point sets generated by image-based 3D reconstruction techniques are often much noisier than those obtained using active techniques like laser scanning. Therefore, they pose greater challenges to the subsequent surface reconstruction (meshing) stage. We present a simple and effective method for removing noise and outliers from such point sets. Our algorithm uses the input images and corresponding depth maps to remove pixels which are geometrically or photometrically inconsistent with the colored surface implied by the input. This allows standard surface reconstruction methods...
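The geometric side of such a consistency test can be sketched as follows: a 3D point is kept only if it reprojects into enough other views at a depth that agrees with that view's depth map. The pinhole camera model, tolerance, and vote threshold below are simplified placeholders, not the paper's parameters.

```python
import numpy as np

def consistent(point, cameras, depth_maps, tol=0.01, min_views=2):
    """Geometric consistency check for one 3D point (simplified sketch).

    cameras:    list of (K, R, t) with 3x3 intrinsics K, rotation R, translation t.
    depth_maps: list of HxW depth arrays aligned with the cameras.
    """
    votes = 0
    for (K, R, t), depth in zip(cameras, depth_maps):
        cam = R @ point + t                      # point in camera coordinates
        if cam[2] <= 0:
            continue                             # behind the camera
        uvw = K @ cam
        u, v = int(round(uvw[0] / uvw[2])), int(round(uvw[1] / uvw[2]))
        h, w = depth.shape
        if 0 <= v < h and 0 <= u < w:
            if abs(depth[v, u] - cam[2]) < tol * cam[2]:
                votes += 1                       # this view agrees with the point
    return votes >= min_views
```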
Aligning video is a fundamental task in computer graphics and vision, required for a wide range of applications. We present an interactive method for computing optimal nonlinear temporal alignments of an arbitrary number of videos. We first derive a robust approximation of alignment quality between pairs of clips, computed as a weighted histogram of feature matches. We then find optimal temporal mappings (constituting frame correspondences) using a graph-based approach that allows for very efficient evaluation with artist constraints. This enables...
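The mapping step can be illustrated with a standard dynamic-programming path through a pairwise cost matrix. The cost entries below stand in for the weighted histogram of feature matches described above, and the generic DTW-style recursion is only an illustration, not the paper's graph formulation with artist constraints.

```python
import numpy as np

def align(cost):
    """Monotonic frame-to-frame alignment through a cost matrix (DTW-style sketch).

    cost[i, j] is the dissimilarity between frame i of clip A and frame j of clip B.
    Returns the list of frame correspondences along the cheapest monotonic path.
    """
    n, m = cost.shape
    acc = np.full((n, m), np.inf)
    acc[0, 0] = cost[0, 0]
    for i in range(n):
        for j in range(m):
            if i == j == 0:
                continue
            prev = min(
                acc[i - 1, j] if i > 0 else np.inf,
                acc[i, j - 1] if j > 0 else np.inf,
                acc[i - 1, j - 1] if i > 0 and j > 0 else np.inf,
            )
            acc[i, j] = cost[i, j] + prev
    # Backtrack to recover the frame correspondences.
    path, i, j = [(n - 1, m - 1)], n - 1, m - 1
    while (i, j) != (0, 0):
        candidates = [(i - 1, j), (i, j - 1), (i - 1, j - 1)]
        i, j = min((p for p in candidates if p[0] >= 0 and p[1] >= 0),
                   key=lambda p: acc[p])
        path.append((i, j))
    return path[::-1]
```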
Face recognition models embed a face image into a low-dimensional identity vector containing abstract encodings of identity-specific facial features that allow individuals to be distinguished from one another. We tackle the challenging task of inverting the latent space of pre-trained face recognition models without full model access (i.e. in a black-box setting). A variety of methods have been proposed in the literature for this task, but they have serious shortcomings such as a lack of realistic outputs and strong requirements on the data set...
Despite recent advances in Novel View Synthesis (NVS), generating high-fidelity views from single or sparse observations remains a significant challenge. Existing splatting-based approaches often produce distorted geometry due to splatting errors. While diffusion-based methods leverage rich 3D priors to achieve improved geometry, they often suffer from texture hallucination. In this paper, we introduce SplatDiff, a pixel-splatting-guided video diffusion model designed to synthesize high-fidelity novel views from a single image. Specifically,...
Generative neural image compression supports data representation at extremely low bitrate, synthesizing details at the client and consistently producing highly realistic images. By leveraging the similarities between quantization error and additive noise, diffusion-based generative codecs can be built using a latent diffusion model to "denoise" the artifacts introduced by quantization. However, we identify three critical gaps in previous approaches following this paradigm (namely, the noise level, noise type,...
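The quantization-error-as-noise analogy mentioned above is easy to state concretely: uniform scalar quantization of a latent introduces an error that is bounded by half the step size and behaves approximately like additive uniform noise, which is what the latent diffusion model is asked to remove. A toy numpy illustration (the step size is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
latent = rng.normal(size=10_000)

step = 0.5                                   # arbitrary quantization step
quantized = np.round(latent / step) * step   # uniform scalar quantization
error = quantized - latent                   # behaves like additive noise

print(error.min(), error.max())              # bounded by +/- step / 2
print(error.std(), step / np.sqrt(12))       # close to the uniform-noise std
```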
In this article, we describe a complete pipeline for the capture and display of real-world Virtual Reality video content, based on the concept of omnistereoscopic panoramas. We address important practical and theoretical issues that have remained undiscussed in previous works. On the capture side, we show how high-quality omnistereo video can be generated from a sparse set of cameras (16 in our prototype array) instead of the hundreds of input views previously required. Despite the low number of views, our approach allows for high quality, real-time virtual...
Encoding videos as neural networks is a recently proposed approach that allows new forms of video processing. However, traditional compression techniques still outperform such neural video representation (NVR) methods for the task of video compression. This performance gap can be explained by the fact that current NVR methods: i) use architectures that do not efficiently obtain a compact representation of temporal and spatial information; and ii) minimize rate and distortion disjointly (first overfitting a network on a video and then using heuristic post-training quantization or...
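Shortcoming ii) can be made concrete with a toy example: overfit a small coordinate network to a clip, then apply heuristic post-training weight quantization as a separate step. Everything below (architecture, bit width, dummy clip) is illustrative and not taken from any specific NVR method.

```python
import torch
import torch.nn as nn

# Toy neural video representation: an MLP that maps (t, y, x) -> RGB,
# overfitted to a single clip.
nvr = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 64),
                    nn.ReLU(), nn.Linear(64, 3))

video = torch.rand(8, 3, 16, 16)                      # (T, C, H, W) dummy clip
t, y, x = torch.meshgrid(torch.linspace(0, 1, 8),
                         torch.linspace(0, 1, 16),
                         torch.linspace(0, 1, 16), indexing="ij")
coords = torch.stack((t, y, x), dim=-1).reshape(-1, 3)
targets = video.permute(0, 2, 3, 1).reshape(-1, 3)

opt = torch.optim.Adam(nvr.parameters(), lr=1e-3)
for _ in range(200):                                  # first: overfit the network
    opt.zero_grad()
    loss = ((nvr(coords) - targets) ** 2).mean()
    loss.backward()
    opt.step()

# then: heuristic post-training quantization of the weights (8-bit, per-tensor),
# applied disjointly from the rate-distortion objective.
for p in nvr.parameters():
    scale = p.abs().max() / 127
    p.data = torch.round(p.data / scale) * scale
```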
Deep learning based image compression has recently witnessed exciting progress and in some cases even managed to surpass transform coding approaches that have been established and refined over many decades. However, state-of-the-art solutions for deep image compression typically employ autoencoders which map the input to a lower dimensional latent space and thus irreversibly discard information already before quantization. Due to that, they inherently limit the range of quality levels that can be covered. In contrast, traditional...
Although existing neural video compression (NVC) methods have achieved significant success, most of them focus on improving either temporal or spatial information separately. They generally use simple operations such as concatenation or subtraction to utilize this information, and thereby only partially exploit spatio-temporal redundancies. This work aims to effectively and jointly leverage robust temporal and spatial information by proposing a new 3D-based transformer module: the Spatio-Temporal Cross-Covariance Transformer (ST-XCT). The...
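Cross-covariance attention computes the attention map across feature channels rather than tokens, so its cost grows linearly with the number of tokens, which keeps large spatio-temporal token sets affordable. The sketch below shows that single operation for one head with no normalization tricks; it is a generic illustration, not the full ST-XCT module.

```python
import torch
import torch.nn.functional as F

def cross_covariance_attention(q, k, v):
    """Channel-wise (cross-covariance) attention, single head.

    q, k, v: (N, tokens, channels). The attention map is C x C, so the cost
    is linear in the number of tokens.
    """
    q = F.normalize(q, dim=1)                              # normalize over tokens
    k = F.normalize(k, dim=1)
    attn = torch.softmax(q.transpose(1, 2) @ k, dim=-1)    # (N, C, C)
    return v @ attn.transpose(1, 2)                        # (N, tokens, C)

out = cross_covariance_attention(torch.randn(2, 196, 64),
                                 torch.randn(2, 196, 64),
                                 torch.randn(2, 196, 64))
```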
The usage of deep generative models for image compression has led to impressive performance gains over classical codecs, while neural video compression is still in its infancy. Here, we propose an end-to-end, deep generative modeling approach to compress temporal sequences with a focus on video. Our approach builds upon variational autoencoder (VAE) models for sequential data and combines them with recent work on neural image compression. The approach jointly learns to transform the original sequence into a lower-dimensional representation as well as to discretize and entropy code this...
Video frame interpolation has seen important progress in recent years, thanks to developments in several directions. Some works leverage better optical flow methods with improved splatting strategies or additional cues from depth, while others have investigated alternative approaches through direct predictions with transformers. Still, the problem remains unsolved in more challenging conditions such as complex lighting or large motion. In this work, we are bridging the gap towards video production with a novel...