- Image and Signal Denoising Methods
- Advanced Image Processing Techniques
- Advanced Data Compression Techniques
- Generative Adversarial Networks and Image Synthesis
- Advanced Vision and Imaging
- Image and Video Quality Assessment
- Image Enhancement Techniques
- Video Coding and Compression Technologies
- Advanced Image Fusion Techniques
- Neural Networks and Applications
- Advanced Neural Network Applications
- Model Reduction and Neural Networks
- Visual Attention and Saliency Detection
- Wireless Communication Security Techniques
- Visual Perception and Processing Mechanisms
- Image Retrieval and Classification Techniques
- Adversarial Robustness in Machine Learning
- Blind Source Separation Techniques
- Neural Dynamics and Brain Function
- Privacy-Preserving Technologies in Data
- Advanced Image and Video Retrieval Techniques
- Cell Image Analysis Techniques
- Stochastic Dynamics and Bifurcation
- Experimental Learning in Engineering
- Teaching and Learning Programming
Google (United States)
2018-2024
New York University
2015-2023
Fraunhofer Institute for Telecommunications, Heinrich Hertz Institute
2023
Texas State University
2023
Institut Universitaire de France
2023
Alibaba Group (China)
2023
Google (Switzerland)
2022
Howard Hughes Medical Institute
2014-2017
Courant Institute of Mathematical Sciences
2014-2017
RWTH Aachen University
2006-2012
We describe an image compression method, consisting of a nonlinear analysis transformation, a uniform quantizer, and a nonlinear synthesis transformation. The transforms are constructed in three successive stages of convolutional linear filters and nonlinear activation functions. Unlike most convolutional neural networks, the joint nonlinearity is chosen to implement a form of local gain control, inspired by those used to model biological neurons. Using a variant of stochastic gradient descent, we jointly optimize the entire model for rate-distortion...
We describe an end-to-end trainable model for image compression based on variational autoencoders. The model incorporates a hyperprior to effectively capture spatial dependencies in the latent representation. This hyperprior relates to side information, a concept universal to virtually all modern image codecs, but largely unexplored in image compression using artificial neural networks (ANNs). Unlike existing autoencoder compression methods, our model trains a complex prior jointly with the underlying autoencoder. We demonstrate that this model leads to state-of-the-art image compression when...
Recent models for learned image compression are based on autoencoders, learning approximately invertible mappings from pixels to a quantized latent representation. These are combined with an entropy model, a prior on the latent representation that can be used with standard arithmetic coding algorithms to yield a compressed bitstream. Recently, hierarchical entropy models have been introduced as a way to exploit more structure in the latents than simple fully factorized priors, improving compression performance while maintaining end-to-end optimization....
We introduce a parametric nonlinear transformation that is well-suited for Gaussianizing data from natural images. The data are linearly transformed, and each component is then normalized by a pooled activity measure, computed by exponentiating a weighted sum of rectified and exponentiated components and a constant. We optimize the parameters of the full transformation (linear transform, exponents, weights, constant) over a database of natural images, directly minimizing the negentropy of the responses. The optimized transformation substantially Gaussianizes the data, achieving...
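The normalization described in this abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `gdn` and the parameter names `H`, `beta`, `gamma`, `alpha`, and `eps` are hypothetical labels for the linear transform, constant, weights, and exponents that the abstract mentions, and the parameter values in the usage lines are arbitrary rather than optimized.

```python
import numpy as np

def gdn(x, H, beta, gamma, alpha, eps):
    """Sketch of a divisive normalization of the kind the abstract describes.

    All parameter names are assumptions made for this example:
      H     : (n, n) linear transform applied to the data first
      beta  : (n,) constant added to each pooled activity measure
      gamma : (n, n) weights of the pooled sum
      alpha : scalar or (n, n) exponents applied to the rectified components
      eps   : scalar or (n,) exponent applied to the pooled activity measure
    """
    z = H @ x                      # the data are linearly transformed
    rectified = np.abs(z)          # rectified components
    # weighted sum of rectified, exponentiated components plus a constant
    pooled = beta + np.sum(gamma * rectified[None, :] ** alpha, axis=1)
    # each component is divided by its exponentiated pooled activity measure
    return z / pooled ** eps

# Toy usage with arbitrary, non-optimized parameters.
x = np.array([3.0, 4.0])
y = gdn(x, H=np.eye(2), beta=np.ones(2), gamma=np.ones((2, 2)),
        alpha=2.0, eps=0.5)
```

With these toy settings every component is divided by the same pooled value, sqrt(1 + 3² + 4²), so the output is a rescaled copy of the input; with learned weights, each component would get its own normalization pool.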
Despite considerable progress on end-to-end optimized deep networks for image compression, video coding remains a challenging task. Recently proposed methods for learned video compression use optical flow and bilinear warping for motion compensation and show competitive rate-distortion performance relative to hand-engineered codecs like H.264 and HEVC. However, these learning-based methods rely on complex architectures and training schemes including the use of pre-trained optical flow networks, sequential training of sub-networks, adaptive rate control,...
We introduce a general framework for end-to-end optimization of the rate-distortion performance of nonlinear transform codes assuming scalar quantization. The framework can be used to optimize any differentiable pair of analysis and synthesis transforms in combination with any differentiable perceptual metric. As an example, we consider a code built from a linear transform followed by a form of multi-dimensional local gain control. Distortion is measured with a state-of-the-art perceptual metric. When optimized over a large database of images, this representation offers...
We review a class of methods that can be collected under the name nonlinear transform coding (NTC), which over the past few years have become competitive with the best linear transform codecs for images, and have superseded them in terms of rate-distortion performance under established perceptual quality metrics such as MS-SSIM. We assess the empirical rate-distortion performance of NTC with the help of simple example sources, for which the optimal performance of a vector quantizer is easier to estimate than with natural data sources. To this end, we introduce a novel variant of entropy-constrained vector quantization....
In today's teaching and learning approaches for first-semester students, practical courses more and more often complement traditional theoretical lectures. This practical element allows an early insight into the real world of engineering, augments student motivation, and enables students to acquire soft skills early. This paper describes a new freshman introduction course which has been established within the Bachelor of Science curriculum of Electrical Engineering and Information Technology of RWTH Aachen University, Germany. The course is...
We present an image quality metric based on the transformations associated with the early visual system: local luminance subtraction and local gain control. Images are decomposed using a Laplacian pyramid, which subtracts a local estimate of the mean luminance at multiple scales. Each pyramid coefficient is then divided by a local estimate of amplitude (weighted sum of absolute values of neighbors), where the weights are optimized for prediction of amplitude using (undistorted) images from a separate database. We define the quality of a distorted image, relative to its undistorted original, as...
We assess the performance of two techniques in the context of nonlinear transform coding with artificial neural networks, Sadam and GDN. Both techniques have been successfully used in state-of-the-art image compression methods, but their performance has not been individually assessed to this point. Together, the techniques stabilize the training procedure of nonlinear image transforms and increase their capacity to approximate the (unknown) rate-distortion optimal transform functions. Besides comparing the techniques to established alternatives, we detail the implementation of both methods and provide open-source code...
We develop a framework for rendering photographic images, taking into account display limitations, so as to optimize perceptual similarity between the rendered image and the original scene. We formulate this as a constrained optimization problem, in which we minimize a measure of perceptual dissimilarity, the Normalized Laplacian Pyramid Distance (NLPD), which mimics the early stage transformations of the human visual system. When rendering images acquired with higher dynamic range than that of the display, we find that the optimized solution boosts the contrast...
Pre-trained convolutional neural networks (CNNs) are powerful off-the-shelf feature generators and have been shown to perform very well on a variety of tasks. Unfortunately, the generated features are high dimensional and expensive to store: potentially hundreds of thousands of floats per example when processing videos. Traditional entropy based lossless compression methods are of little help as they do not yield the desired level of compression, while general purpose lossy compression methods based on energy compaction (e.g. PCA followed by...
Image compression using neural networks has reached or exceeded non-neural methods (such as JPEG, WebP, BPG). While these networks are state of the art in rate-distortion performance, computational feasibility of these models remains a challenge. We apply automatic network optimization techniques to reduce the computational complexity of a popular architecture used in neural image compression, analyze the decoder complexity in execution runtime and explore the trade-offs between two distortion metrics, rate-distortion performance and run-time performance to design and research more...
In this paper, we investigate the use of linear, parametric models of static and dynamic texture in the context of conventional transform coding of images and video. We propose a hybrid approach incorporating both conventional transform coding and texture-specific methods for improvement of coding efficiency. Regarding static (i.e., purely spatial) texture, we show that Gaussian Markov random fields (GMRFs) can be used for analysis/synthesis of a certain class of texture. The properties of the model allow us to derive optimal methods for classification, analysis, quantization and synthesis. For...
Some forms of novel visual media enable the viewer to explore a 3D scene from essentially arbitrary viewpoints, by interpolating between a discrete set of original views. Compared to 2D imagery, these types of applications require much larger amounts of storage space, which we seek to reduce. Existing approaches for compressing 3D scenes are often based on a separation of compression and rendering: each of the original views is compressed using traditional 2D image formats; the receiver decompresses the views and then performs the rendering. We unify these steps...
Efficient intra prediction is an important aspect of video coding with high compression efficiency. H.264/AVC applies directional prediction from neighboring pixels on an adjustable block size for local decorrelation. In this paper, we present an extended prediction scheme in the context of H.264/AVC that comprises two additional prediction methods exploiting self-similar properties of the encoded texture. A new macroblock type is implemented, allowing flexible selection of the available prediction methods for sub-partitions of the macroblock. Depending on the content of the sequence, substantial gains...
We develop a method for comparing hierarchical image representations in terms of their ability to explain perceptual sensitivity in humans. Specifically, we utilize Fisher information to establish a model-derived prediction of sensitivity to local perturbations of an image. For a given image, we compute the eigenvectors of the Fisher information matrix with largest and smallest eigenvalues, corresponding to the model-predicted most- and least-noticeable image distortions, respectively. For human subjects, we then measure the amount of each distortion that can be reliably detected...
We consider lossy compression of an information source when the decoder has lossless access to a correlated one. This setup, also known as the Wyner-Ziv problem, is a special case of distributed source coding. To this day, real-world applications of this problem have neither been fully developed nor heavily investigated. We propose a data-driven method based on machine learning that leverages the universal function approximation capability of artificial neural networks. We find that our neural network-based compression scheme re-discovers some...
We introduce a distortion measure for images, Wasserstein distortion, that simultaneously generalizes pixel-level fidelity on the one hand and realism or perceptual quality on the other. We discuss its metric properties. Pairs of images that are close under Wasserstein distortion illustrate its utility. In particular, we generate random images that have high fidelity to a reference image in one location and smoothly transition to an independent realization of the image as one moves away from this point. Wasserstein distortion represents a generalization and synthesis of prior work on texture generation, models of the early...