- Advanced Vision and Imaging
- Advanced Image and Video Retrieval Techniques
- Generative Adversarial Networks and Image Synthesis
- 3D Shape Modeling and Analysis
- Computer Graphics and Visualization Techniques
- Domain Adaptation and Few-Shot Learning
- Advanced Neural Network Applications
- Face Recognition and Analysis
- Robotics and Sensor-Based Localization
- Image Retrieval and Classification Techniques
- Advanced Image Processing Techniques
- Human Pose and Action Recognition
- Medical Image Segmentation Techniques
- Cell Image Analysis Techniques
- Video Surveillance and Tracking Methods
- Image Processing Techniques and Applications
- Image and Object Detection Techniques
- Digital Media Forensic Detection
- Image Enhancement Techniques
- Image and Signal Denoising Methods
- 3D Surveying and Cultural Heritage
- Digital Imaging for Blood Diseases
- Anomaly Detection Techniques and Applications
- Adversarial Robustness in Machine Learning
- Remote-Sensing Image Classification
- Skolkovo Institute of Science and Technology, 2013-2022
- Samsung (Russia), 2019-2022
- Yandex (Russia), 2012-2022
- National Academy of Sciences of Armenia, 2022
- Samsung (United States), 2018-2021
- Samsung (South Korea), 2020-2021
- Samsung (United Kingdom), 2019
- Institut national de recherche en informatique et en automatique, 2016
- Moscow Institute of Physics and Technology, 2015-2016
- Massachusetts Institute of Technology, 2015
In this paper we revisit the fast stylization method introduced in Ulyanov et al. (2016). We show how a small change in the architecture results in a significant qualitative improvement in the generated images. The change is limited to swapping batch normalization with instance normalization, and to applying the latter at both training and testing times. The resulting method can be used to train high-performance architectures for real-time image generation. The code will be made available on GitHub at https://github.com/DmitryUlyanov/texture_nets. Full...
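The only difference between the two normalizations is which axes the statistics are computed over. The sketch below assumes NumPy arrays in (N, C, H, W) layout and omits the learned scale and shift parameters; `batch_norm` and `instance_norm` are illustrative helper names, not code from the linked repository.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Normalize an (N, C, H, W) batch with per-channel statistics shared across the batch."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def instance_norm(x, eps=1e-5):
    """Normalize each image and channel independently over its spatial dimensions."""
    mean = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 3, 8, 8))
x[1] *= 10.0  # second image has ten times the contrast of the first

y_in = instance_norm(x)
y_bn = batch_norm(x)
print(y_in[0].std(), y_in[1].std())  # both close to 1: contrast is removed per image
print(y_bn[0].std(), y_bn[1].std())  # first much smaller: contrast difference survives
```

Because instance normalization uses only the current image's statistics, it behaves identically at training and test time, which is consistent with applying it in both phases.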
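Top-performing deep architectures are trained on massive amounts of labeled data. In the absence of labeled data for a certain task, domain adaptation often provides an attractive option, given that labeled data of a similar nature but from a different domain (e.g. synthetic images) are available. Here, we propose a new approach to domain adaptation in deep architectures that can be trained on a large amount of labeled data from the source domain and a large amount of unlabeled data from the target domain (no labeled target-domain data is necessary). As training progresses, the approach promotes the emergence of "deep" features that are (i) discriminative for the main learning task on the source domain and (ii) invariant with...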
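The excerpt is cut before the mechanism is named. One well-known way to obtain such domain-invariant features, assumed here, is a gradient reversal layer placed between the feature extractor and a domain classifier: it is the identity in the forward pass and multiplies gradients by a negative factor in the backward pass, so the feature extractor is pushed to make domains harder to tell apart. The hand-written sketch below avoids any autograd framework; the class name and `lam` parameter are hypothetical.

```python
import numpy as np

class GradientReversal:
    """Identity in the forward pass; scales incoming gradients by -lam
    in the backward pass (hypothetical, framework-free sketch)."""

    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, features):
        return features  # features pass through unchanged

    def backward(self, grad_output):
        return -self.lam * grad_output  # flip and scale the domain-loss gradient

grl = GradientReversal(lam=0.5)
feats = np.array([1.0, -2.0, 3.0])
grad_from_domain_head = np.array([0.2, 0.4, -0.6])
print(grl.forward(feats))                   # unchanged features
print(grl.backward(grad_from_domain_head))  # values -0.1, -0.2, 0.3
```

In a full model, the label-prediction loss would still send ordinary gradients into the shared features, while the domain loss reaches them only through this reversed path.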
Deep convolutional networks have become a popular tool for image generation and restoration. Generally, their excellent performance is imputed to their ability to learn realistic image priors from a large number of example images. In this paper, we show that, on the contrary, the structure of a generator network is sufficient to capture a great deal of low-level image statistics prior to any learning. In order to do so, we show that a randomly-initialized neural network can be used as a handcrafted prior with excellent results in standard inverse problems such as denoising,...
We present a new deep learning architecture (called Kd-network) that is designed for 3D model recognition tasks and works with unstructured point clouds. The new architecture performs multiplicative transformations and shares the parameters of these transformations according to the subdivisions of the point clouds imposed onto them by kd-trees. Unlike the currently dominant convolutional architectures, which usually require rasterization on uniform two-dimensional or three-dimensional grids, Kd-networks do not rely on such grids in any way and therefore avoid poor...
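To make "sharing parameters according to kd-tree subdivisions" concrete, here is a toy sketch under several assumptions not stated in the excerpt: the cloud has 2**depth points, every split is at the median along the axis of largest extent, and each internal node maps its two children's concatenated features through an affine layer plus ReLU whose weights are shared by all nodes at the same depth with the same split axis. All names and sizes are hypothetical.

```python
import numpy as np

def build_kdtree(points, depth):
    """Split 2**depth points at the median of the widest axis; leaves are single points."""
    if depth == 0:
        return points[0]
    axis = int(np.argmax(points.max(axis=0) - points.min(axis=0)))
    order = np.argsort(points[:, axis], kind="stable")
    half = len(points) // 2
    return (axis,
            build_kdtree(points[order[:half]], depth - 1),
            build_kdtree(points[order[half:]], depth - 1))

def kd_forward(node, level, params):
    """Bottom-up pass: a node's feature comes from its children's features."""
    if not isinstance(node, tuple):
        return node  # leaf: raw xyz coordinates as the initial feature
    axis, left, right = node
    child = np.concatenate([kd_forward(left, level + 1, params),
                            kd_forward(right, level + 1, params)])
    W, b = params[(level, axis)]  # parameters shared per (depth, split axis)
    return np.maximum(W @ child + b, 0.0)

DEPTH, FEAT = 4, 8
rng = np.random.default_rng(0)
dims = [FEAT] * DEPTH + [3]  # feature size at each level; leaves are 3-D
params = {(lvl, ax): (rng.normal(size=(dims[lvl], 2 * dims[lvl + 1])), np.zeros(dims[lvl]))
          for lvl in range(DEPTH) for ax in range(3)}

cloud = rng.random((2 ** DEPTH, 3))
out = kd_forward(build_kdtree(cloud, DEPTH), 0, params)
shuffled = kd_forward(build_kdtree(cloud[rng.permutation(len(cloud))], DEPTH), 0, params)
print(out.shape, np.allclose(out, shuffled))  # (8,) True
```

Since the tree is built by sorting, the result does not depend on the order in which points are listed, and no voxel grid is involved at any stage.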
A large number of novel encodings for bag of visual words models have been proposed in the past two years to improve on the standard histogram of quantized local features. Examples include locality-constrained linear encoding [23], improved Fisher encoding [17], super vector encoding [27], and kernel codebook encoding [20]. While several authors have reported very good results on the challenging PASCAL VOC classification data by means of these new techniques, differences in the feature computation and learning algorithms, missing details in the description...
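The baseline and one of the listed alternatives differ in how each local feature is assigned to visual words. The sketch below contrasts a hard-assignment histogram with a simplified Gaussian soft-assignment variant in the spirit of kernel codebook encoding; the codebook, features, and `sigma` are synthetic, hypothetical values, and the exact normalization used in [20] may differ.

```python
import numpy as np

def pairwise_sq_dists(features, codebook):
    # (N, D) and (K, D) -> (N, K) squared Euclidean distances
    return ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)

def hard_histogram(features, codebook):
    """Standard BoVW: each local feature votes for its nearest visual word."""
    nearest = pairwise_sq_dists(features, codebook).argmin(axis=1)
    hist = np.bincount(nearest, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

def kernel_codebook(features, codebook, sigma=1.0):
    """Soft assignment: each feature spreads unit mass over words with Gaussian weights."""
    d2 = pairwise_sq_dists(features, codebook)
    d2 -= d2.min(axis=1, keepdims=True)  # for numerical stability; cancels after row normalization
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    w /= w.sum(axis=1, keepdims=True)
    enc = w.sum(axis=0)
    return enc / enc.sum()

rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 8))   # 16 hypothetical visual words in 8-D
features = rng.normal(size=(200, 8))  # local descriptors from one image
h = hard_histogram(features, codebook)
s = kernel_codebook(features, codebook, sigma=2.0)
```

The hard histogram is sparse and integer-valued before normalization, while the soft encoding gives every word some weight, which makes it less sensitive to features lying near cell boundaries.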
The recent work of Gatys et al., who characterized the style of an image by the statistics of convolutional neural network filters, ignited a renewed interest in the texture generation and image stylization problems. While their technique uses a slow optimization process, recently several authors have proposed to learn generator networks that can produce similar outputs in one quick forward pass. While generator networks are promising, they are still inferior in visual quality and diversity compared to generation-by-optimization. In this work, we advance...
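A common choice for these filter statistics, assumed here, is the Gram matrix of channel correlations. The minimal NumPy sketch below (helper names hypothetical) shows why such statistics describe texture rather than layout: shuffling spatial positions leaves them unchanged.

```python
import numpy as np

def gram_matrix(feature_map):
    """feature_map: (C, H, W) filter responses -> (C, C) channel correlations."""
    c, h, w = feature_map.shape
    f = feature_map.reshape(c, h * w)
    return f @ f.T / (h * w)

def style_distance(a, b):
    """Squared Frobenius distance between Gram matrices."""
    return float(((gram_matrix(a) - gram_matrix(b)) ** 2).sum())

rng = np.random.default_rng(0)
fmap = rng.normal(size=(4, 6, 6))
perm = rng.permutation(36)  # same spatial shuffle for every channel
shuffled = fmap.reshape(4, 36)[:, perm].reshape(4, 6, 6)
print(style_distance(fmap, shuffled))  # ~0: spatial arrangement is discarded
```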
Several recent works have shown that image descriptors produced by deep convolutional neural networks provide state-of-the-art performance for image classification and retrieval problems. It has also been shown that the activations from the convolutional layers can be interpreted as local features describing particular image regions. These local features can be aggregated using aggregation methods developed for local features (e.g. Fisher vectors), thus providing new powerful global descriptors. In this paper we investigate possible ways to aggregate local deep features to produce compact...
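The idea of treating each spatial position of a convolutional feature map as a local feature can be sketched with the simplest aggregation, sum pooling followed by L2 normalization. This is an illustrative assumption: which aggregation the paper settles on, and any post-processing such as PCA whitening, is not in this excerpt. The feature maps are random stand-ins for real activations.

```python
import numpy as np

def sum_pool_descriptor(activations):
    """activations: (C, H, W) conv feature map. Each of the H*W positions is a
    C-dimensional local feature; sum them and L2-normalize."""
    c = activations.shape[0]
    local_feats = activations.reshape(c, -1).T  # (H*W, C) local features
    pooled = local_feats.sum(axis=0)            # sum pooling over positions
    return pooled / (np.linalg.norm(pooled) + 1e-12)

rng = np.random.default_rng(0)
query = rng.random((256, 7, 7))                 # non-negative, like post-ReLU maps
database = [rng.random((256, 7, 7)) for _ in range(3)]
database.append(query + 0.01 * rng.random((256, 7, 7)))  # near-duplicate of the query

q = sum_pool_descriptor(query)
scores = [float(q @ sum_pool_descriptor(a)) for a in database]  # cosine similarities
print(int(np.argmax(scores)))  # 3: the near-duplicate ranks first
```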
Several recent works have shown how highly realistic human head images can be obtained by training convolutional neural networks to generate them. In order to create a personalized talking head model, these works require training on a large dataset of images of a single person. However, in many practical scenarios, such personalized talking head models need to be learned from a few image views of a person, potentially even a single image. Here, we present a system with such few-shot capability. It performs lengthy meta-learning on a large dataset of videos, and after that is able to frame few- and one-shot...
The paper introduces Hough forests, which are random forests adapted to perform a generalized Hough transform in an efficient way. Compared to previous Hough-based systems such as implicit shape models, Hough forests improve the performance of the generalized Hough transform for object detection on a categorical level. At the same time, their flexibility permits extensions of the Hough transform to new domains such as object tracking and action recognition. Hough forests can be regarded as task-adapted codebooks of local appearance that allow fast supervised training and fast matching at test time. They achieve...
Modern image inpainting systems, despite the significant progress, often struggle with large missing areas, complex geometric structures, and high-resolution images. We find that one of the main reasons for that is the lack of an effective receptive field in both the inpainting network and the loss function. To alleviate this issue, we propose a new method called large mask inpainting (LaMa). LaMa is based on i) a new inpainting network architecture that uses fast Fourier convolutions (FFCs), which have an image-wide receptive field; ii) a high receptive field perceptual loss; iii) large training masks, which unlocks...
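Why an FFC has an image-wide receptive field can be seen from its spectral branch: a pointwise operation in the frequency domain mixes every spatial location. The sketch assumes a simplified branch (real FFT, 1x1 convolution over stacked real and imaginary channels, ReLU, inverse FFT) and omits normalization and the local branch of a full FFC; `spectral_unit` and its weights are hypothetical.

```python
import numpy as np

def spectral_unit(x, weight):
    """Simplified spectral branch. x: (C, H, W); weight: (2C, 2C) 1x1 conv."""
    c, h, w = x.shape
    freq = np.fft.rfft2(x, axes=(1, 2))                  # (C, H, W//2 + 1) complex
    z = np.concatenate([freq.real, freq.imag], axis=0)   # (2C, H, W//2 + 1) real
    z = np.maximum(np.einsum("oc,chw->ohw", weight, z), 0.0)  # 1x1 conv + ReLU
    freq_out = z[:c] + 1j * z[c:]
    return np.fft.irfft2(freq_out, s=(h, w), axes=(1, 2))  # back to (C, H, W)

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 16, 16))
weight = rng.normal(size=(4, 4))
y = spectral_unit(x, weight)

x2 = x.copy()
x2[0, 0, 0] += 1.0  # perturb a single input pixel
changed = np.abs(spectral_unit(x2, weight) - y) > 1e-9
print(changed.mean())  # fraction of output pixels affected by that one pixel
```

A 3x3 convolution would change at most nine output positions per channel here; the spectral unit typically changes nearly all of them after a single layer.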
We present a method for the detection of instances of an object class, such as cars or pedestrians, in natural images. Similarly to some previous works, this is accomplished via a generalized Hough transform, where the detections of individual object parts cast probabilistic votes for possible locations of the centroid of the whole object; the detection hypotheses then correspond to the maxima of the Hough image that accumulates the votes from all parts. However, whereas previous methods detect object parts using generative codebooks of part appearances, we take a more discriminative approach...
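The voting step described above can be sketched as follows: each detected part casts a weighted vote at its own position plus a predicted offset to the object centroid, and the peak of the accumulated Hough image is the detection hypothesis. Part positions, offsets, and weights are made-up values; in the paper they would come from the learned part detector, and real systems usually smooth the Hough image before searching for maxima.

```python
import numpy as np

def hough_vote(part_positions, offsets, weights, image_shape):
    """Accumulate weighted centroid votes: part at p votes for p + offset."""
    acc = np.zeros(image_shape)
    for (py, px), (dy, dx), w in zip(part_positions, offsets, weights):
        cy, cx = py + dy, px + dx
        if 0 <= cy < image_shape[0] and 0 <= cx < image_shape[1]:
            acc[cy, cx] += w
    return acc

# Three parts agree on a centroid at (10, 12); one spurious part votes for (5, 5).
parts = [(5, 12), (10, 6), (15, 12), (3, 3)]
offsets = [(5, 0), (0, 6), (-5, 0), (2, 2)]
weights = [0.8, 0.7, 0.9, 0.5]

acc = hough_vote(parts, offsets, weights, (20, 20))
peak = tuple(int(i) for i in np.unravel_index(np.argmax(acc), acc.shape))
print(peak)  # (10, 12)
```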
We revisit the idea of brain damage, i.e. the pruning of the coefficients of a neural network, and suggest how brain damage can be modified and used to speed up convolutional layers in ConvNets. The approach uses the fact that many efficient implementations reduce generalized convolutions to matrix multiplications. The suggested brain damage process prunes the convolutional kernel tensor in a group-wise fashion. After such pruning, convolutions are reduced to multiplications of thinned dense matrices, which leads to a speedup. We investigate different ways to add group-wise pruning to the learning process, and show...
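The reduction from convolution to matrix multiplication, and why group-wise pruning yields thinner dense matrices, can be checked numerically. The sketch assumes stride 1, no padding, and that each pruned group is one (input channel, kernel row, kernel column) position shared by all output filters, i.e. one column of the reshaped kernel matrix. `im2col` is a plain illustrative implementation, and the pruned positions are chosen at random rather than learned.

```python
import numpy as np

def im2col(x, k):
    """x: (C, H, W) -> (C*k*k, out_h*out_w) patch matrix (stride 1, no padding)."""
    c, h, w = x.shape
    oh, ow = h - k + 1, w - k + 1
    cols = np.empty((c * k * k, oh * ow))
    row = 0
    for ci in range(c):
        for dy in range(k):
            for dx in range(k):
                cols[row] = x[ci, dy:dy + oh, dx:dx + ow].reshape(-1)
                row += 1
    return cols

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 6, 6))
kernel = rng.normal(size=(4, 3, 3, 3))  # (out_channels, in_channels, k, k)
W = kernel.reshape(4, -1)               # (out, C*k*k); column order matches im2col rows
cols = im2col(x, 3)

# Group-wise pruning: drop the same kernel positions for every filter,
# i.e. whole columns of W and the matching rows of the patch matrix.
keep = np.ones(W.shape[1], dtype=bool)
keep[rng.choice(W.shape[1], size=13, replace=False)] = False

full_pruned = (W * keep) @ cols  # zeroed kernel, full-size product
thin = W[:, keep] @ cols[keep]   # smaller dense matrices, same result
print(np.allclose(full_pruned, thin))  # True
```

The thinned product skips 13 of 27 rows of the patch matrix entirely, which is where the speedup comes from; element-wise (non-grouped) pruning would not shrink either matrix.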
Gatys et al. recently demonstrated that deep networks can generate beautiful textures and stylized images from a single texture example. However, their methods require a slow and memory-consuming optimization process. We propose here an alternative approach that moves the computational burden to a learning stage. Given a single example of a texture, our approach trains compact feed-forward convolutional networks to generate multiple samples of the same texture of arbitrary size and to transfer artistic style from a given image to any other image. The resulting networks are remarkably...
A user-provided object bounding box is a simple and popular interaction paradigm considered by many existing interactive image segmentation frameworks. However, these frameworks tend to exploit the provided bounding box merely to exclude its exterior from consideration and sometimes to initialize the energy minimization. In this paper, we discuss how the bounding box can be further used to impose a powerful topological prior, which prevents the solution from excessive shrinking and ensures that the user-provided box bounds the segmentation in a sufficiently tight way. The prior...
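The excerpt stops before the prior is defined. One way to formalize "bounds the segmentation in a sufficiently tight way", assumed here, is to require that the mask lies inside the box and that every row and every column crossing the box contains at least one foreground pixel. The checker below only tests that condition; it is not the energy-minimization method of the paper, and `is_tight` is a hypothetical name.

```python
import numpy as np

def is_tight(mask, box):
    """mask: (H, W) bool segmentation; box: (top, left, bottom, right), inclusive.
    True if nothing lies outside the box and every row and column of the box hits the mask."""
    top, left, bottom, right = box
    inside = mask[top:bottom + 1, left:right + 1]
    outside = mask.copy()
    outside[top:bottom + 1, left:right + 1] = False
    return bool((not outside.any()) and inside.any(axis=1).all() and inside.any(axis=0).all())

box = (2, 3, 7, 6)
full = np.zeros((10, 10), dtype=bool)
full[2:8, 3:7] = True    # segmentation touching all four sides of the box
shrunk = np.zeros((10, 10), dtype=bool)
shrunk[4:6, 4:6] = True  # a shrunken blob in the middle of the box
print(is_tight(full, box), is_tight(shrunk, box))  # True False
```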
We present two novel solutions for multi-view 3D human pose estimation based on new learnable triangulation methods that combine 3D information from multiple 2D views. The first (baseline) solution is a basic differentiable algebraic triangulation with an addition of confidence weights estimated from the input images. The second solution is based on volumetric aggregation of intermediate 2D backbone feature maps. The aggregated volume is then refined via 3D convolutions that produce the final 3D joint heatmaps and allow implicit modelling of a human pose prior. Crucially,...
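The baseline can be illustrated with weighted DLT triangulation: each view contributes two linear equations in the homogeneous 3D point, scaled by that view's confidence, and the solution is the right singular vector with the smallest singular value. In the paper the confidences are estimated from the images; here they are hand-set, and the cameras, intrinsics, and helper names are hypothetical.

```python
import numpy as np

def triangulate_weighted(projections, points_2d, weights):
    """Weighted algebraic (DLT) triangulation of one joint from several views."""
    rows = []
    for P, (u, v), w in zip(projections, points_2d, weights):
        rows.append(w * (u * P[2] - P[0]))  # u * (p3 . X) - p1 . X = 0
        rows.append(w * (v * P[2] - P[1]))  # v * (p3 . X) - p2 . X = 0
    _, _, vt = np.linalg.svd(np.stack(rows))
    X = vt[-1]  # homogeneous minimizer of ||A X|| with ||X|| = 1
    return X[:3] / X[3]

def project(P, X):
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
cams = [K @ np.hstack([rot_y(a), np.array([[0.0], [0.0], [5.0]])]) for a in (-0.3, 0.0, 0.3)]

X_true = np.array([0.2, -0.1, 0.5])
obs = [project(P, X_true) for P in cams]
obs[2] = obs[2] + np.array([40.0, -25.0])  # simulate a bad 2D detection in view 3

err_equal = np.linalg.norm(triangulate_weighted(cams, obs, [1.0, 1.0, 1.0]) - X_true)
err_weighted = np.linalg.norm(triangulate_weighted(cams, obs, [1.0, 1.0, 0.05]) - X_true)
print(err_equal, err_weighted)  # down-weighting the bad view reduces the error
```

Every step is differentiable in the 2D points and weights, which is what allows such weights to be learned end to end.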
We introduce a new compression scheme for high-dimensional vectors that approximates the vectors using sums of M codewords coming from different codebooks. We show that the proposed scheme permits efficient distance and scalar product computations between compressed and uncompressed vectors. We further suggest vector encoding and codebook learning algorithms that can minimize the coding error within the proposed scheme. In the experiments, we demonstrate that the proposed compression can be used instead of or together with product quantization. Compared to product quantization and its optimized versions,...
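The efficient scalar product follows from linearity: the dot product of a query with a sum of M codewords equals the sum of M precomputed query-codeword dot products, so each compressed vector costs M table lookups. The sketch below uses random, untrained codebooks and codes (encoding and codebook learning are not shown). It also assumes the squared norm of each reconstruction is stored next to its code so that squared Euclidean distances follow from the dot products; how the paper handles that term is not in this excerpt.

```python
import numpy as np

rng = np.random.default_rng(0)
D, M, K = 8, 4, 16                        # dimension, codebooks, codewords per codebook
codebooks = rng.normal(size=(M, K, D))    # hypothetical, untrained codebooks
codes = rng.integers(0, K, size=(5, M))   # 5 compressed vectors: one index per codebook

def decode(code):
    """Reconstruction = sum of one codeword from each codebook."""
    return codebooks[np.arange(M), code].sum(axis=0)

recon = np.stack([decode(c) for c in codes])
sq_norms = (recon ** 2).sum(axis=1)       # assumed stored alongside each code

q = rng.normal(size=D)                    # uncompressed query
lut = codebooks @ q                       # (M, K) table of <q, codeword>
dots = lut[np.arange(M), codes].sum(axis=1)   # M lookups + additions per vector
dists = (q ** 2).sum() - 2.0 * dots + sq_norms  # ||q - x||^2 without decoding x
```

Unlike product quantization, the codewords here span the full dimension rather than disjoint subspaces, which is why the cross terms in the squared norm do not vanish and need separate handling.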