Edgar Simo‐Serra

ORCID: 0000-0003-2544-8592
Research Areas
  • Generative Adversarial Networks and Image Synthesis
  • Computer Graphics and Visualization Techniques
  • Advanced Vision and Imaging
  • Advanced Image and Video Retrieval Techniques
  • 3D Shape Modeling and Analysis
  • Multimodal Machine Learning Applications
  • Human Pose and Action Recognition
  • Image Retrieval and Classification Techniques
  • Advanced Image Processing Techniques
  • Advanced Neural Network Applications
  • Medical Image Segmentation Techniques
  • 3D Surveying and Cultural Heritage
  • Domain Adaptation and Few-Shot Learning
  • Image Enhancement Techniques
  • AI in cancer detection
  • Video Surveillance and Tracking Methods
  • Video Analysis and Summarization
  • Image Processing and 3D Reconstruction
  • Aesthetic Perception and Analysis
  • Handwritten Text Recognition Techniques
  • Human Motion and Animation
  • Face recognition and analysis
  • Robotic Mechanisms and Dynamics
  • Visual Attention and Saliency Detection
  • Image Processing Techniques and Applications

Waseda University
2015-2024

Consejo Superior de Investigaciones Científicas
2013-2015

Universitat Politècnica de Catalunya
2012-2015

Institut de Robòtica i Informàtica Industrial
2011-2015

We present a novel approach for image completion that results in images that are both locally and globally consistent. With a fully-convolutional neural network, we can complete images of arbitrary resolutions by filling in missing regions of any shape. To train this image completion network to be consistent, we use global and local context discriminators that are trained to distinguish real images from completed ones. The global discriminator looks at the entire image to assess if it is coherent as a whole, while the local discriminator looks only at a small area centered at the completed region to ensure the local consistency...

10.1145/3072959.3073659 article EN ACM Transactions on Graphics 2017-07-20

Deep learning has revolutionized image-level tasks such as classification, but patch-level tasks, such as correspondence, still rely on hand-crafted features, e.g. SIFT. In this paper we use Convolutional Neural Networks (CNNs) to learn discriminant patch representations and in particular train a Siamese network with pairs of (non-)corresponding patches. We deal with the large number of potential pairs with the combination of a stochastic sampling of the training set and an aggressive mining strategy biased towards patches that are...

10.1109/iccv.2015.22 preprint EN 2015-12-01

We present a novel technique to automatically colorize grayscale images that combines both global priors and local image features. Based on Convolutional Neural Networks, our deep network features a fusion layer that allows us to elegantly merge local information dependent on small image patches with global priors computed using the entire image. The entire framework, including the global and local priors as well as the colorization model, is trained in an end-to-end fashion. Furthermore, our architecture can process images of any resolution, unlike most existing approaches based...

10.1145/2897824.2925974 article EN ACM Transactions on Graphics 2016-07-11

In this paper, we analyze the fashion of clothing of a large social website. Our goal is to learn and predict how fashionable a person looks on a photograph and suggest subtle improvements the user could make to improve her/his appeal. We propose a Conditional Random Field model that jointly reasons about several fashionability factors such as the type of outfit and garments the user is wearing, the type of the user, the photograph's setting (e.g., the scenery behind the user), and the fashionability score. Importantly, our model is able to give rich feedback back to the user, conveying which garments or even scenery she/he...

10.1109/cvpr.2015.7298688 article EN 2015-06-01

In this paper, we present a novel technique to simplify sketch drawings based on learning a series of convolution operators. In contrast to existing approaches that require vector images as input, we allow the more general and challenging input of rough raster sketches such as those obtained from scanning pencil sketches. We convert the rough sketch into a simplified version which is then amenable to vectorization. This is all done in a fully automatic way without user intervention. Our model consists of a fully convolutional neural network...

10.1145/2897824.2925972 article EN ACM Transactions on Graphics 2016-07-11

We propose a novel approach for learning features from weakly-supervised data by joint ranking and classification. In order to exploit data with weak labels, we jointly train a feature extraction network with a ranking loss and a classification network with a cross-entropy loss. We obtain high-quality compact discriminative features with few parameters, learned on relatively small datasets without additional annotations. This enables us to tackle tasks with specialized images not very similar to the more generic ones in existing fully-supervised datasets. We show...

10.1109/cvpr.2016.39 article EN 2016-06-01
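
The joint objective described in the abstract above (a ranking loss plus a cross-entropy classification loss) can be illustrated with a minimal sketch. The margin-based triplet form of the ranking term, the `weight` and `margin` parameters, and all function names are assumptions made for illustration; they are not taken from the paper, which may use a different ranking formulation.

```python
import math


def cross_entropy(logits, target):
    """Softmax cross-entropy for one sample, computed stably via log-sum-exp."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


def triplet_ranking_loss(d_pos, d_neg, margin=1.0):
    """Generic margin ranking loss (an assumed stand-in for the paper's ranking term).

    d_pos: feature distance between an anchor and a similar sample.
    d_neg: feature distance between the anchor and a dissimilar sample.
    """
    return max(0.0, margin + d_pos - d_neg)


def joint_loss(d_pos, d_neg, logits, target, weight=1.0, margin=1.0):
    """Sum of the ranking term and a weighted classification term.

    `weight` is a hypothetical balancing factor between the two terms.
    """
    ranking = triplet_ranking_loss(d_pos, d_neg, margin)
    return ranking + weight * cross_entropy(logits, target)
```

In this sketch the ranking term shapes the feature space while the classification term uses whatever weak labels are available; how the two are actually balanced is a design choice the abstract does not specify.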

We introduce a novel approach to automatically recover 3D human pose from a single image. Most previous work follows a pipelined approach: initially, a set of 2D features such as edges, joints or silhouettes are detected in the image, and then these observations are used to infer the 3D pose. Solving these two problems separately may lead to erroneous 3D poses when the feature detector has performed poorly. In this paper, we address this issue by jointly solving both the 2D detection and the 3D inference problems. For this purpose, we propose a Bayesian...

10.1109/cvpr.2013.466 article EN 2013 IEEE Conference on Computer Vision and Pattern Recognition 2013-06-01

Markerless 3D human pose detection from a single image is a severely underconstrained problem because different 3D poses can have similar image projections. In order to handle this ambiguity, current approaches rely on prior shape models that can only be correctly adjusted if 2D image features are accurately detected. Unfortunately, although current 2D part detector algorithms have shown promising results, they are not yet accurate enough to guarantee a complete disambiguation of the 3D inferred shape. In this paper, we introduce a novel approach...

10.1109/cvpr.2012.6247988 article EN 2012 IEEE Conference on Computer Vision and Pattern Recognition 2012-06-01

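We present an integral framework for training sketch simplification networks that convert challenging rough sketches into clean line drawings. Our approach augments a simplification network with a discriminator network, training both networks jointly so that the discriminator network discerns whether a line drawing is real training data or the output of the simplification network, which, in turn, tries to fool it. This approach has two major advantages: first, because the discriminator network learns the structure in line drawings, it encourages the output sketches of the simplification network to be more similar in appearance to the training sketches. Second, we can also train the networks with additional unsupervised data: by...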

10.1145/3132703 article EN ACM Transactions on Graphics 2018-01-10

It is common in graphic design that humans visually arrange various elements according to their design intent and semantics. For example, a title text almost always appears on top of other elements in a document. In this work, we generate graphic layouts that can flexibly incorporate such design semantics, either specified implicitly or explicitly by a user. We optimize using the latent space of an off-the-shelf layout generation model, allowing our approach to be complementary to and used with existing layout generation models. Our approach builds on a generative layout model based...

10.1145/3474085.3475497 article EN Proceedings of the 30th ACM International Conference on Multimedia 2021-10-17

Controllable layout generation aims at synthesizing a plausible arrangement of element bounding boxes with optional constraints, such as the type or position of a specific element. In this work, we try to solve a broad range of layout generation tasks in a single model that is based on discrete state-space diffusion models. Our model, named LayoutDM, naturally handles the structured layout data in the discrete representation and learns to progressively infer a noiseless layout from the initial input, where we model the layout corruption process by modality-wise discrete diffusion. For...

10.1109/cvpr52729.2023.00980 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

In this paper we propose a novel framework for learning local image descriptors in a discriminative manner. For this purpose we explore a siamese architecture of Deep Convolutional Neural Networks (CNN), with a Hinge embedding loss on the L2 distance between descriptors. Since a siamese architecture uses pairs rather than single image patches to train, there exist a large number of positive samples and an exponential number of negative samples. We propose to explore this space with a stochastic sampling of the training set, in combination with an aggressive mining strategy over both the positive and negative samples which we denote...

10.48550/arxiv.1412.6537 preprint EN other-oa arXiv (Cornell University) 2014-01-01
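
The hinge embedding loss on the L2 distance between descriptors, mentioned in the abstract above, can be sketched in a few lines. This is a minimal illustration of the standard hinge embedding loss, not the authors' implementation: the default `margin` value, the function names, and the `mine_hardest` helper (a simple "keep the largest losses" rule standing in for the mining strategy) are all assumptions.

```python
import math


def l2_distance(desc_a, desc_b):
    """Euclidean distance between two descriptors given as equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(desc_a, desc_b)))


def hinge_embedding_loss(desc_a, desc_b, is_match, margin=1.0):
    """Hinge embedding loss for one pair of patch descriptors.

    Matching pairs are penalized by their distance; non-matching pairs are
    penalized only when they are closer than `margin` (an assumed default).
    """
    d = l2_distance(desc_a, desc_b)
    if is_match:
        return d
    return max(0.0, margin - d)


def mine_hardest(losses, keep):
    """Hypothetical mining helper: keep the `keep` largest per-pair losses."""
    return sorted(losses, reverse=True)[:keep]
```

For example, `hinge_embedding_loss([0, 0], [3, 4], True)` is the plain distance between the two descriptors, while the same pair labeled as non-matching contributes nothing once its distance exceeds the margin. Backpropagating only the pairs returned by `mine_hardest` is one simple way to bias training toward difficult samples.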

The remastering of vintage film comprises a diversity of sub-tasks including super-resolution, noise removal, and contrast enhancement which aim to restore the deteriorated film medium to its original state. Additionally, due to the technical limitations of the time, most vintage film is either recorded in black and white, or has low quality colors, for which colorization becomes necessary. In this work, we propose a single framework to tackle the entire remastering task semi-interactively. Our work is based on temporal convolutional neural networks with...

10.1145/3355089.3356570 article EN ACM Transactions on Graphics 2019-11-08

Flat filling is a critical step in digital artistic content creation with the objective of filling line arts with flat colors. We present a deep learning framework for user-guided line art flat filling that can compute the "influence areas" of the user color scribbles, i.e., the areas where the user scribbles should propagate and influence. This framework explicitly controls such scribble influence areas for artists to manipulate the colors of image details and avoid color leakage/contamination between scribbles, and simultaneously, leverages data-driven color generation to facilitate content creation. This framework is based on...

10.1109/cvpr46437.2021.00976 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021-06-01

We tackle the problem of multi-label classification of fashion images, learning from noisy data with minimal human supervision. We present a new dataset of full body poses, each with a set of 66 binary labels corresponding to the information about the garments worn in the image obtained in an automatic manner. As the automatically-collected labels contain significant noise, we manually correct the labels for a small subset of the data, and use these correct labels for further training and evaluation. We build upon a recent approach that both cleans the noisy labels and learns to classify, and introduce simple...

10.1109/iccvw.2017.265 article EN 2017-10-01

In this work, we perform an experimental analysis of the differences of both how humans and machines see and distinguish fashion styles. For this purpose, we propose an expert-curated new dataset for fashion style prediction, which consists of 14 different fashion styles each with roughly 1,000 images of worn outfits. The dataset, with a total of 13,126 images, captures the diversity and complexity of modern fashion styles. We perform an extensive analysis of the dataset by benchmarking a wide variety of modern classification networks, and also perform an in-depth user study with both fashion-savvy and fashion-naïve users. Our results...

10.1109/iccvw.2017.263 article EN 2017-10-01

Vector line art plays an important role in graphic design; however, it is tedious to manually create. We introduce a general framework to produce line drawings from a wide variety of images, by learning a mapping from raster image space to vector image space. Our approach is based on a recurrent neural network that draws the lines one by one. A differentiable rasterization module allows for training with only supervised raster data. We use a dynamic window around a virtual pen while drawing lines, implemented with a proposed aligned cropping and...

10.1145/3450626.3459833 article EN ACM Transactions on Graphics 2021-07-19

We present an interactive approach for inking, which is the process of turning a pencil rough sketch into a clean line drawing. The approach, which we call the Smart Inker, consists of several "smart" tools that intuitively react to user input, while guided by the input rough sketch, to efficiently and naturally connect lines, erase shading, and fine-tune the line drawing output. Our approach is data-driven: the tools are based on fully convolutional networks, which we train to exploit both the user edits and inaccurate rough sketch to produce accurate line drawings, allowing high-performance...

10.1145/3197517.3201370 article EN ACM Transactions on Graphics 2018-07-30

We propose a novel data-driven approach for automatically detecting and completing gaps in line drawings with a Convolutional Neural Network. In the case of existing inpainting approaches for natural images, masks indicating the missing regions are generally required as input. Here, we show that line drawings have enough structures that can be learned by the CNN to allow automatic detection and completion of the gaps without any such input. Thus, our method can find the gaps in line drawings and complete them without user interaction. Furthermore, the completion realistically conserves thickness...

10.1109/cvpr.2017.611 article EN 2017-07-01

We present an approach for the detection of buildings in multispectral satellite images. Unlike 3-channel RGB images, satellite imagery contains additional channels corresponding to different wavelengths. Approaches that do not use all channels are unable to fully exploit these images for optimal performance. Furthermore, care must be taken due to the large bias in classes, e.g., most of the Earth is covered in water and thus it will be dominant in the images. Our approach consists of training a Convolutional Neural Network (CNN) from scratch to classify multispectral image patches...

10.1109/icpr.2016.7900150 article EN 2016-12-01

10.1007/s11263-015-0805-1 article EN International Journal of Computer Vision 2015-02-13

Line art plays a fundamental role in illustration and design, and allows for iteratively polishing designs. However, as they lack color, they can have issues in conveying final designs. In this work, we propose an interactive colorization approach based on a conditional generative adversarial network that takes both the line art and color hints as inputs to produce a high-quality colorized image. Our approach is based on a U-net architecture with a multi-discriminator framework. We propose a Concatenation and Spatial Attention module that is able to generate more...

10.1109/cvprw53098.2021.00442 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2021-06-01