Sergey Zagoruyko

ORCID: 0000-0001-9684-5240
Research Areas
  • Advanced Neural Network Applications
  • Domain Adaptation and Few-Shot Learning
  • Advanced Image and Video Retrieval Techniques
  • Autonomous Vehicle Technology and Safety
  • Adversarial Robustness in Machine Learning
  • Robotics and Sensor-Based Localization
  • Robotic Path Planning Algorithms
  • COVID-19 diagnosis using AI
  • Robot Manipulation and Learning
  • Artificial Intelligence in Games
  • Geophysical Methods and Applications
  • Advanced Vision and Imaging
  • Video Surveillance and Tracking Methods
  • Stochastic Gradient Optimization Techniques
  • Seismic Imaging and Inversion Techniques
  • Image Retrieval and Classification Techniques
  • Reinforcement Learning in Robotics
  • Marriage and Sexual Relationships
  • Multimodal Machine Learning Applications
  • Forensic Anthropology and Bioarchaeology Studies
  • Human Pose and Action Recognition
  • Face recognition and analysis
  • African Sexualities and LGBTQ+ Issues
  • Automated Road and Building Extraction
  • Image and Object Detection Techniques

Skolkovo Institute of Science and Technology
2024

Level-5 (Japan)
2023

Université Paris Sciences et Lettres
2019-2020

École Normale Supérieure - PSL
2019-2020

Centre National de la Recherche Scientifique
2019-2020

Menlo School
2020

Meta (United States)
2020

Institut national de recherche en informatique et en automatique
2020

École nationale des ponts et chaussées
2015-2018

Laboratoire d'Informatique Gaspard-Monge
2015-2018

Deep residual networks were shown to be able to scale up to thousands of layers and still have improving performance. However, each fraction of a percent of improved accuracy costs nearly doubling the number of layers, and so training very deep residual networks has a problem of diminishing feature reuse, which makes these networks very slow to train. To tackle these problems, in this paper we conduct a detailed experimental study on the architecture of ResNet blocks, based on which we propose a novel architecture where we decrease depth and increase width of residual networks. We call the resulting network...

10.5244/c.30.87 preprint EN 2016-01-01
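
The depth-for-width trade described in the abstract above can be illustrated with a rough weight count. This is only a sketch: the helpers `conv3x3_params` and `stack_params`, the base width of 16 channels, the layer counts, and the widening factor of 2 are illustrative assumptions, not values taken from the abstract.

```python
def conv3x3_params(c_in: int, c_out: int) -> int:
    """Number of weights in one 3x3 convolution (bias ignored)."""
    return 3 * 3 * c_in * c_out


def stack_params(num_layers: int, width: int) -> int:
    """Number of weights in a plain stack of equal-width 3x3 convolutions."""
    return num_layers * conv3x3_params(width, width)


# Hypothetical comparison: a deep, thin stack against a shallow, wide one.
base_width = 16
widen = 2  # assumed widening factor
deep_thin = stack_params(num_layers=64, width=base_width)
shallow_wide = stack_params(num_layers=16, width=base_width * widen)
```

Widening every layer by a factor k multiplies its weight count by k squared, so in this toy setting a stack with four times fewer layers at twice the width carries the same number of weights.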

Deep residual networks were shown to be able to scale up to thousands of layers and still have improving performance. However, each fraction of a percent of improved accuracy costs nearly doubling the number of layers, and so training very deep residual networks has a problem of diminishing feature reuse, which makes these networks very slow to train. To tackle these problems, in this paper we conduct a detailed experimental study on the architecture of ResNet blocks, based on which we propose a novel architecture where we decrease depth and increase width of residual networks. We call the resulting network...

10.48550/arxiv.1605.07146 preprint EN other-oa arXiv (Cornell University) 2016-01-01

Attention plays a critical role in human visual experience. Furthermore, it has recently been demonstrated that attention can also play an important role in the context of applying artificial neural networks to a variety of tasks from fields such as computer vision and NLP. In this work we show that, by properly defining attention for convolutional neural networks, we can actually use this type of information in order to significantly improve the performance of a student CNN network by forcing it to mimic the attention maps of a powerful teacher network. To that end, we propose several...

10.48550/arxiv.1612.03928 preprint EN other-oa arXiv (Cornell University) 2016-01-01

In this paper we show how to learn directly from image data (i.e., without resorting to manually-designed features) a general similarity function for comparing image patches, which is a task of fundamental importance for many computer vision problems. To encode such a function, we opt for a CNN-based model that is trained to account for a wide variety of changes in image appearance. To that end, we explore and study multiple neural network architectures, which are specifically adapted to this task. We show that such an approach can significantly outperform the state-of-the-art...

10.1109/cvpr.2015.7299064 preprint EN 2015-06-01

The recent COCO object detection dataset presents several new challenges for object detection. In particular, it contains objects at a broad range of scales, less prototypical images, and requires more precise localization. To address these challenges, we test three modifications to the standard Fast R-CNN object detector: (1) skip connections that give the detector access to features at multiple network layers, (2) a foveal structure to exploit object context at multiple object resolutions, and (3) an integral loss function and corresponding network adjustment...

10.5244/c.30.15 article EN 2016-01-01

We use the scattering network as a generic and fixed initialization of the first layers of a supervised hybrid deep network. We show that early layers do not necessarily need to be learned, providing the best results to date with pre-defined representations while being competitive with deep CNNs. Using a shallow cascade of 1 × 1 convolutions, which encodes scattering coefficients that correspond to spatial windows of very small sizes, makes it possible to obtain AlexNet accuracy on ImageNet ILSVRC2012. We demonstrate that this local encoding explicitly learns...

10.1109/iccv.2017.599 preprint EN 2017-10-01

This work proposes an end-to-end multi-camera 3D multi-object tracking (MOT) framework. It emphasizes spatio-temporal continuity and integrates both past and future reasoning for tracked objects. Thus, we name it "Past-and-Future reasoning for Tracking" (PF-Track). Specifically, our method adopts the "tracking by attention" framework and represents tracked instances coherently over time with object queries. To explicitly use historical cues, our "Past Reasoning" module learns to refine the tracks and enhance the object features...

10.1109/cvpr52729.2023.01719 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

Scattering networks are a class of designed Convolutional Neural Networks (CNNs) with fixed weights. We argue they can serve as generic representations for modelling images. In particular, by working in scattering space, we achieve competitive results both for supervised and unsupervised learning tasks, while making progress towards constructing more interpretable CNNs. For supervised learning, we demonstrate that the early layers of CNNs do not necessarily need to be learned, and can be replaced with a scattering network instead. Indeed,...

10.1109/tpami.2018.2855738 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2018-07-19

Deep neural networks with skip-connections, such as ResNet, show excellent performance in various image classification benchmarks. It has been observed, though, that the initial motivation behind them, training deeper networks, does not actually hold true, and that the benefits come from increased capacity rather than from depth. Motivated by this, and inspired by ResNet, we propose a simple Dirac weight parameterization, which allows us to train very deep plain networks without explicit skip-connections, and achieve nearly the same performance. This parameterization...

10.48550/arxiv.1706.00388 preprint EN other-oa arXiv (Cornell University) 2017-01-01
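
The idea of training plain networks without explicit skip-connections can be illustrated in one dimension: adding a Dirac delta to a convolution kernel makes the layer pass its input through, much like a skip connection folded into the weights. This is a minimal sketch under assumptions. The scalar coefficients `a` and `b`, the helper names, and the 1D zero-padded setting are all hypothetical choices for illustration; the abstract above is truncated before it specifies the actual parameterization.

```python
def conv1d_same(x, kernel):
    """Zero-padded 1D cross-correlation with an odd-length kernel."""
    half = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(x):
                acc += w * x[j]
        out.append(acc)
    return out


def dirac_kernel(size):
    """Kernel that is 1 at the centre tap and 0 elsewhere (identity under conv1d_same)."""
    return [1.0 if k == size // 2 else 0.0 for k in range(size)]


def dirac_parameterized(weights, a=1.0, b=1.0):
    """Assumed form: a * delta + b * weights, with scalar a and b."""
    delta = dirac_kernel(len(weights))
    return [a * d + b * w for d, w in zip(delta, weights)]


x = [1.0, 2.0, 3.0, 4.0]
w = [0.5, 0.0, -0.5]
folded = conv1d_same(x, dirac_parameterized(w))
explicit_skip = [xi + yi for xi, yi in zip(x, conv1d_same(x, w))]
```

In this toy case `folded` and `explicit_skip` agree, because convolution is linear in the kernel: the delta term reproduces the input and the remaining term reproduces the ordinary convolution.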

We present a new method that views object detection as a direct set prediction problem. Our approach streamlines the detection pipeline, effectively removing the need for many hand-designed components like a non-maximum suppression procedure or anchor generation that explicitly encode our prior knowledge about the task. The main ingredients of the new framework, called DEtection TRansformer or DETR, are a set-based global loss that forces unique predictions via bipartite matching, and a transformer encoder-decoder architecture. Given...

10.48550/arxiv.2005.12872 preprint EN public-domain arXiv (Cornell University) 2020-01-01
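
The bipartite matching mentioned in the abstract above assigns each ground-truth object to exactly one prediction so that the total cost is minimal. The sketch below shows the idea by brute force on a tiny example. The function name `match_predictions` and the cost values are hypothetical, and the abstract does not say how the cost entries are computed; a practical implementation would use a polynomial-time assignment solver rather than enumerating permutations.

```python
from itertools import permutations


def match_predictions(cost):
    """Return the lowest-cost one-to-one assignment of predictions to targets.

    cost[i][j] is the (assumed) cost of assigning prediction i to target j.
    Requires at least as many predictions as targets. Brute force over
    permutations, so only suitable for tiny examples.
    """
    n_pred, n_tgt = len(cost), len(cost[0])
    best_total, best_pairs = float("inf"), []
    # Pick a distinct prediction for every target and keep the cheapest choice.
    for preds in permutations(range(n_pred), n_tgt):
        total = sum(cost[p][t] for t, p in enumerate(preds))
        if total < best_total:
            best_total = total
            best_pairs = [(p, t) for t, p in enumerate(preds)]
    return sorted(best_pairs), best_total


# Hypothetical costs for 3 predictions (rows) and 2 targets (columns).
example_cost = [
    [0.9, 0.1],
    [0.2, 0.8],
    [0.5, 0.5],
]
pairs, total = match_predictions(example_cost)
```

Here prediction 0 is matched to target 1 and prediction 1 to target 0, while prediction 2 stays unmatched. Because every target gets a different prediction, a loss built on this assignment penalizes duplicate predictions of the same object.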

We address the problem of visually guided rearrangement planning with many movable objects, i.e., finding a sequence of actions to move a set of objects from an initial arrangement to a desired one, while relying on visual inputs coming from an RGB camera. To do so, we introduce a complete pipeline relying on two key contributions. First, we introduce an efficient and scalable rearrangement planning method, based on a Monte-Carlo Tree Search exploration strategy. We demonstrate that because of its good trade-off between exploration and exploitation our method (i) scales well with the number of objects and (ii)...

10.1109/lra.2020.2980984 article EN IEEE Robotics and Automation Letters 2020-03-17
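
The abstract above credits the planner's scaling to a good trade-off between exploration and exploitation in its Monte-Carlo Tree Search. One common way to express that trade-off is the UCB1 selection rule, sketched below. Whether the paper uses this exact rule is not stated in the text above, so it is an assumption, as are the function names and the exploration constant `c`.

```python
import math


def ucb1(total_reward, visits, parent_visits, c=math.sqrt(2)):
    """UCB1 score: average reward plus an exploration bonus for rarely tried nodes."""
    if visits == 0:
        return float("inf")  # always try unvisited children first
    exploit = total_reward / visits
    explore = c * math.sqrt(math.log(parent_visits) / visits)
    return exploit + explore


def select_child(children, parent_visits, c=math.sqrt(2)):
    """Return the index of the child with the highest UCB1 score.

    children is a list of (total_reward, visits) tuples.
    """
    return max(
        range(len(children)),
        key=lambda i: ucb1(children[i][0], children[i][1], parent_visits, c),
    )


# Hypothetical statistics: a well-explored good child and a barely explored one.
stats = [(9.0, 10), (0.5, 1)]
explore_choice = select_child(stats, parent_visits=11)
greedy_choice = select_child(stats, parent_visits=11, c=0.0)
```

With the default constant the rarely visited child is selected, since its exploration bonus outweighs its lower average reward; with `c=0.0` the rule becomes purely greedy and picks the child with the higher average.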

The goal of autonomous vehicles is to navigate public roads safely and comfortably. To enforce safety, traditional planning approaches rely on handcrafted rules to generate trajectories. Machine learning-based systems, on the other hand, scale with data and are able to learn more complex behaviors. However, they often ignore that agents and self-driving vehicle trajectory distributions can be leveraged to improve safety. In this paper, we propose modeling a distribution over multiple future trajectories for...

10.1109/icra48891.2023.10160992 article EN 2023-05-29

In this paper we evaluated deep-learning frameworks based on Convolutional Neural Networks for the accurate classification of multispectral remote sensing data. Certain state-of-the-art models have been tested on the publicly available SAT-4 and SAT-6 high resolution satellite multispectral datasets. In particular, the performed benchmark included the AlexNet, AlexNet-small and VGG models, which had been trained and applied to both datasets exploiting all the available spectral information. Deep Belief Networks, Autoencoders and other semi-supervised...

10.5194/isprs-annals-iii-7-83-2016 article EN cc-by ISPRS annals of the photogrammetry, remote sensing and spatial information sciences 2016-06-07

We present a new shape prior formalism for the segmentation of rectified facade images. It combines the simplicity of split grammars with unprecedented expressive power: the capability of encoding simultaneous alignment in two dimensions, facade occlusions and irregular boundaries between facade elements. We formulate the task of finding the most likely image segmentation conforming to a prior of the proposed form as a MAP-MRF problem over a 4-connected pixel grid, and propose an efficient optimization algorithm for solving it. Our method simultaneously segments the visible...

10.1109/cvpr.2015.7298899 preprint EN 2015-06-01

Since DeepMind’s AlphaZero, Zero learning quickly became the state-of-the-art method for many board games. It can be improved using a fully convolutional structure (no fully connected layer). Using such an architecture plus global pooling, we can create bots independent of the board size. The training can be made more robust by keeping track of the best checkpoints during the training and by training against them. Using these features, we release Polygames, our framework for Zero learning, with its library of games and its checkpoints. We won against strong humans at the game of Hex in 19 ×...

10.3233/icg-200157 article EN ICGA Journal 2020-08-25

10.1016/j.cviu.2019.07.006 article EN publisher-specific-oa Computer Vision and Image Understanding 2019-08-01

The recent COCO object detection dataset presents several new challenges for object detection. In particular, it contains objects at a broad range of scales, less prototypical images, and requires more precise localization. To address these challenges, we test three modifications to the standard Fast R-CNN object detector: (1) skip connections that give the detector access to features at multiple network layers, (2) a foveal structure to exploit object context at multiple object resolutions, and (3) an integral loss function and corresponding network adjustment...

10.48550/arxiv.1604.02135 preprint EN other-oa arXiv (Cornell University) 2016-01-01