Hugo Touvron

ORCID: 0000-0003-1678-392X
Research Areas
  • Advanced Neural Network Applications
  • Domain Adaptation and Few-Shot Learning
  • Cell Image Analysis Techniques
  • Multimodal Machine Learning Applications
  • Advanced Image and Video Retrieval Techniques
  • CCD and CMOS Imaging Sensors
  • Image Processing Techniques and Applications
  • Generative Adversarial Networks and Image Synthesis
  • Brain Tumor Detection and Classification
  • COVID-19 diagnosis using AI
  • Advanced Image Processing Techniques
  • Digital Imaging for Blood Diseases
  • Visual Attention and Saliency Detection
  • Topic Modeling
  • Anomaly Detection Techniques and Applications
  • Currency Recognition and Detection
  • Image and Object Detection Techniques
  • Medical Image Segmentation Techniques
  • Natural Language Processing Techniques
  • Speech Recognition and Synthesis
  • Machine Learning in Healthcare
  • Human Pose and Action Recognition
  • Visual and Cognitive Learning Processes
  • Advanced Memory and Neural Computing
  • Model-Driven Software Engineering Techniques

Hong Kong Polytechnic University
2023

University of the Basque Country
2023

Nokia (United Kingdom)
2023

Sorbonne Université
2021-2023

Bangalore University
2023

Sorbonne University Abu Dhabi
2021-2023

Meta (Israel)
2020-2021

Université Paris Cité
2021

Université de Strasbourg
2019

Centre National de la Recherche Scientifique
2019

In this paper, we question if self-supervised learning provides new properties to Vision Transformer (ViT) [16] that stand out compared to convolutional networks (convnets). Beyond the fact that adapting self-supervised methods to this architecture works particularly well, we make the following observations: first, self-supervised ViT features contain explicit information about the semantic segmentation of an image, which does not emerge as clearly with supervised ViTs, nor with convnets. Second, these features are also excellent k-NN classifiers, reaching 78.3%...

10.1109/iccv48922.2021.00951 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021-10-01
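
The k-NN evaluation mentioned above can be illustrated with a short sketch: frozen features from the pretrained backbone are compared by cosine similarity, and the nearest neighbours vote, weighted by their similarity. This is a minimal illustration written from the abstract; the feature tensors, k = 20, and the similarity-weighted voting are assumptions, not the paper's exact protocol.

```python
# Minimal sketch of weighted k-NN classification on frozen features.
# Tensors, k, and the weighting scheme are illustrative assumptions.
import torch

def knn_classify(train_feats, train_labels, test_feats, k=20, num_classes=1000):
    # Cosine similarity between test and train features.
    train_feats = torch.nn.functional.normalize(train_feats, dim=1)
    test_feats = torch.nn.functional.normalize(test_feats, dim=1)
    sims = test_feats @ train_feats.t()         # (n_test, n_train)
    topk_sims, topk_idx = sims.topk(k, dim=1)   # k nearest neighbours
    topk_labels = train_labels[topk_idx]        # (n_test, k)
    # Weighted vote: each neighbour contributes its similarity to its class.
    votes = torch.zeros(test_feats.size(0), num_classes)
    votes.scatter_add_(1, topk_labels, topk_sims)
    return votes.argmax(dim=1)
```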

We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B. We release all our models to the research community.

10.48550/arxiv.2302.13971 preprint EN cc-by arXiv (Cornell University) 2023-01-01

In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama 2-Chat, are optimized for dialogue use cases. Our models outperform open-source chat models on most benchmarks we tested, and based on our human evaluations for helpfulness and safety, may be a suitable substitute for closed-source models. We provide a detailed description of our approach to fine-tuning and safety improvements of Llama 2-Chat in order to enable the community to build on our work...

10.48550/arxiv.2307.09288 preprint EN other-oa arXiv (Cornell University) 2023-01-01
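
As a hedged usage sketch (not part of the paper itself), a Llama 2-Chat checkpoint can be loaded through the Hugging Face transformers API; the model identifier below is illustrative and assumes you have been granted access to the released weights.

```python
# Illustrative loading/generation sketch; the checkpoint name is an
# assumption and requires access approval on the Hugging Face Hub.
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

inputs = tokenizer("Explain residual networks in one sentence.",
                   return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```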

Transformers have been recently adapted for large scale image classification, achieving high scores shaking up the long supremacy of convolutional neural networks. However, the optimization of image transformers has been little studied so far. In this work, we build and optimize deeper transformer networks for image classification. In particular, we investigate the interplay of architecture and optimization of such dedicated transformers. We make two transformer architecture changes that significantly improve the accuracy of deep transformers. This leads us to produce models whose...

10.1109/iccv48922.2021.00010 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021-10-01
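
One of the two architecture changes this paper introduces is LayerScale, a learnable per-channel scaling of each residual branch initialised to a small value. The sketch below is written from memory of the paper's description; the class name and the 1e-4 initialisation are illustrative, not verbatim from the paper.

```python
# Hedged sketch of LayerScale: each residual branch (attention or MLP)
# is multiplied by a small learnable per-channel scale. Names and the
# init value are assumptions.
import torch
import torch.nn as nn

class LayerScaleBlock(nn.Module):
    def __init__(self, dim, sublayer, init_value=1e-4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.sublayer = sublayer  # e.g. self-attention or an MLP
        self.gamma = nn.Parameter(init_value * torch.ones(dim))

    def forward(self, x):
        # Residual branch scaled channel-wise by gamma.
        return x + self.gamma * self.sublayer(self.norm(x))
```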

We design a family of image classification architectures that optimize the trade-off between accuracy and efficiency in a high-speed regime. Our work exploits recent findings in attention-based architectures, which are competitive on highly parallel processing hardware. We revisit principles from the extensive literature on convolutional neural networks to apply them to transformers, in particular activation maps with decreasing resolutions. We also introduce the attention bias, a new way to integrate positional information...

10.1109/iccv48922.2021.01204 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021-10-01
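
The attention bias mentioned above replaces explicit positional embeddings with a learned per-head scalar, indexed by the relative offset of each query/key pair and added to the attention logits. The following is a simplified reading of that idea; grid size, the use of absolute offsets, and zero initialisation are assumptions.

```python
# Simplified sketch of a learned per-head attention bias indexed by
# relative position. Grid size and initialisation are assumptions.
import torch
import torch.nn as nn

class AttentionBias(nn.Module):
    def __init__(self, num_heads, resolution):
        super().__init__()
        points = [(i, j) for i in range(resolution) for j in range(resolution)]
        offsets, idxs = {}, []
        for p in points:
            row = []
            for q in points:
                off = (abs(p[0] - q[0]), abs(p[1] - q[1]))
                if off not in offsets:
                    offsets[off] = len(offsets)
                row.append(offsets[off])
            idxs.append(row)
        self.register_buffer("idxs", torch.tensor(idxs))  # (N, N)
        self.bias = nn.Parameter(torch.zeros(num_heads, len(offsets)))

    def forward(self, attn_logits):
        # attn_logits: (batch, heads, N, N); add the learned bias per head.
        return attn_logits + self.bias[:, self.idxs]
```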

We present ResMLP, an architecture built entirely upon multi-layer perceptrons for image classification. It is a simple residual network that alternates (i) a linear layer in which image patches interact, independently and identically across channels, and (ii) a two-layer feed-forward network in which channels interact independently per patch. When trained with a modern training strategy using heavy data-augmentation and optionally distillation, it attains surprisingly good accuracy/complexity trade-offs on ImageNet. We also train ResMLP models...

10.1109/tpami.2022.3206148 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2022-09-12
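
The alternating structure described above maps directly to code: one linear layer mixing patches, one two-layer MLP mixing channels, each wrapped in a residual connection. The sketch below simplifies away the paper's Affine normalisation; dimensions are illustrative.

```python
# Compact sketch of a ResMLP block: cross-patch linear layer plus
# per-patch two-layer MLP, each residual. Normalisation is simplified.
import torch
import torch.nn as nn

class ResMLPBlock(nn.Module):
    def __init__(self, num_patches, dim, hidden_dim):
        super().__init__()
        self.cross_patch = nn.Linear(num_patches, num_patches)  # patches interact
        self.cross_channel = nn.Sequential(                     # channels interact
            nn.Linear(dim, hidden_dim), nn.GELU(), nn.Linear(hidden_dim, dim)
        )

    def forward(self, x):  # x: (batch, num_patches, dim)
        x = x + self.cross_patch(x.transpose(1, 2)).transpose(1, 2)
        x = x + self.cross_channel(x)
        return x
```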

Convolutional architectures have proven to be extremely successful for vision tasks. Their hard inductive biases enable sample-efficient learning, but come at the cost of a potentially lower performance ceiling. Vision transformers rely on more flexible self-attention layers, and have recently outperformed CNNs for image classification. However, they require costly pre-training on large external datasets or distillation from pre-trained convolutional networks. In this paper, we ask the following...

10.1088/1742-5468/ac9830 article EN Journal of Statistical Mechanics Theory and Experiment 2022-11-01

Following their success in natural language processing, transformers have recently shown much promise for computer vision. The self-attention operation underlying transformers yields global interactions between all tokens, i.e. words or image patches, and enables flexible modelling of image data beyond the local interactions of convolutions. This flexibility, however, comes with a quadratic complexity in time and memory, hindering application to long sequences and high-resolution images. We propose a "transposed" version of self-attention that operates...

10.48550/arxiv.2106.09681 preprint EN cc-by arXiv (Cornell University) 2021-01-01
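
The "transposed" attention can be sketched as follows: queries and keys are L2-normalised and attention is computed between feature channels rather than tokens, giving a d×d attention map whose cost is linear in sequence length. The paper's learned temperature is omitted here, so treat this as a simplified reading of the abstract.

```python
# Simplified cross-covariance ("transposed") attention: the attention map
# lives in channel space, so cost is linear in the number of tokens.
# The learned temperature is omitted as a simplification.
import torch
import torch.nn.functional as F

def cross_covariance_attention(q, k, v):
    # q, k, v: (batch, heads, tokens, head_dim)
    q = F.normalize(q, dim=-1)
    k = F.normalize(k, dim=-1)
    # Channel-by-channel attention map: (batch, heads, head_dim, head_dim).
    attn = (q.transpose(-2, -1) @ k).softmax(dim=-1)
    # Mix the channels of the values; the token count is untouched.
    return (attn @ v.transpose(-2, -1)).transpose(-2, -1)
```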

We release Code Llama, a family of large language models for code based on Llama 2 providing state-of-the-art performance among open models, infilling capabilities, support for large input contexts, and zero-shot instruction following ability for programming tasks. We provide multiple flavors to cover a wide range of applications: foundation models (Code Llama), Python specializations (Code Llama - Python), and instruction-following models (Code Llama - Instruct) with 7B, 13B, 34B and 70B parameters each. All models are trained on sequences of 16k tokens and show improvements...

10.48550/arxiv.2308.12950 preprint EN cc-by arXiv (Cornell University) 2023-01-01

The influential Residual Networks designed by He et al. remain the gold-standard architecture in numerous scientific publications. They typically serve as the default architecture in studies, or as baselines when new architectures are proposed. Yet there has been significant progress on best practices for training neural networks since the inception of the ResNet architecture in 2015. Novel optimization and data-augmentation procedures have increased the effectiveness of the training recipes. In this paper, we re-evaluate the performance of the vanilla ResNet-50 when trained with a...

10.48550/arxiv.2110.00476 preprint EN other-oa arXiv (Cornell University) 2021-01-01
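
A generic sketch of the kind of modernised ingredients such recipes combine is shown below: label smoothing, mixup-style augmentation, and a cosine learning-rate schedule. All hyperparameters are illustrative, not the paper's tuned values.

```python
# Generic modernised-recipe ingredients for a vanilla ResNet-50.
# Hyperparameters are illustrative assumptions, not the paper's recipe.
import torch
import torchvision

model = torchvision.models.resnet50(weights=None)
criterion = torch.nn.CrossEntropyLoss(label_smoothing=0.1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.5, momentum=0.9,
                            weight_decay=2e-5)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)

def mixup(images, labels, alpha=0.2, num_classes=1000):
    # Blend random pairs of images and their one-hot labels.
    lam = torch.distributions.Beta(alpha, alpha).sample()
    perm = torch.randperm(images.size(0))
    mixed = lam * images + (1 - lam) * images[perm]
    onehot = torch.nn.functional.one_hot(labels, num_classes).float()
    return mixed, lam * onehot + (1 - lam) * onehot[perm]

images, labels = torch.randn(8, 3, 224, 224), torch.randint(0, 1000, (8,))
mixed, soft_targets = mixup(images, labels)
loss = criterion(model(mixed), soft_targets)  # soft targets are supported
```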

Data-augmentation is key to the training of neural networks for image classification. This paper first shows that existing augmentations induce a significant discrepancy between the typical size of the objects seen by the classifier at train and test time. We experimentally validate that, for a target test resolution, using a lower train resolution offers better classification at test time. We then propose a simple yet effective and efficient strategy to optimize the classifier performance when the train and test resolutions differ. It involves only a computationally cheap fine-tuning...

10.48550/arxiv.1906.06423 preprint EN other-oa arXiv (Cornell University) 2019-01-01
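
The computationally cheap fine-tuning mentioned above can be sketched as follows: train at a lower resolution, then briefly fine-tune only the last layers at the test resolution. The model, resolutions, and the choice to unfreeze only the classifier are illustrative assumptions.

```python
# Hedged sketch of test-resolution fine-tuning: freeze the backbone,
# fine-tune the classifier at the (higher) test resolution. Model,
# resolutions, and unfrozen layers are assumptions.
import torch
import torchvision

model = torchvision.models.resnet50(weights=None)  # assume trained at 224px

for p in model.parameters():
    p.requires_grad = False
for p in model.fc.parameters():       # re-enable just the classifier
    p.requires_grad = True

optimizer = torch.optim.SGD(model.fc.parameters(), lr=1e-3)
images = torch.randn(8, 3, 384, 384)  # batch at the test resolution
labels = torch.randint(0, 1000, (8,))
loss = torch.nn.functional.cross_entropy(model(images), labels)
loss.backward()
optimizer.step()
```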

Recently, neural networks purely based on attention were shown to address image understanding tasks such as image classification. However, these visual transformers are pre-trained with hundreds of millions of images using an expensive infrastructure, thereby limiting their adoption. In this work, we produce a competitive convolution-free transformer by training on Imagenet only. We train them on a single computer in less than 3 days. Our reference vision transformer (86M parameters) achieves top-1 accuracy of 83.1%...

10.48550/arxiv.2012.12877 preprint EN other-oa arXiv (Cornell University) 2020-01-01
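
This paper is best known for its token-based distillation procedure; a simplified sketch of hard-label distillation in that spirit follows. The dedicated distillation token itself is omitted, and the equal 0.5/0.5 weighting is an assumption for illustration.

```python
# Simplified hard-label distillation objective: half the loss follows the
# ground-truth labels, half follows the teacher's hard predictions. The
# distillation token and exact weighting are simplified away.
import torch
import torch.nn.functional as F

def hard_distillation_loss(student_logits, distill_logits,
                           teacher_logits, labels):
    ce = F.cross_entropy(student_logits, labels)          # true labels
    teacher_labels = teacher_logits.argmax(dim=1)         # teacher decisions
    kd = F.cross_entropy(distill_logits, teacher_labels)  # imitate teacher
    return 0.5 * ce + 0.5 * kd
```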

Pre-training models on large scale datasets, like ImageNet, is a standard practice in computer vision. This paradigm is especially effective for tasks with small training sets, for which high-capacity models tend to overfit. In this work, we consider a self-supervised pre-training scenario that only leverages the target task data. We consider datasets, like Stanford Cars, Sketch or COCO, which are order(s) of magnitude smaller than Imagenet. Our study shows that denoising autoencoders, such as the BEiT variant we introduce in this paper, are more robust to the type...

10.48550/arxiv.2112.10740 preprint EN cc-by arXiv (Cornell University) 2021-01-01
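
A generic sketch of denoising-autoencoder pre-training on patches follows: a random subset of patch embeddings is masked, and the model is trained to reconstruct them. This illustrates the general idea only; the paper's BEiT variant differs in its prediction target and masking strategy.

```python
# Generic masked-patch denoising autoencoder: corrupt a random subset of
# patch embeddings and reconstruct them. An illustration of the idea,
# not the paper's exact variant.
import torch
import torch.nn as nn

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=192, nhead=3, batch_first=True),
    num_layers=4)
decoder = nn.Linear(192, 192)            # predict the original embedding
mask_token = nn.Parameter(torch.zeros(192))

patches = torch.randn(8, 196, 192)       # embedded image patches
mask = torch.rand(8, 196) < 0.5          # mask half of the patches
corrupted = torch.where(mask.unsqueeze(-1), mask_token, patches)
pred = decoder(encoder(corrupted))
loss = ((pred - patches)[mask] ** 2).mean()  # reconstruct masked patches
```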

This paper tackles the problem of learning a finer representation than the one provided by training labels. This enables fine-grained category retrieval of images in a collection annotated with coarse labels only. Our network is learned with a nearest-neighbor classifier objective, and an instance loss inspired by self-supervised learning. By jointly leveraging the coarse labels and the underlying fine-grained latent space, it significantly improves the accuracy of category-level retrieval methods. Our strategy outperforms all competing methods for retrieving or classifying...

10.1109/iccv48922.2021.00091 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021-10-01

We design a family of image classification architectures that optimize the trade-off between accuracy and efficiency in a high-speed regime. Our work exploits recent findings in attention-based architectures, which are competitive on highly parallel processing hardware. We revisit principles from the extensive literature on convolutional neural networks to apply them to transformers, in particular activation maps with decreasing resolutions. We also introduce the attention bias, a new way to integrate positional information...

10.48550/arxiv.2104.01136 preprint EN other-oa arXiv (Cornell University) 2021-01-01

This paper provides an extensive analysis of the performance of the EfficientNet image classifiers with several recent training procedures, in particular one that corrects the discrepancy between train and test images. The resulting network, called FixEfficientNet, significantly outperforms the initial architecture with the same number of parameters. For instance, our FixEfficientNet-B0 trained without additional data achieves 79.3% top-1 accuracy on ImageNet with 5.3M parameters, which is a +0.5% absolute improvement over the Noisy student...

10.48550/arxiv.2003.08237 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Accurately quantifying cellular morphology at scale could substantially empower existing single-cell approaches. However, measuring cell morphology remains an active field of research, which has inspired multiple computer vision algorithms over the years. Here, we show that DINO, a vision-transformer based, self-supervised algorithm, has a remarkable ability for learning rich representations of cellular morphology without manual annotations or any other type of supervision. We evaluate DINO on a wide variety of tasks across three...

10.1101/2023.06.16.545359 preprint EN cc-by-nd bioRxiv (Cold Spring Harbor Laboratory) 2023-06-18

We show how to augment any convolutional network with an attention-based global map to achieve non-local reasoning. We replace the final average pooling by an attention-based aggregation layer akin to a single transformer block, that weights how the patches are involved in the classification decision. We plug this learned aggregation layer with a simplistic patch-based convolutional network parametrized by 2 parameters (width and depth). In contrast with a pyramidal design, this architecture family maintains the input patch resolution across all the layers. It yields surprisingly competitive trade-offs...

10.48550/arxiv.2112.13692 preprint EN other-oa arXiv (Cornell University) 2021-01-01
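
The attention-based aggregation layer can be sketched as a single learned query attending over the patch features, so that the attention weights expose which patches drive the classification decision. Dimensions and the single-head choice below are illustrative assumptions.

```python
# Sketch of attention-based pooling replacing global average pooling:
# one learned query attends over patch features; the returned weights
# indicate patch importance. Dimensions and head count are assumptions.
import torch
import torch.nn as nn

class AttentionPooling(nn.Module):
    def __init__(self, dim, num_classes):
        super().__init__()
        self.query = nn.Parameter(torch.randn(1, 1, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads=1, batch_first=True)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, feats):  # feats: (batch, num_patches, dim)
        q = self.query.expand(feats.size(0), -1, -1)
        pooled, weights = self.attn(q, feats, feats)  # weights: patch importance
        return self.head(pooled.squeeze(1)), weights
```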