NFDI4DS | UHH-SEMS - Publication Details

Yonglong Tian

ORCID: 0000-0002-6110-2145

About

Contact & Profiles

A5085958820

Research Areas

Domain Adaptation and Few-Shot Learning
Multimodal Machine Learning Applications
Advanced Neural Network Applications
Advanced Image and Video Retrieval Techniques
Human Pose and Action Recognition
Video Surveillance and Tracking Methods
Generative Adversarial Networks and Image Synthesis
Anomaly Detection Techniques and Applications
Adversarial Robustness in Machine Learning
Image Processing and 3D Reconstruction
AI in cancer detection
Machine Learning and Data Classification
Computer Graphics and Visualization Techniques
COVID-19 diagnosis using AI
Industrial Vision Systems and Defect Detection
Tribology and Wear Analysis
CCD and CMOS Imaging Sensors
Hand Gesture Recognition Systems
3D Shape Modeling and Analysis
Complex Network Analysis Techniques
Advanced Vision and Imaging
Natural Language Processing Techniques
Advanced Memory and Neural Computing
Advanced Graph Neural Networks
Fiber-reinforced polymer composites

University of Electronic Science and Technology of China
2025

Mianyang Central Hospital
2025

Google (United States)
2024

Wuhan Institute of Technology
2024

Massachusetts Institute of Technology
2018-2022

Moscow Institute of Thermal Technology
2020-2021

IIT@MIT
2020

Donghua University
2018-2020

Chinese University of Hong Kong
2014-2016

Shenzhen Institutes of Advanced Technology
2015

Supervised Contrastive Learning

OPENALEX - Publications

Prannay Khosla, Piotr Teterwak, Chen Wang, Aaron Sarna, Yonglong Tian, and 4 more

Contrastive learning applied to self-supervised representation learning has seen a resurgence in recent years, leading to state-of-the-art performance in the unsupervised training of deep image models. Modern batch contrastive approaches subsume or significantly outperform traditional contrastive losses such as triplet, max-margin and the N-pairs loss. In this work, we extend the self-supervised batch contrastive approach to the fully-supervised setting, allowing us to effectively leverage label information. Clusters of points belonging to the same class are pulled together in embedding...

10.48550/arxiv.2004.11362 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Representation Learning on Graphs with Jumping Knowledge Networks

Keyulu Xu, Chengtao Li, Yonglong Tian, Tomohiro Sonobe, Ken‐ichi Kawarabayashi, and 1 more

Recent deep learning approaches for representation learning on graphs follow a neighborhood aggregation procedure. We analyze some important properties of these models, and propose a strategy to overcome those. In particular, the range of "neighboring" nodes that a node's representation draws from strongly depends on the graph structure, analogous to the spread of a random walk. To adapt to local neighborhood properties and tasks, we explore an architecture -- jumping knowledge (JK) networks -- that flexibly leverages, for each node, different neighborhood ranges to enable better structure-aware...

10.48550/arxiv.1806.03536 preprint EN other-oa arXiv (Cornell University) 2018-01-01

Contrastive Multiview Coding

Yonglong Tian, Dilip Krishnan, Phillip Isola

Humans view the world through many sensory channels, e.g., the long-wavelength light channel, viewed by the left eye, or the high-frequency vibrations channel, heard by the right ear. Each view is noisy and incomplete, but important factors, such as physics, geometry, and semantics, tend to be shared between all views (e.g., a "dog" can be seen, heard, and felt). We investigate the classic hypothesis that a powerful representation is one that models view-invariant factors. We study this hypothesis under the framework of multiview contrastive learning, where we learn...

10.48550/arxiv.1906.05849 preprint EN cc-by-nc-sa arXiv (Cornell University) 2019-01-01

Through-Wall Human Pose Estimation Using Radio Signals

M. Zhao, Tianhong Li, Mohammad Abu Alsheikh, Yonglong Tian, Hang Zhao, and 2 more

This paper demonstrates accurate human pose estimation through walls and occlusions. We leverage the fact that wireless signals in the WiFi frequencies traverse walls and reflect off the human body. We introduce a deep neural network approach that parses such radio signals to estimate 2D poses. Since humans cannot annotate radio signals, we use a state-of-the-art vision model to provide cross-modal supervision. Specifically, during training the system uses synchronized wireless and visual inputs, extracts pose information from the visual stream, and uses it to guide the training process. Once...

10.1109/cvpr.2018.00768 article EN 2018-06-01

Contrastive Representation Distillation

Yonglong Tian, Dilip Krishnan, Phillip Isola

Often we wish to transfer representational knowledge from one neural network to another. Examples include distilling a large network into a smaller one, transferring knowledge from one sensory modality to a second, or ensembling a collection of models into a single estimator. Knowledge distillation, the standard approach to these problems, minimizes the KL divergence between the probabilistic outputs of a teacher and student network. We demonstrate that this objective ignores important structural knowledge of the teacher network. This motivates an alternative objective by which we train a student to capture...

10.48550/arxiv.1910.10699 preprint EN cc-by-nc-sa arXiv (Cornell University) 2019-01-01

Deep Learning Strong Parts for Pedestrian Detection

Yonglong Tian, Ping Luo, Xiaogang Wang, Xiaoou Tang

Recent advances in pedestrian detection are attained by transferring the learned features of a Convolutional Neural Network (ConvNet) to pedestrians. This ConvNet is typically pre-trained with massive general object categories (e.g. ImageNet). Although these features are able to handle variations such as poses, viewpoints, and lightings, they may fail when pedestrian images with complex occlusions are present. Occlusion handling is one of the most important problems in pedestrian detection. Unlike previous deep models that directly learned a single detector for...

10.1109/iccv.2015.221 article EN 2015-12-01

What Makes for Good Views for Contrastive Learning?

Yonglong Tian, Chen Sun, Ben Poole, Dilip Krishnan, Cordelia Schmid, and 1 more

Contrastive learning between multiple views of the data has recently achieved state-of-the-art performance in the field of self-supervised representation learning. Despite its success, the influence of different view choices has been less studied. In this paper, we use theoretical and empirical analysis to better understand the importance of view selection, and argue that we should reduce the mutual information (MI) between views while keeping task-relevant information intact. To verify this hypothesis, we devise unsupervised and semi-supervised frameworks that learn effective views by...

10.48550/arxiv.2005.10243 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Pedestrian detection aided by deep learning semantic tasks

Yonglong Tian, Ping Luo, Xiaogang Wang, Xiaoou Tang

Deep learning methods have achieved great successes in pedestrian detection, owing to their ability to learn discriminative features from raw pixels. However, they treat pedestrian detection as a single binary classification task, which may confuse positive with hard negative samples (Fig.1 (a)). To address this ambiguity, this work jointly optimizes pedestrian detection with semantic tasks, including pedestrian attributes (e.g. 'carrying backpack') and scene attributes (e.g. 'vehicle', 'tree', 'horizontal'). Rather than expensively annotating scene attributes, we...

10.1109/cvpr.2015.7299143 preprint EN 2015-06-01

DeepID-Net: Deformable deep convolutional neural networks for object detection

Wanli Ouyang, Xiaogang Wang, Xingyu Zeng, Shi Qiu, Ping Luo, and 6 more

In this paper, we propose deformable deep convolutional neural networks for generic object detection. This new deep learning object detection framework has innovations in multiple aspects. In the proposed new deep architecture, a new deformation constrained pooling (def-pooling) layer models the deformation of object parts with geometric constraint and penalty. A new pre-training strategy is proposed to learn feature representations more suitable for the object detection task and with good generalization capability. By changing the net structures, training strategies, adding and removing some key...

10.1109/cvpr.2015.7298854 preprint EN 2015-06-01

RF-based 3D skeletons

M. Zhao, Yonglong Tian, Hang Zhao, Mohammad Abu Alsheikh, Tianhong Li, and 4 more

This paper introduces RF-Pose3D, the first system that infers 3D human skeletons from RF signals. It requires no sensors on the body, and works with multiple people and across walls and occlusions. Further, it generates dynamic skeletons that follow the people as they move, walk or sit. As such, RF-Pose3D provides a significant leap in RF-based sensing and enables new applications in gaming, healthcare, and smart homes.

10.1145/3230543.3230579 article EN 2018-08-07

Switchable Deep Network for Pedestrian Detection

Ping Luo, Yonglong Tian, Xiaogang Wang, Xiaoou Tang

In this paper, we propose a Switchable Deep Network (SDN) for pedestrian detection. The SDN automatically learns hierarchical features, salience maps, and mixture representations of different body parts. Pedestrian detection faces the challenges of background clutter and large variations of pedestrian appearance due to pose and viewpoint changes and other factors. One of our key contributions is to propose a Switchable Restricted Boltzmann Machine (SRBM) to explicitly model the complex mixture of visual variations at multiple levels. At the feature levels, it estimates saliency...

10.1109/cvpr.2014.120 article EN 2014 IEEE Conference on Computer Vision and Pattern Recognition 2014-06-01

DeepID-Net: Object Detection with Deformable Part Based Convolutional Neural Networks

Wanli Ouyang, Xingyu Zeng, Xiaogang Wang, Shi Qiu, Ping Luo, and 9 more

In this paper, we propose deformable deep convolutional neural networks for generic object detection. This new deep learning object detection framework has innovations in multiple aspects. In the proposed new deep architecture, a new deformation constrained pooling (def-pooling) layer models the deformation of object parts with geometric constraint and penalty. A new pre-training strategy is proposed to learn feature representations more suitable for the object detection task and with good generalization capability. By changing the net structures, training strategies, adding and removing some...

10.1109/tpami.2016.2587642 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2016-07-07

RF-Based Fall Monitoring Using Convolutional Neural Networks

Yonglong Tian, Guang-He Lee, Hao He, Chen-Yu Hsu, Dina Katabi

Falls are the top reason for fatal and non-fatal injuries among seniors. Existing solutions are based on wearable fall-alert sensors, but medical research has shown that they are ineffective, mostly because seniors do not wear them. These revelations have led to new passive sensors that infer falls by analyzing Radio Frequency (RF) signals in homes. Seniors can go about their lives as usual without the need to wear any device. While passive monitoring has made major advances, current approaches still cannot deal with...

10.1145/3264947 article EN Proceedings of the ACM on Interactive Mobile Wearable and Ubiquitous Technologies 2018-09-18

DeepID-Net: multi-stage and deformable deep convolutional neural networks for object detection

Wanli Ouyang, Ping Luo, Xingyu Zeng, Shi Qiu, Yonglong Tian, and 10 more

In this paper, we propose multi-stage and deformable deep convolutional neural networks for object detection. This new deep learning object detection diagram has innovations in multiple aspects. In the proposed new deep architecture, a new deformation constrained pooling (def-pooling) layer models the deformation of object parts with geometric constraint and penalty. With the proposed multi-stage training strategy, multiple classifiers are jointly optimized to process samples at different difficulty levels. A new pre-training strategy is proposed to learn feature representations more suitable...

10.48550/arxiv.1409.3505 preprint EN other-oa arXiv (Cornell University) 2014-01-01

DeepID-Net: Deformable Deep Convolutional Neural Networks for Object Detection

Wanli Ouyang, Xiaogang Wang, Xingyu Zeng, Shi Qiu, Ping Luo, and 6 more

In this paper, we propose deformable deep convolutional neural networks for generic object detection. This new deep learning object detection framework has innovations in multiple aspects. In the proposed new deep architecture, a new deformation constrained pooling (def-pooling) layer models the deformation of object parts with geometric constraint and penalty. A new pre-training strategy is proposed to learn feature representations more suitable for the object detection task and with good generalization capability. By changing the net structures, training strategies, adding and removing some...

10.48550/arxiv.1412.5661 preprint EN other-oa arXiv (Cornell University) 2014-01-01

Rethinking Few-Shot Image Classification: a Good Embedding Is All You Need?

Yonglong Tian, Yue Wang, Dilip Krishnan, Joshua B. Tenenbaum, Phillip Isola

The focus of recent meta-learning research has been on the development of learning algorithms that can quickly adapt to test time tasks with limited data and low computational cost. Few-shot learning is widely used as one of the standard benchmarks in meta-learning. In this work, we show that a simple baseline: learning a supervised or self-supervised representation on the meta-training set, followed by training a linear classifier on top of this representation, outperforms state-of-the-art few-shot learning methods. An additional boost can be achieved...

10.48550/arxiv.2003.11539 preprint EN cc-by arXiv (Cornell University) 2020-01-01

Co-advise: Cross Inductive Bias Distillation

Sucheng Ren, Zhengqi Gao, Tianyu Hua, Zihui Xue, Yonglong Tian, and 2 more

The inductive bias of vision transformers is more relaxed, so they cannot work well with insufficient data. Knowledge distillation is thus introduced to assist the training of transformers. Unlike previous works, where merely heavy convolution-based teachers are provided, in this paper, we delve into the influence of models' inductive biases in knowledge distillation (e.g., convolution and involution). Our key observation is that the teacher accuracy is not the dominant reason for the student accuracy, but the teacher inductive bias is more important. We demonstrate that lightweight teachers with different...

10.1109/cvpr52688.2022.01627 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

Improving CLIP Training with Language Rewrites

Lijie Fan, Dilip Krishnan, Phillip Isola, Dina Katabi, Yonglong Tian

Contrastive Language-Image Pre-training (CLIP) stands as one of the most effective and scalable methods for training transferable vision models using paired image and text data. CLIP models are trained using contrastive loss, which typically relies on data augmentations to prevent overfitting and shortcuts. However, in the CLIP training paradigm, data augmentations are exclusively applied to image inputs, while language inputs remain unchanged throughout the entire training process, limiting the exposure of diverse texts to the same image. In this paper, we introduce Language augmented...

10.48550/arxiv.2305.20088 preprint EN other-oa arXiv (Cornell University) 2023-01-01