NFDI4DS | UHH-SEMS - Publication Details

Jifeng Dai

ORCID: 0000-0002-6785-0785

OpenAlex author ID: A5026944066

Research Areas

Advanced Neural Network Applications
Multimodal Machine Learning Applications
Domain Adaptation and Few-Shot Learning
Advanced Image and Video Retrieval Techniques
Topic Modeling
Advanced Vision and Imaging
Natural Language Processing Techniques
Human Pose and Action Recognition
Robotics and Sensor-Based Localization
Image Retrieval and Classification Techniques
Advanced Image Processing Techniques
Visual Attention and Saliency Detection
Generative Adversarial Networks and Image Synthesis
Image Enhancement Techniques
Machine Learning and Data Classification
CCD and CMOS Imaging Sensors
COVID-19 diagnosis using AI
Anomaly Detection Techniques and Applications
Video Analysis and Summarization
Cardiovascular Health and Disease Prevention
Video Surveillance and Tracking Methods
Adversarial Robustness in Machine Learning
Reinforcement Learning in Robotics
Evaluation Methods in Various Fields
Biometric Identification and Security

Affiliations

Shanghai Artificial Intelligence Laboratory
2022-2024

Tsinghua University
2010-2024

Kunming University of Science and Technology
2022-2024

Beijing Academy of Artificial Intelligence
2022-2024

ShangHai JiAi Genetics & IVF Institute
2023

InternetLab
2023

Group Sense (China)
2020-2022

Shanghai Jiao Tong University
2022

Sensetime (China)
2020-2021

Chinese University of Hong Kong
2020

Publications

Deformable Convolutional Networks

OPENALEX - Publications

Jifeng Dai, Haozhi Qi, Yuwen Xiong, Yi Li, Guodong Zhang, and 2 more

Convolutional neural networks (CNNs) are inherently limited in modeling geometric transformations due to the fixed structures in their building modules. In this work, we introduce two new modules to enhance the transformation modeling capability of CNNs, namely, deformable convolution and deformable RoI pooling. Both are based on the idea of augmenting the spatial sampling locations with additional offsets and learning the offsets from the target tasks, without additional supervision. The new modules can readily replace their plain counterparts in existing CNNs and can be easily trained...

10.1109/iccv.2017.89 article EN 2017-10-01
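
The abstract's core idea, sampling the input at the regular grid positions plus learned offsets, can be illustrated with a minimal pure-Python sketch (not the authors' implementation). It computes one output value of a single-channel 3×3 deformable convolution, using bilinear interpolation for fractional sampling positions and treating pixels outside the image as zero. As an assumption, the offsets are passed in directly; in the paper they are learned by an additional convolutional layer. The helper names `bilinear` and `deform_conv_at` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Grid = List[List[float]]


def bilinear(img: Grid, y: float, x: float) -> float:
    """Bilinearly interpolate img at fractional (y, x); pixels outside the image count as 0."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    total = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                # Weight decays linearly with distance along each axis.
                weight = max(0.0, 1 - abs(y - yy)) * max(0.0, 1 - abs(x - xx))
                total += weight * img[yy][xx]
    return total


def deform_conv_at(
    img: Grid,
    kernel: Grid,
    p0: Tuple[int, int],
    offsets: Sequence[Tuple[float, float]],
) -> float:
    """One output of a single-channel 3x3 deformable convolution at location p0.

    offsets holds one (dy, dx) pair per kernel tap in row-major order.
    Assumption: offsets are given directly instead of predicted by a conv layer.
    """
    if len(offsets) != 9:
        raise ValueError("expected 9 offsets for a 3x3 kernel")
    y0, x0 = p0
    taps = [(gy, gx) for gy in (-1, 0, 1) for gx in (-1, 0, 1)]
    out = 0.0
    for (gy, gx), (dy, dx) in zip(taps, offsets):
        # Regular grid position p0 + p_k, shifted by its offset.
        out += kernel[gy + 1][gx + 1] * bilinear(img, y0 + gy + dy, x0 + gx + dx)
    return out


img = [[float(4 * r + c) for c in range(4)] for r in range(4)]
ones = [[1.0] * 3 for _ in range(3)]
print(deform_conv_at(img, ones, (1, 1), [(0.0, 0.0)] * 9))  # plain 3x3 sum: 45.0
print(deform_conv_at(img, ones, (1, 1), [(0.0, 0.5)] * 9))  # half a pixel right: 49.5
```

With all offsets set to zero the sketch reduces to an ordinary 3×3 convolution (in correlation form); fractional offsets blend neighbouring pixels, which is what keeps the sampling differentiable with respect to the offsets.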

R-FCN: Object Detection via Region-based Fully Convolutional Networks

Jifeng Dai, Yi Li, Kaiming He, Jian Sun

We present region-based, fully convolutional networks for accurate and efficient object detection. In contrast to previous region-based detectors such as Fast/Faster R-CNN that apply a costly per-region subnetwork hundreds of times, our region-based detector is fully convolutional, with almost all computation shared on the entire image. To achieve this goal, we propose position-sensitive score maps to address a dilemma between translation-invariance in image classification and translation-variance in object detection. Our method can thus naturally adopt...

10.48550/arxiv.1605.06409 preprint EN other-oa arXiv (Cornell University) 2016-01-01
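
Position-sensitive score maps can be sketched in a similar spirit. The snippet below is a simplified pure-Python illustration, not the R-FCN code: for a single class, the RoI is split into k×k bins, bin (i, j) is average-pooled only from its own dedicated score map, and the class score is the average over the k×k bins. It assumes a non-empty RoI given in feature-map pixels and omits the multi-class channel layout and the softmax over classes; `ps_roi_pool` is a hypothetical name.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]


def ps_roi_pool(
    score_maps: List[Grid], roi: Tuple[int, int, int, int], k: int
) -> Tuple[Grid, float]:
    """Position-sensitive RoI pooling for a single class.

    score_maps: k*k maps of equal size; map i*k + j is responsible for bin (i, j).
    roi: (y1, x1, y2, x2) in feature-map pixels, end-exclusive, assumed non-empty.
    Returns the k x k pooled grid and the class score (average over bins).
    """
    if len(score_maps) != k * k:
        raise ValueError("need exactly k*k score maps")
    y1, x1, y2, x2 = roi
    bin_h, bin_w = (y2 - y1) / k, (x2 - x1) / k
    pooled = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            ys = range(y1 + math.floor(i * bin_h), y1 + math.ceil((i + 1) * bin_h))
            xs = range(x1 + math.floor(j * bin_w), x1 + math.ceil((j + 1) * bin_w))
            m = score_maps[i * k + j]  # only this bin's dedicated map is read
            vals = [m[y][x] for y in ys for x in xs]
            pooled[i][j] = sum(vals) / len(vals)
    score = sum(sum(row) for row in pooled) / (k * k)
    return pooled, score


# Each map is constant and equal to its index, so bin (i, j) reports i*k + j.
maps = [[[float(n)] * 4 for _ in range(4)] for n in range(4)]
print(ps_roi_pool(maps, (0, 0, 4, 4), 2))  # ([[0.0, 1.0], [2.0, 3.0]], 1.5)
```

Because each bin reads a different map, the same image content contributes differently depending on where it falls inside the RoI, which is how the design reintroduces translation-variance on top of shared, fully convolutional computation.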

Deformable ConvNets V2: More Deformable, Better Results

Xizhou Zhu, Han Hu, Stephen Lin, Jifeng Dai

The superior performance of Deformable Convolutional Networks arises from its ability to adapt to the geometric variations of objects. Through an examination of its adaptive behavior, we observe that while the spatial support for its neural features conforms more closely than regular ConvNets to object structure, this support may nevertheless extend well beyond the region of interest, causing features to be influenced by irrelevant image content. To address this problem, we present a reformulation that improves the focus on pertinent regions, through...

10.1109/cvpr.2019.00953 preprint EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01

MMDetection: Open MMLab Detection Toolbox and Benchmark

Kai Chen, Jiaqi Wang, Jiangmiao Pang, Yuhang Cao, Yu Xiong, and 20 more

We present MMDetection, an object detection toolbox that contains a rich set of object detection and instance segmentation methods as well as related components and modules. The toolbox started from a codebase of the MMDet team, who won the detection track of the COCO Challenge 2018. It gradually evolved into a unified platform that covers many popular detection methods and contemporary modules. It not only includes training and inference codes, but also provides weights for more than 200 network models. We believe this toolbox is by far the most complete detection toolbox. In this paper, we introduce the various features...

10.48550/arxiv.1906.07155 preprint EN other-oa arXiv (Cornell University) 2019-01-01

Relation Networks for Object Detection

Han Hu, Jiayuan Gu, Zheng Zhang, Jifeng Dai, Yichen Wei

Although it has been well believed for years that modeling relations between objects would help object recognition, there has not been evidence that the idea is working in the deep learning era. All state-of-the-art object detection systems still rely on recognizing object instances individually, without exploiting their relations during learning. This work proposes an object relation module. It processes a set of objects simultaneously through interaction between their appearance feature and geometry, thus allowing the modeling of their relations. It is lightweight and in-place. It does not require...

10.1109/cvpr.2018.00378 article EN 2018-06-01

Instance-Aware Semantic Segmentation via Multi-task Network Cascades

Jifeng Dai, Kaiming He, Jian Sun

Semantic segmentation research has recently witnessed rapid progress, but many leading methods are unable to identify object instances. In this paper, we present Multitask Network Cascades for instance-aware semantic segmentation. Our model consists of three networks, respectively differentiating instances, estimating masks, and categorizing objects. These networks form a cascaded structure and are designed to share their convolutional features. We develop an algorithm for the nontrivial end-to-end...

10.1109/cvpr.2016.343 article EN 2016-06-01

Fully Convolutional Instance-Aware Semantic Segmentation

Yi Li, Haozhi Qi, Jifeng Dai, Xiangyang Ji, Yichen Wei

We present the first fully convolutional end-to-end solution for the instance-aware semantic segmentation task. It inherits all the merits of FCNs for semantic segmentation [29] and instance mask proposal [5]. It performs instance mask prediction and classification jointly. The underlying convolutional representation is fully shared between the two sub-tasks, as well as between all regions of interest. The network architecture is highly integrated and efficient. It achieves state-of-the-art performance in both accuracy and efficiency. It won the COCO 2016 segmentation competition by a large margin. Code will be released...

10.1109/cvpr.2017.472 preprint EN 2017-07-01

ScribbleSup: Scribble-Supervised Convolutional Networks for Semantic Segmentation

Di Lin, Jifeng Dai, Jiaya Jia, Kaiming He, Jian Sun

Large-scale data is of crucial importance for learning semantic segmentation models, but annotating per-pixel masks is a tedious and inefficient procedure. We note that for the topic of interactive image segmentation, scribbles are very widely used in academic research and commercial software, and are recognized as one of the most user-friendly ways of interacting. In this paper, we propose to use scribbles to annotate images, and develop an algorithm to train convolutional networks for semantic segmentation supervised by scribbles. Our algorithm is based on a graphical model that jointly...

10.1109/cvpr.2016.344 preprint EN 2016-06-01

BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation

Jifeng Dai, Kaiming He, Jian Sun

Recent leading approaches to semantic segmentation rely on deep convolutional networks trained with human-annotated, pixel-level segmentation masks. Such pixel-accurate supervision demands expensive labeling effort and limits the performance of deep networks that usually benefit from more training data. In this paper, we propose a method that achieves competitive accuracy but only requires easily obtained bounding box annotations. The basic idea is to iterate between automatically generating region proposals and training convolutional networks. These...

10.1109/iccv.2015.191 preprint EN 2015-12-01

Deformable DETR: Deformable Transformers for End-to-End Object Detection

Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, and 1 more

DETR has been recently proposed to eliminate the need for many hand-designed components in object detection while demonstrating good performance. However, it suffers from slow convergence and limited feature spatial resolution, due to the limitation of Transformer attention modules in processing image feature maps. To mitigate these issues, we propose Deformable DETR, whose attention modules only attend to a small set of key sampling points around a reference. Deformable DETR can achieve better performance than DETR (especially on small objects) with 10 times less...

10.48550/arxiv.2010.04159 preprint EN other-oa arXiv (Cornell University) 2020-01-01
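
The "small set of key sampling points around a reference" can be sketched for one query, one attention head and one feature level: the output is a softmax-weighted sum of values bilinearly sampled at a few offsets from the reference point. This is a simplified illustration under stated assumptions, not the paper's module: features are scalars, the value and output projections, multiple heads and multi-scale features are omitted, and the offsets and attention logits are passed in rather than predicted from the query feature. All names (`bilinear`, `softmax`, `deformable_attention`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Grid = List[List[float]]


def bilinear(img: Grid, y: float, x: float) -> float:
    """Bilinearly interpolate img at fractional (y, x); pixels outside the image count as 0."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    total = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                weight = max(0.0, 1 - abs(y - yy)) * max(0.0, 1 - abs(x - xx))
                total += weight * img[yy][xx]
    return total


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max before exponentiating)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]


def deformable_attention(
    value: Grid,
    ref: Tuple[float, float],
    offsets: Sequence[Tuple[float, float]],
    logits: Sequence[float],
) -> float:
    """Single-head, single-scale deformable attention for one query.

    Only the len(offsets) points sampled around ref are attended to, so the
    cost does not grow with the size of the feature map.
    """
    if len(offsets) != len(logits):
        raise ValueError("one attention logit per sampling point")
    weights = softmax(logits)  # attention weights sum to 1
    ry, rx = ref
    return sum(
        a * bilinear(value, ry + dy, rx + dx) for a, (dy, dx) in zip(weights, offsets)
    )


value = [[float(4 * r + c) for c in range(4)] for r in range(4)]
# Two sampling points with equal weight: average of value[1][1] and value[1][2].
print(deformable_attention(value, (1.0, 1.0), [(0.0, 0.0), (0.0, 1.0)], [0.0, 0.0]))  # 5.5
```

Compared with full attention, which scores every spatial position, each query here touches a fixed, small number of locations; that restriction is what the abstract credits for faster convergence and for making higher-resolution feature maps affordable.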

VL-BERT: Pre-training of Generic Visual-Linguistic Representations

Weijie Su, Xizhou Zhu, Yue Cao, Bin Li, Lewei Lu, and 2 more

We introduce a new pre-trainable generic representation for visual-linguistic tasks, called Visual-Linguistic BERT (VL-BERT for short). VL-BERT adopts the simple yet powerful Transformer model as the backbone, and extends it to take both visual and linguistic embedded features as input. In it, each element of the input is either a word from the input sentence, or a region-of-interest (RoI) from the input image. It is designed to fit most visual-linguistic downstream tasks. To better exploit the generic representation, we pre-train VL-BERT on the massive-scale Conceptual Captions...

10.48550/arxiv.1908.08530 preprint EN other-oa arXiv (Cornell University) 2019-01-01

Deep Feature Flow for Video Recognition

Xizhou Zhu, Yuwen Xiong, Jifeng Dai, Lu Yuan, Yichen Wei

Deep convolutional neural networks have achieved great success on image recognition tasks. Yet, it is non-trivial to transfer the state-of-the-art image recognition networks to videos, as per-frame evaluation is too slow and unaffordable. We present deep feature flow, a fast and accurate framework for video recognition. It runs the expensive convolutional sub-network only on sparse key frames and propagates their deep feature maps to other frames via a flow field. It achieves significant speedup, as flow computation is relatively fast. The end-to-end training of the whole architecture...

10.1109/cvpr.2017.441 article EN 2017-07-01

Flow-Guided Feature Aggregation for Video Object Detection

Xizhou Zhu, Yujie Wang, Jifeng Dai, Lu Yuan, Yichen Wei

Extending state-of-the-art object detectors from image to video is challenging. The accuracy of detection suffers from degenerated object appearances in videos, e.g., motion blur, video defocus, rare poses, etc. Existing work attempts to exploit temporal information on the box level, but such methods are not trained end-to-end. We present flow-guided feature aggregation, an accurate and end-to-end learning framework for video object detection. It leverages temporal coherence on the feature level instead. It improves the per-frame features by aggregation...

10.1109/iccv.2017.52 article EN 2017-10-01

Convolutional feature masking for joint object and stuff segmentation

Jifeng Dai, Kaiming He, Jian Sun

The topic of semantic segmentation has witnessed considerable progress due to the powerful features learned by convolutional neural networks (CNNs) [13]. The current leading approaches for semantic segmentation exploit shape information by extracting CNN features from masked image regions. This strategy introduces artificial boundaries on the images and may impact the quality of the extracted features. Besides, the operations on the raw image domain require running the network thousands of times on a single image, which is time-consuming. In this paper, we propose to exploit shape information via masking...

10.1109/cvpr.2015.7299025 preprint EN 2015-06-01

InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions

Wenhai Wang, Jifeng Dai, Zhe Chen, Zhenhang Huang, Zhiqi Li, and 7 more

Compared to the great progress of large-scale vision transformers (ViTs) in recent years, large-scale models based on convolutional neural networks (CNNs) are still in an early state. This work presents a new large-scale CNN-based foundation model, termed InternImage, which can obtain the gain from increasing parameters and training data like ViTs. Different from recent CNNs that focus on large dense kernels, InternImage takes deformable convolution as the core operator, so that our model not only has the large effective receptive field required for...

10.1109/cvpr52729.2023.01385 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

Exploring Cross-Image Pixel Contrast for Semantic Segmentation

Wenguan Wang, Tianfei Zhou, Fisher Yu, Jifeng Dai, Ender Konukoğlu, and 1 more

Current semantic segmentation methods focus only on mining "local" context, i.e., dependencies between pixels within individual images, by context-aggregation modules (e.g., dilated convolution, neural attention) or structure-aware optimization criteria (e.g., IoU-like loss). However, they ignore the "global" context of the training data, i.e., rich semantic relations between pixels across different images. Inspired by the recent advance in unsupervised contrastive representation learning, we propose a pixel-wise contrastive algorithm for semantic segmentation in the fully...

10.1109/iccv48922.2021.00721 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021-10-01

Deformable Convolutional Networks

Jifeng Dai, Haozhi Qi, Yuwen Xiong, Yi Li, Guodong Zhang, and 2 more

Convolutional neural networks (CNNs) are inherently limited in modeling geometric transformations due to the fixed structures in their building modules. In this work, we introduce two new modules to enhance the transformation modeling capacity of CNNs, namely, deformable convolution and deformable RoI pooling. Both are based on the idea of augmenting the spatial sampling locations with additional offsets and learning the offsets from the target tasks, without additional supervision. The new modules can readily replace their plain counterparts in existing CNNs and can be easily trained...

10.48550/arxiv.1703.06211 preprint EN other-oa arXiv (Cornell University) 2017-01-01

An Empirical Study of Spatial Attention Mechanisms in Deep Networks

Xizhou Zhu, Dazhi Cheng, Zheng Zhang, Stephen Lin, Jifeng Dai

Attention mechanisms have become a popular component in deep neural networks, yet there has been little examination of how different influencing factors and methods for computing attention from these factors affect performance. Toward a better general understanding of attention mechanisms, we present an empirical study that ablates various spatial attention elements within a generalized attention formulation, encompassing the dominant Transformer attention as well as the prevalent deformable convolution and dynamic convolution modules. Conducted on a variety...

10.1109/iccv.2019.00679 article EN 2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019-10-01

Towards High Performance Video Object Detection

Xizhou Zhu, Jifeng Dai, Lu Yuan, Yichen Wei

There has been significant progress in image object detection in recent years. Nevertheless, video object detection has received little attention, although it is more challenging and more important in practical scenarios. Built upon recent works [37, 36], this work proposes a unified approach based on the principle of multi-frame end-to-end learning of features and cross-frame motion. Our approach extends prior works with three new techniques and steadily pushes forward the performance envelope (speed-accuracy tradeoff), towards high-performance video object detection.

10.1109/cvpr.2018.00753 preprint EN 2018-06-01