Jifeng Dai

ORCID: 0000-0002-6785-0785
Research Areas
  • Advanced Neural Network Applications
  • Multimodal Machine Learning Applications
  • Domain Adaptation and Few-Shot Learning
  • Advanced Image and Video Retrieval Techniques
  • Topic Modeling
  • Advanced Vision and Imaging
  • Natural Language Processing Techniques
  • Human Pose and Action Recognition
  • Robotics and Sensor-Based Localization
  • Image Retrieval and Classification Techniques
  • Advanced Image Processing Techniques
  • Visual Attention and Saliency Detection
  • Generative Adversarial Networks and Image Synthesis
  • Image Enhancement Techniques
  • Machine Learning and Data Classification
  • CCD and CMOS Imaging Sensors
  • COVID-19 diagnosis using AI
  • Anomaly Detection Techniques and Applications
  • Video Analysis and Summarization
  • Cardiovascular Health and Disease Prevention
  • Video Surveillance and Tracking Methods
  • Adversarial Robustness in Machine Learning
  • Reinforcement Learning in Robotics
  • Evaluation Methods in Various Fields
  • Biometric Identification and Security

Shanghai Artificial Intelligence Laboratory
2022-2024

Tsinghua University
2010-2024

Kunming University of Science and Technology
2022-2024

Beijing Academy of Artificial Intelligence
2022-2024

ShangHai JiAi Genetics & IVF Institute
2023

InternetLab
2023

Group Sense (China)
2020-2022

Shanghai Jiao Tong University
2022

SenseTime (China)
2020-2021

Chinese University of Hong Kong
2020

Convolutional neural networks (CNNs) are inherently limited to model geometric transformations due to the fixed geometric structures in their building modules. In this work, we introduce two new modules to enhance the transformation modeling capability of CNNs, namely, deformable convolution and deformable RoI pooling. Both are based on the idea of augmenting the spatial sampling locations in the modules with additional offsets learned from the target tasks, without additional supervision. The new modules can readily replace their plain counterparts in existing CNNs and can be easily trained...

10.1109/iccv.2017.89 article EN 2017-10-01
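The core idea of deformable convolution, sampling the input at the regular grid positions plus learned fractional offsets via bilinear interpolation, can be illustrated with a minimal NumPy sketch. This is a single-channel, 3x3, stride-1 toy, not the paper's implementation; the function names are illustrative:

```python
import numpy as np

def bilinear_sample(feat, y, x):
    """Bilinearly interpolate feat (H, W) at a fractional location (y, x)."""
    H, W = feat.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    val = 0.0
    for yy, xx in ((y0, x0), (y0, x0 + 1), (y0 + 1, x0), (y0 + 1, x0 + 1)):
        if 0 <= yy < H and 0 <= xx < W:
            val += (1 - abs(y - yy)) * (1 - abs(x - xx)) * feat[yy, xx]
    return val

def deformable_conv2d(feat, weight, offsets):
    """3x3 deformable convolution on a single-channel map (stride 1, zero pad).

    feat:    (H, W) input feature map
    weight:  (3, 3) kernel
    offsets: (H, W, 9, 2) learned (dy, dx) offsets, one per output location
             and kernel tap; all-zero offsets recover plain convolution
    """
    H, W = feat.shape
    grid = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            for k, (dy, dx) in enumerate(grid):
                oy, ox = offsets[i, j, k]
                out[i, j] += weight[dy + 1, dx + 1] * bilinear_sample(
                    feat, i + dy + oy, j + dx + ox)
    return out
```

In the paper the offsets are produced by an extra convolutional layer over the same feature map and learned end-to-end with the task; here they are simply passed in.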

We present region-based, fully convolutional networks for accurate and efficient object detection. In contrast to previous region-based detectors such as Fast/Faster R-CNN that apply a costly per-region subnetwork hundreds of times, our region-based detector is fully convolutional with almost all computation shared on the entire image. To achieve this goal, we propose position-sensitive score maps to address a dilemma between translation-invariance in image classification and translation-variance in object detection. Our method can thus naturally adopt...

10.48550/arxiv.1605.06409 preprint EN other-oa arXiv (Cornell University) 2016-01-01
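Position-sensitive score maps can be sketched as follows: each of the k x k spatial bins of a region pools only from its own dedicated score map, so per-region computation reduces to cheap pooling while the pooled result still depends on where the object sits inside the RoI. A hedged single-class NumPy illustration (names are illustrative, not the paper's code):

```python
import numpy as np

def ps_roi_pool(score_maps, roi, k=3):
    """Position-sensitive RoI pooling for one class (a sketch).

    score_maps: (k*k, H, W) score maps, one per spatial bin
                (top-left, top-center, ..., bottom-right)
    roi:        (y0, x0, y1, x1) region in feature-map coordinates
    Returns the (k, k) pooled grid; its mean gives the class score.
    """
    y0, x0, y1, x1 = roi
    ys = np.linspace(y0, y1, k + 1).round().astype(int)
    xs = np.linspace(x0, x1, k + 1).round().astype(int)
    pooled = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            # bin (i, j) reads ONLY its dedicated map i*k + j
            region = score_maps[i * k + j, ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
            pooled[i, j] = region.mean() if region.size else 0.0
    return pooled
```

Because every bin consults a different map, shifting the RoI changes which responses are read, restoring translation-variance on top of fully shared convolutional features.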

The superior performance of Deformable Convolutional Networks arises from their ability to adapt to the geometric variations of objects. Through an examination of their adaptive behavior, we observe that while the spatial support for their neural features conforms more closely than regular ConvNets to object structure, this support may nevertheless extend well beyond the region of interest, causing features to be influenced by irrelevant image content. To address this problem, we present a reformulation of Deformable ConvNets that improves their ability to focus on pertinent image regions, through...

10.1109/cvpr.2019.00953 preprint EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01

We present MMDetection, an object detection toolbox that contains a rich set of object detection and instance segmentation methods as well as related components and modules. The toolbox started from a codebase of the MMDet team, who won the detection track of the COCO Challenge 2018. It has gradually evolved into a unified platform that covers many popular detection methods and contemporary modules. It not only includes training and inference codes, but also provides weights for more than 200 network models. We believe this toolbox is by far the most complete detection toolbox. In this paper, we introduce the various features...

10.48550/arxiv.1906.07155 preprint EN other-oa arXiv (Cornell University) 2019-01-01

Although it has been well believed for years that modeling relations between objects would help object recognition, there has not been evidence that the idea works in the deep learning era. All state-of-the-art object detection systems still rely on recognizing object instances individually, without exploiting their relations during learning. This work proposes an object relation module. It processes a set of objects simultaneously through interaction between their appearance features and geometry, thus allowing modeling of their relations. It is lightweight and in-place. It does not require...

10.1109/cvpr.2018.00378 article EN 2018-06-01
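The relation module aggregates features across a set of objects with attention weights that combine appearance similarity and box geometry. A minimal single-head NumPy sketch under simplifying assumptions: no learned projections, and `geom_weight` stands in for the paper's embedded geometric term:

```python
import numpy as np

def object_relation(appearance, geom_weight):
    """One relation head over a set of objects (a simplified sketch).

    appearance:  (N, C) per-object appearance features
    geom_weight: (N, N) nonnegative weights derived from box geometry
                 (a stand-in for the paper's learned geometric embedding)
    Returns (N, C) relation features; the paper adds them to the inputs
    in-place, leaving the feature dimension unchanged.
    """
    d = appearance.shape[1]
    logits = appearance @ appearance.T / np.sqrt(d)   # appearance affinity
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    w = geom_weight * np.exp(logits)                  # geometry modulates attention
    w /= w.sum(axis=1, keepdims=True)
    return w @ appearance
```

Because every object attends to all others in one pass, the module's cost grows with N^2 but needs no per-pair subnetwork.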

Semantic segmentation research has recently witnessed rapid progress, but many leading methods are unable to identify object instances. In this paper, we present Multi-task Network Cascades for instance-aware semantic segmentation. Our model consists of three networks, respectively differentiating instances, estimating masks, and categorizing objects. These networks form a cascaded structure and are designed to share their convolutional features. We develop an algorithm for the nontrivial end-to-end...

10.1109/cvpr.2016.343 article EN 2016-06-01

We present the first fully convolutional end-to-end solution for the instance-aware semantic segmentation task. It inherits all the merits of FCNs for semantic segmentation [29] and instance mask proposal [5]. It performs instance mask prediction and classification jointly. The underlying convolutional representation is fully shared between the two sub-tasks, as well as between all regions of interest. The network architecture is highly integrated and efficient. It achieves state-of-the-art performance in both accuracy and efficiency. It wins the COCO 2016 segmentation competition by a large margin. Code would be released...

10.1109/cvpr.2017.472 preprint EN 2017-07-01

Large-scale data is of crucial importance for learning semantic segmentation models, but annotating per-pixel masks is a tedious and inefficient procedure. We note that in the topic of interactive image segmentation, scribbles are very widely used in academic research and commercial software, and are recognized as one of the most user-friendly ways of interacting. In this paper, we propose to use scribbles to annotate images, and develop an algorithm to train convolutional networks for semantic segmentation supervised by scribbles. Our algorithm is based on a graphical model that jointly...

10.1109/cvpr.2016.344 preprint EN 2016-06-01

Recent leading approaches to semantic segmentation rely on deep convolutional networks trained with human-annotated, pixel-level segmentation masks. Such pixel-accurate supervision demands expensive labeling effort and limits the performance of deep networks, which usually benefit from more training data. In this paper, we propose a method that achieves competitive accuracy but only requires easily obtained bounding box annotations. The basic idea is to iterate between automatically generating region proposals and training convolutional networks. These...

10.1109/iccv.2015.191 preprint EN 2015-12-01

DETR has been recently proposed to eliminate the need for many hand-designed components in object detection while demonstrating good performance. However, it suffers from slow convergence and limited feature spatial resolution, due to the limitation of Transformer attention modules in processing image feature maps. To mitigate these issues, we propose Deformable DETR, whose attention modules only attend to a small set of key sampling points around a reference. Deformable DETR can achieve better performance than DETR (especially on small objects) with 10 times less...

10.48550/arxiv.2010.04159 preprint EN other-oa arXiv (Cornell University) 2020-01-01
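The deformable attention described here replaces dense attention over all H x W positions with a handful of bilinearly sampled points around a reference location, weighted by learned logits. A simplified single-query, single-head NumPy sketch with no value or output projections (names are illustrative, not the paper's code):

```python
import numpy as np

def bilinear_sample_vec(value, y, x):
    """Bilinearly interpolate a (H, W, C) map at fractional (y, x)."""
    H, W, C = value.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    out = np.zeros(C)
    for yy, xx in ((y0, x0), (y0, x0 + 1), (y0 + 1, x0), (y0 + 1, x0 + 1)):
        if 0 <= yy < H and 0 <= xx < W:
            out += (1 - abs(y - yy)) * (1 - abs(x - xx)) * value[yy, xx]
    return out

def deformable_attention(value, ref_point, offsets, attn_logits):
    """Single-query, single-head deformable attention (a sketch).

    value:       (H, W, C) image feature map
    ref_point:   (y, x) reference location of the query
    offsets:     (K, 2) learned sampling offsets around the reference
    attn_logits: (K,) learned logits; the softmax runs over only K points,
                 so no dense H*W attention map is ever computed
    """
    w = np.exp(attn_logits - attn_logits.max())
    w /= w.sum()
    out = np.zeros(value.shape[-1])
    for k, (dy, dx) in enumerate(offsets):
        out += w[k] * bilinear_sample_vec(value, ref_point[0] + dy,
                                          ref_point[1] + dx)
    return out
```

In the full model the offsets and logits are predicted from the query itself, and the sampling is repeated across heads and feature-pyramid levels.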

We introduce a new pre-trainable generic representation for visual-linguistic tasks, called Visual-Linguistic BERT (VL-BERT for short). VL-BERT adopts the simple yet powerful Transformer model as the backbone, and extends it to take both visual and linguistic embedded features as input. In it, each element of the input is either a word from the input sentence, or a region-of-interest (RoI) from the input image. It is designed to fit most of the visual-linguistic downstream tasks. To better exploit the generic representation, we pre-train VL-BERT on the massive-scale Conceptual Captions...

10.48550/arxiv.1908.08530 preprint EN other-oa arXiv (Cornell University) 2019-01-01

Deep convolutional neural networks have achieved great success on image recognition tasks. Yet, it is non-trivial to transfer the state-of-the-art image recognition networks to videos, as per-frame evaluation is too slow and unaffordable. We present deep feature flow, a fast and accurate framework for video recognition. It runs the expensive convolutional sub-network only on sparse key frames and propagates their deep feature maps to other frames via a flow field. It achieves significant speedup as flow computation is relatively fast. The end-to-end training of the whole architecture...

10.1109/cvpr.2017.441 article EN 2017-07-01
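Feature propagation in this framework amounts to backward-warping the key frame's feature maps with a flow field and bilinear interpolation, which is far cheaper than re-running the backbone on every frame. A minimal NumPy sketch with explicit per-pixel loops for clarity (names are illustrative):

```python
import numpy as np

def bilinear_sample_vec(feat, y, x):
    """Bilinearly interpolate a (H, W, C) map at fractional (y, x)."""
    H, W, C = feat.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    out = np.zeros(C)
    for yy, xx in ((y0, x0), (y0, x0 + 1), (y0 + 1, x0), (y0 + 1, x0 + 1)):
        if 0 <= yy < H and 0 <= xx < W:
            out += (1 - abs(y - yy)) * (1 - abs(x - xx)) * feat[yy, xx]
    return out

def propagate_features(key_feat, flow):
    """Backward-warp key-frame features to the current frame (a sketch).

    key_feat: (H, W, C) features from the expensive backbone, computed
              only on the sparse key frame
    flow:     (H, W, 2) per-pixel (dy, dx) displacements pointing from
              each current-frame location back into the key frame
    """
    H, W, _ = key_feat.shape
    out = np.zeros_like(key_feat)
    for i in range(H):
        for j in range(W):
            dy, dx = flow[i, j]
            out[i, j] = bilinear_sample_vec(key_feat, i + dy, j + dx)
    return out
```

The paper additionally scales the warped features with a predicted per-location weight; this sketch keeps only the warping step.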

Extending state-of-the-art object detectors from image to video is challenging. The accuracy of detection suffers from degenerated object appearances in videos, e.g., motion blur, video defocus, rare poses, etc. Existing work attempts to exploit temporal information on the box level, but such methods are not trained end-to-end. We present flow-guided feature aggregation, an accurate and end-to-end learning framework for video object detection. It leverages temporal coherence on the feature level instead. It improves the per-frame features by aggregation...

10.1109/iccv.2017.52 article EN 2017-10-01

The topic of semantic segmentation has witnessed considerable progress due to the powerful features learned by convolutional neural networks (CNNs) [13]. The current leading approaches exploit shape information by extracting CNN features from masked image regions. This strategy introduces artificial boundaries on the images and may impact the quality of the extracted features. Besides, the operations in the raw image domain require computing thousands of networks on a single image, which is time-consuming. In this paper, we propose to exploit shape information via masking...

10.1109/cvpr.2015.7299025 preprint EN 2015-06-01

Compared to the great progress of large-scale vision transformers (ViTs) in recent years, large-scale models based on convolutional neural networks (CNNs) are still in an early state. This work presents a new large-scale CNN-based foundation model, termed InternImage, which can obtain gains from increasing parameters and training data like ViTs. Different from the recent CNNs that focus on large dense kernels, InternImage takes deformable convolution as the core operator, so that our model not only has the large effective receptive field required for...

10.1109/cvpr52729.2023.01385 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

Current semantic segmentation methods focus only on mining "local" context, i.e., dependencies between pixels within individual images, by context-aggregation modules (e.g., dilated convolution, neural attention) or structure-aware optimization criteria (e.g., IoU-like loss). However, they ignore the "global" context of the training data, i.e., rich semantic relations between pixels across different images. Inspired by recent advances in unsupervised contrastive representation learning, we propose a pixel-wise contrastive algorithm for fully...

10.1109/iccv48922.2021.00721 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021-10-01
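The pixel-wise contrastive idea can be sketched as an InfoNCE-style loss: for an anchor pixel embedding, positives are same-class pixels, possibly drawn from other images (the "global" context), and negatives are different-class pixels. A hedged NumPy sketch, not the paper's exact formulation:

```python
import numpy as np

def pixel_contrastive_loss(anchor, positives, negatives, tau=0.1):
    """InfoNCE-style loss for one anchor pixel embedding (a sketch).

    anchor:    (C,) L2-normalized embedding of the anchor pixel
    positives: (P, C) same-class pixel embeddings, drawn within AND
               across images to supply cross-image relations
    negatives: (N, C) different-class pixel embeddings
    tau:       temperature controlling the sharpness of the softmax
    """
    pos = np.exp(anchor @ positives.T / tau)          # (P,) similarities
    neg = np.exp(anchor @ negatives.T / tau).sum()    # scalar
    # average the per-positive InfoNCE terms
    return float(-np.log(pos / (pos + neg)).mean())
```

Minimizing this pulls same-class pixel embeddings together and pushes different-class embeddings apart, complementing the usual per-pixel cross-entropy.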

Convolutional neural networks (CNNs) are inherently limited to model geometric transformations due to the fixed geometric structures in their building modules. In this work, we introduce two new modules to enhance the transformation modeling capacity of CNNs, namely, deformable convolution and deformable RoI pooling. Both are based on the idea of augmenting the spatial sampling locations with additional offsets learned from the target tasks, without additional supervision. The new modules can readily replace their plain counterparts in existing CNNs and can be easily trained...

10.48550/arxiv.1703.06211 preprint EN other-oa arXiv (Cornell University) 2017-01-01

Attention mechanisms have become a popular component in deep neural networks, yet there has been little examination of how different influencing factors and methods for computing attention from these factors affect performance. Toward a better general understanding of attention mechanisms, we present an empirical study that ablates various spatial attention elements within a generalized attention formulation, encompassing the dominant Transformer attention as well as the prevalent deformable convolution and dynamic convolution modules. Conducted on a variety...

10.1109/iccv.2019.00679 article EN 2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019-10-01

There has been significant progress in image object detection in recent years. Nevertheless, video object detection has received little attention, although it is more challenging and more important in practical scenarios. Built upon the recent works [37, 36], this work proposes a unified approach based on the principle of multi-frame end-to-end learning of features and cross-frame motion. Our unified approach extends the prior works with three new techniques and steadily pushes forward the performance envelope (speed-accuracy tradeoff), towards high-performance video object detection.

10.1109/cvpr.2018.00753 preprint EN 2018-06-01