Shilong Liu

ORCID: 0009-0003-5796-0627
Research Areas
  • Advanced Neural Network Applications
  • Advanced Image and Video Retrieval Techniques
  • Multimodal Machine Learning Applications
  • Domain Adaptation and Few-Shot Learning
  • Advanced Vision and Imaging
  • Natural Language Processing Techniques
  • Handwritten Text Recognition Techniques
  • Topic Modeling
  • Anomaly Detection Techniques and Applications
  • Visual Attention and Saliency Detection
  • Video Analysis and Summarization
  • Speech and dialogue systems
  • Rock Mechanics and Modeling
  • Adversarial Robustness in Machine Learning
  • Machine Learning and Data Classification
  • Generative Adversarial Networks and Image Synthesis
  • Human Pose and Action Recognition
  • Video Surveillance and Tracking Methods
  • Geomechanics and Mining Engineering
  • Face recognition and analysis
  • Multi-Agent Systems and Negotiation
  • Industrial Vision Systems and Defect Detection
  • Geoscience and Mining Technology
  • Network Packet Processing and Optimization
  • Robotics and Sensor-Based Localization

Affiliations

Soonchunhyang University
2024

Northwest Normal University
2024

Southwest University of Science and Technology
2024

Tsinghua University
2021-2023

Shanghai University
2023

Robert Bosch (United States)
2023

Beijing University of Posts and Telecommunications
2019

Tianjin University of Science and Technology
2017

Discovery Institute
2014

We present in this paper a novel denoising training method to speed up DETR (DEtection TRansformer) training and offer a deepened understanding of the slow convergence issue of DETR-like methods. We show that the slow convergence results from the instability of bipartite graph matching, which causes inconsistent optimization goals in early training stages. To address this issue, except for the Hungarian loss, our method additionally feeds ground-truth bounding boxes with noises into the Transformer decoder and trains the model to reconstruct the original boxes, which effectively reduces...

10.1109/cvpr52688.2022.01325 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01
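The denoising recipe described above (jitter ground-truth boxes, then train the decoder to restore them) can be sketched in a few lines. The noise scales `lambda1` and `lambda2` below are illustrative hyperparameter names, not the authors' exact API:

```python
import random

def add_box_noise(box, lambda1=0.4, lambda2=0.4, rng=random):
    """Jitter a ground-truth box (cx, cy, w, h), all in [0, 1].

    The center is shifted by up to lambda1 * (w, h) / 2 and the size is
    scaled by a factor of up to (1 +/- lambda2), loosely following the
    paper's center-shifting and box-scaling noise.
    """
    cx, cy, w, h = box
    cx += (rng.random() * 2 - 1) * lambda1 * w / 2
    cy += (rng.random() * 2 - 1) * lambda1 * h / 2
    w *= 1 + (rng.random() * 2 - 1) * lambda2
    h *= 1 + (rng.random() * 2 - 1) * lambda2
    return (cx, cy, w, h)
```

In training, the noised boxes form an extra decoder query group whose reconstruction loss bypasses bipartite matching, since each noised query has a known target.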

We present DINO (DETR with Improved deNoising anchOr boxes), a state-of-the-art end-to-end object detector. DINO improves over previous DETR-like models in performance and efficiency by using a contrastive way for denoising training, a mixed query selection method for anchor initialization, and a look forward twice scheme for box prediction. DINO achieves 49.4 AP in 12 epochs and 51.3 AP in 24 epochs on COCO with a ResNet-50 backbone and multi-scale features, yielding a significant improvement...

10.48550/arxiv.2203.03605 preprint EN other-oa arXiv (Cornell University) 2022-01-01

In this paper we present Mask DINO, a unified object detection and segmentation framework. Mask DINO extends DINO (DETR with Improved Denoising Anchor Boxes) by adding a mask prediction branch which supports all image segmentation tasks (instance, panoptic, and semantic). It makes use of the query embeddings from DINO to dot-product a high-resolution pixel embedding map to predict a set of binary masks. Some key components in DINO are extended for segmentation through a shared architecture and training process. Mask DINO is simple, efficient, and scalable, and it can benefit...

10.1109/cvpr52729.2023.00297 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01
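The mask branch described above, dot-producting query embeddings with a high-resolution pixel embedding map, can be sketched as follows (a simplified, projection-free version in plain Python; the real model applies learned projection layers first):

```python
import math

def predict_masks(query_embs, pixel_map):
    """query_embs: list of C-dim query vectors; pixel_map: H x W grid of
    C-dim pixel embeddings. Returns one mask of per-pixel probabilities
    per query, via a dot product followed by a sigmoid."""
    sigmoid = lambda x: 1 / (1 + math.exp(-x))
    masks = []
    for q in query_embs:
        mask = [[sigmoid(sum(qc * pc for qc, pc in zip(q, pix)))
                 for pix in row] for row in pixel_map]
        masks.append(mask)
    return masks
```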

We present in this paper a novel query formulation using dynamic anchor boxes for DETR (DEtection TRansformer) and offer a deeper understanding of the role of queries in DETR. This new formulation directly uses box coordinates as queries in Transformer decoders and dynamically updates them layer-by-layer. Using box coordinates not only helps using explicit positional priors to improve the query-to-feature similarity and eliminate the slow training convergence issue in DETR, but also allows us to modulate the positional attention map using the box width and height information. Such a design makes it...

10.48550/arxiv.2201.12329 preprint EN other-oa arXiv (Cornell University) 2022-01-01
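The layer-by-layer anchor update can be sketched as iterative refinement in inverse-sigmoid space. The per-layer deltas are stubbed as inputs here, whereas the real decoder predicts them from each layer's output:

```python
import math

def inv_sigmoid(x, eps=1e-6):
    x = min(max(x, eps), 1 - eps)
    return math.log(x / (1 - x))

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def refine_anchor(box, deltas_per_layer):
    """box: (cx, cy, w, h) in [0, 1]; deltas_per_layer: one 4-tuple of
    offsets per decoder layer. Each layer updates the anchor in
    inverse-sigmoid space, the usual iterative-refinement trick, which
    keeps the refined box inside [0, 1]."""
    for deltas in deltas_per_layer:
        box = tuple(sigmoid(inv_sigmoid(v) + d) for v, d in zip(box, deltas))
    return box
```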

In this paper, we present an open-set object detector, called Grounding DINO, by marrying the Transformer-based detector DINO with grounded pre-training, which can detect arbitrary objects with human inputs such as category names or referring expressions. The key solution of open-set object detection is introducing language to a closed-set detector for open-set concept generalization. To effectively fuse language and vision modalities, we conceptually divide a closed-set detector into three phases and propose a tight fusion solution, which includes a feature enhancer,...

10.48550/arxiv.2303.05499 preprint EN other-oa arXiv (Cornell University) 2023-01-01
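At inference, an open-set detector of this kind scores each candidate box against the text phrases and keeps confident matches. A hypothetical post-processing sketch (the function name and threshold value are illustrative, not the released API):

```python
def ground_phrases(box_scores, phrases, box_threshold=0.35):
    """box_scores: for each candidate box, a list of similarity scores
    to each phrase in `phrases` (as a cross-modality detector would
    produce). Keep boxes whose best phrase score clears the threshold
    and label each with its best-matching phrase."""
    kept = []
    for i, scores in enumerate(box_scores):
        best = max(range(len(phrases)), key=lambda j: scores[j])
        if scores[best] > box_threshold:
            kept.append((i, phrases[best], scores[best]))
    return kept
```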

Recent DEtection TRansformer-based (DETR) models have obtained remarkable performance. Their success cannot be achieved without the re-introduction of multi-scale feature fusion in the encoder. However, the excessively increased tokens in multi-scale features, of which about 75% are low-level features, are quite computationally inefficient, which hinders real applications of DETR models. In this paper, we present Lite DETR, a simple yet efficient end-to-end object detection framework that can effectively reduce the GFLOPs of the detection head by...

10.1109/cvpr52729.2023.01780 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01
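The roughly 75% figure for low-level tokens follows directly from the token counts of a standard multi-scale feature pyramid, which can be checked with a few lines of arithmetic (the strides below are typical backbone values, assumed rather than taken from the paper):

```python
def token_share(h, w, strides=(8, 16, 32, 64)):
    """Fraction of multi-scale encoder tokens contributed by the
    highest-resolution (lowest-level) feature map, for an input of
    size h x w and the given feature-map strides."""
    counts = [(h // s) * (w // s) for s in strides]
    return counts[0] / sum(counts)
```

For a 1024x1024 input with strides 8/16/32/64, the highest-resolution map alone contributes about 75.3% of all encoder tokens.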

We present OpenSeeD, a simple Open-vocabulary Segmentation and Detection framework that jointly learns from different segmentation and detection datasets. To bridge the gap of vocabulary and annotation granularity, we first introduce a pre-trained text encoder to encode all the visual concepts in the two tasks and learn a common semantic space for them. This gives us reasonably good results compared with the counterparts trained on a single task only. To further reconcile the two tasks, we identify two discrepancies: i) task discrepancy – segmentation requires...

10.1109/iccv51070.2023.00100 article EN 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 2023-10-01

We present a mask-piloted Transformer which improves masked-attention in Mask2Former for image segmentation. The improvement is based on our observation that Mask2Former suffers from inconsistent mask predictions between consecutive decoder layers, which leads to inconsistent optimization goals and low utilization of decoder queries. To address this problem, we propose a mask-piloted training approach, which additionally feeds noised ground-truth masks into masked-attention and trains the model to reconstruct the original ones. Compared with the predicted masks used in mask-attention, the ground-truth masks serve as...

10.1109/cvpr52729.2023.01733 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01
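The mask-piloted training signal (feed a noised ground-truth mask, supervise reconstruction of the clean one) can be sketched with a simple point-noise scheme; the flip probability here is an illustrative stand-in for the paper's actual noising choices:

```python
import random

def noise_mask(mask, flip_prob=0.2, rng=random):
    """Flip each pixel of a binary ground-truth mask with probability
    flip_prob. The training branch feeds the noised mask into
    masked-attention and supervises reconstruction of the original."""
    return [[1 - px if rng.random() < flip_prob else px for px in row]
            for row in mask]
```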

We introduce Grounded SAM, which uses Grounding DINO as an open-set object detector to combine with the segment anything model (SAM). This integration enables the detection and segmentation of any regions based on arbitrary text inputs and opens a door to connecting various vision models. As shown in Fig.1, a wide range of vision tasks can be achieved by using the versatile Grounded SAM pipeline. For example, an automatic annotation pipeline based solely on input images can be realized by incorporating models such as BLIP and Recognize Anything....

10.48550/arxiv.2401.14159 preprint EN other-oa arXiv (Cornell University) 2024-01-01
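The pipeline's plumbing is simple: the detector turns text into boxes, and each box prompts the segmenter. A sketch with the two models injected as callables (stubs standing in for Grounding DINO and SAM, not their real APIs):

```python
def grounded_segment(image, text, detect, segment):
    """Compose an open-set detector with a promptable segmenter:
    detect(image, text) returns (box, label) pairs, and
    segment(image, box) returns a mask for each box prompt."""
    results = []
    for box, label in detect(image, text):
        results.append({"box": box, "label": label,
                        "mask": segment(image, box)})
    return results
```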

This paper presents a simple and effective approach to solving the multi-label classification problem. The proposed approach leverages Transformer decoders to query the existence of a class label. The use of Transformer is rooted in the need of extracting local discriminative features adaptively for different labels, which is a strongly desired property due to the existence of multiple objects in one image. The built-in cross-attention module in the Transformer decoder offers an effective way to use label embeddings as queries to probe and pool class-related features from a feature map computed by a vision backbone...

10.48550/arxiv.2107.10834 preprint EN other-oa arXiv (Cornell University) 2021-01-01
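The query-the-label idea can be sketched as one round of softmax dot-product cross-attention per label embedding, followed by scoring the label against its pooled feature (a one-head, projection-free simplification of the decoder described above):

```python
import math

def query2label_scores(label_embs, feature_map):
    """Each label embedding attends over the backbone feature map
    (a flat list of C-dim vectors), pools a label-specific feature
    with softmax dot-product attention, and scores the label by the
    dot product of the embedding with its pooled feature."""
    scores = []
    for q in label_embs:
        logits = [sum(a * b for a, b in zip(q, f)) for f in feature_map]
        m = max(logits)
        weights = [math.exp(l - m) for l in logits]
        z = sum(weights)
        pooled = [sum((w / z) * f[i] for w, f in zip(weights, feature_map))
                  for i in range(len(q))]
        scores.append(sum(a * b for a, b in zip(q, pooled)))
    return scores
```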

In this paper, we introduce Semantic-SAM, a universal image segmentation model that can segment and recognize anything at any desired granularity. Our model offers two key advantages: semantic-awareness and granularity-abundance. To achieve semantic-awareness, we consolidate multiple datasets across three granularities and introduce decoupled classification for objects and parts. This allows our model to capture rich semantic information. For the multi-granularity capability, we propose a multi-choice learning scheme during training,...

10.48550/arxiv.2307.04767 preprint EN other-oa arXiv (Cornell University) 2023-01-01

We present in this paper a novel denoising training method to speed up DETR (DEtection TRansformer) training and offer a deepened understanding of the slow convergence issue of DETR-like methods. We show that the slow convergence results from the instability of bipartite graph matching, which causes inconsistent optimization goals in early training stages. To address this issue, except for the Hungarian loss, our method additionally feeds GT bounding boxes with noises into the Transformer decoder and trains the model to reconstruct the original boxes, which effectively reduces the difficulty...

10.1109/tpami.2023.3335410 article EN cc-by-nc-nd IEEE Transactions on Pattern Analysis and Machine Intelligence 2023-12-01

We present the Recognize Anything Model (RAM): a strong foundation model for image tagging. RAM makes a substantial step for large models in computer vision, demonstrating the zero-shot ability to recognize any common category with high accuracy. RAM introduces a new paradigm for image tagging, leveraging large-scale image-text pairs for training instead of manual annotations. The development of RAM comprises four key steps. Firstly, annotation-free image tags are obtained at scale through automatic text semantic parsing....

10.48550/arxiv.2306.03514 preprint EN other-oa arXiv (Cornell University) 2023-01-01

This paper presents a comprehensive survey of vision-language (VL) intelligence from the perspective of time. This survey is inspired by the remarkable progress in both computer vision and natural language processing, and recent trends shifting from single modality processing to multiple modality comprehension. We summarize the development of this field into three time periods, namely task-specific methods, vision-language pre-training (VLP) methods, and larger models empowered by large-scale weakly-labeled data. We first take some common VL tasks as examples to introduce...

10.48550/arxiv.2203.01922 preprint EN other-oa arXiv (Cornell University) 2022-01-01

This paper is concerned with the matching stability problem across different decoder layers in DEtection TRansformers (DETR). We point out that the unstable matching in DETR is caused by a multi-optimization path problem, which is highlighted by the one-to-one matching design in DETR. To address this problem, we show that the most important design is to use and only use positional metrics (like IOU) to supervise the classification scores of positive examples. Under this principle, we propose two simple yet effective modifications by integrating positional metrics into DETR's classification loss and matching cost, named...

10.1109/iccv51070.2023.00597 article EN 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 2023-10-01
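The stated principle, supervising positive classification scores with positional metrics only, reduces to replacing the constant target 1 with something like the box IoU. A sketch (the paper's actual loss weighting differs):

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def position_supervised_target(pred_box, gt_box):
    """Classification target for a positive example: a positional
    metric of the predicted box (IoU here) rather than a constant 1."""
    return iou(pred_box, gt_box)
```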

In this paper, we propose a new operator, called 3D DeFormable Attention (DFA3D), for 2D-to-3D feature lifting, which transforms multi-view 2D image features into a unified 3D space for 3D object detection. Existing feature lifting approaches, such as Lift-Splat-based and 2D attention-based methods, either use estimated depth to get pseudo LiDAR point features and then splat them to a 3D space, which is a one-pass operation without feature refinement, or ignore depth and lift features by 2D attention mechanisms, which achieve finer semantics while suffering from a depth ambiguity problem. In contrast,...

10.1109/iccv51070.2023.00615 article EN 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 2023-10-01
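The depth-weighted part of the lifting can be sketched at scalar level: each 2D location carries a distribution over discretized depth bins, and the lifted feature is the probability-weighted expectation over per-bin features (the deformable sampling itself is omitted):

```python
def depth_weighted_feature(depth_probs, features):
    """Expected feature over discretized depth bins: depth_probs is a
    probability per bin, features holds one feature vector per bin.
    Weighting by depth probability is what resolves the depth-ambiguity
    problem mentioned above."""
    dim = len(features[0])
    return [sum(p * f[i] for p, f in zip(depth_probs, features))
            for i in range(dim)]
```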

We study the problem of unsupervised discovery and segmentation of object parts, which, as an intermediate local representation, are capable of finding intrinsic object structure and providing more explainable recognition results. Recent unsupervised methods have greatly relaxed the dependency on annotated data, which are costly to obtain, but still rely on additional information such as an object segmentation mask or saliency map. To remove such a dependency and further improve the part segmentation performance, we develop a novel approach by disentangling the appearance and shape representations of object parts...

10.1109/cvpr46437.2021.00825 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021-06-01

The DEtection TRansformer (DETR) algorithm has received considerable attention in the research community and is gradually emerging as a mainstream approach for object detection and other perception tasks. However, the current field lacks a unified and comprehensive benchmark specifically tailored for DETR-based models. To address this issue, we develop a unified, highly modular, and lightweight codebase called detrex, which supports a majority of the mainstream DETR-based instance recognition algorithms, covering various fundamental tasks,...

10.48550/arxiv.2306.07265 preprint EN other-oa arXiv (Cornell University) 2023-01-01

This paper presents a novel end-to-end framework with Explicit box Detection for multi-person Pose estimation, called ED-Pose, where it unifies the contextual learning between human-level (global) and keypoint-level (local) information. Different from previous one-stage methods, ED-Pose re-considers this task as two explicit box detection processes with a unified representation and regression supervision. First, we introduce a human detection decoder from encoded tokens to extract global features. It can provide a good...

10.48550/arxiv.2302.01593 preprint EN other-oa arXiv (Cornell University) 2023-01-01

In this paper, we study the problem of visual grounding by considering both phrase extraction and grounding (PEG). In contrast to the previous phrase-known-at-test setting, PEG requires a model to extract phrases from the text and locate objects in the image simultaneously, which is a more practical setting in real applications. As phrase extraction can be regarded as a 1D text segmentation problem, we formulate PEG as a dual detection problem and propose a novel DQ-DETR model, which introduces dual queries to probe different features for object prediction and phrase mask prediction. Each pair of dual queries are...

10.1609/aaai.v37i2.25261 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2023-06-26

This paper introduces Grounding DINO 1.5, a suite of advanced open-set object detection models developed by IDEA Research, which aims to advance the "Edge" of open-set object detection. The suite encompasses two models: Grounding DINO 1.5 Pro, a high-performance model designed for stronger generalization capability across a wide range of scenarios, and Grounding DINO 1.5 Edge, an efficient model optimized for the faster speed demanded in many applications requiring edge deployment. The Pro model advances its predecessor by scaling up the model architecture, integrating an enhanced vision...

10.48550/arxiv.2405.10300 preprint EN arXiv (Cornell University) 2024-05-16