NFDI4DS | UHH-SEMS - Publication Details

Transformer-Based Visual Segmentation: A Survey

OPENALEX - Publications

Xiangtai Li Henghui Ding Haobo Yuan Wenwei Zhang Jiangmiao Pang and 4 more

Visual segmentation seeks to partition images, video frames, or point clouds into multiple segments or groups. This technique has numerous real-world applications, such as autonomous driving, image editing, robot sensing, and medical analysis. Over the past decade, deep learning-based methods have made remarkable strides in this area. Recently, transformers, a type of neural network based on self-attention originally designed for natural language processing, have considerably surpassed previous...

10.1109/tpami.2024.3434373 article EN cc-by-nc-nd IEEE Transactions on Pattern Analysis and Machine Intelligence 2024-07-29

Towards Open Vocabulary Learning: A Survey

OPENALEX - Publications

Jianzong Wu Xiangtai Li Shilin Xu Haobo Yuan Henghui Ding and 7 more

In the field of visual scene understanding, deep neural networks have made impressive advancements in various core tasks like segmentation, tracking, and detection. However, most approaches operate on the close-set assumption, meaning that the model can only identify pre-defined categories that are present in the training set. Recently, open vocabulary settings were proposed due to the rapid progress of vision language pre-training. These new approaches seek to locate and recognize categories beyond the annotated label space. The open vocabulary approach is more...

10.1109/tpami.2024.3361862 article EN cc-by-nc-nd IEEE Transactions on Pattern Analysis and Machine Intelligence 2024-02-05

Multi-Task Learning With Multi-Query Transformer for Dense Prediction

OPENALEX - Publications

Yangyang Xu Xiangtai Li Haobo Yuan Yibo Yang Lefei Zhang

Previous multi-task dense prediction studies developed complex pipelines such as multi-modal distillations in multiple stages or searching for task relational contexts for each task. The core insight beyond these methods is to maximize the mutual effects of each task. Inspired by the recent query-based Transformers, we propose a simple pipeline named Multi-Query Transformer (MQTransformer) that is equipped with multiple queries from different tasks to facilitate the reasoning among multiple tasks and simplify the cross-task interaction pipeline....

10.1109/tcsvt.2023.3292995 article EN IEEE Transactions on Circuits and Systems for Video Technology 2023-07-07

Panoptic-PartFormer++: A Unified and Decoupled View for Panoptic Part Segmentation

OPENALEX - Publications

Xiangtai Li Shilin Xu Yibo Yang Haobo Yuan Guangliang Cheng and 4 more

Panoptic Part Segmentation (PPS) unifies panoptic and part segmentation into one task. Previous works utilize separate approaches to handle things, stuff, and part predictions without shared computation and task association. We aim to unify these tasks at the architectural level, designing the first end-to-end unified framework, Panoptic-PartFormer. Moreover, we find that the previous metric PartPQ is biased toward PQ. To handle both issues, we design a meta-architecture that decouples part features and things/stuff features, respectively. We model...

10.1109/tpami.2024.3453916 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2024-01-01

Spatio-Temporal Evolution of Net Ecosystem Productivity and Its Influencing Factors in Northwest China, 1982–2022

OPENALEX - Publications

Weijie Zhang Zhichao Xu Haobo Yuan Yingying Wang Kai Feng and 3 more

The carbon cycle in terrestrial ecosystems is a crucial component of the global carbon cycle, and drought is increasingly recognized as a significant stressor impacting their carbon sink function. Net ecosystem productivity (NEP), which is a key indicator of carbon sink capacity, is closely related to vegetation Net Primary Productivity (NPP), derived using the Carnegie-Ames-Stanford Approach (CASA) model. However, there is limited research on desert grassland ecosystems, which offer unique insights due to their long-term data series. The relationship between...

10.3390/agriculture15060613 article EN cc-by Agriculture 2025-03-13

Tube-Link: A Flexible Cross Tube Framework for Universal Video Segmentation

OPENALEX - Publications

Xiangtai Li Haobo Yuan Wenwei Zhang Guangliang Cheng Jiangmiao Pang and 1 more

Video segmentation aims to segment and track every pixel in diverse scenarios accurately. In this paper, we present Tube-Link, a versatile framework that addresses multiple core tasks of video segmentation with a unified architecture. Our framework is a near-online approach that takes a short subclip as input and outputs the corresponding spatial-temporal tube masks. To enhance the modeling of cross-tube relationships, we propose an effective way to perform tube-level linking via attention along the queries. In addition, we introduce temporal...

10.1109/iccv51070.2023.01280 article EN 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 2023-10-01

Neural Collapse Inspired Feature-Classifier Alignment for Few-Shot Class Incremental Learning

OPENALEX - Publications

Yibo Yang Haobo Yuan Xiangtai Li Zhouchen Lin Philip Torr and 1 more

Few-shot class-incremental learning (FSCIL) has been a challenging problem as only a few training samples are accessible for each novel class in the new sessions. Finetuning the backbone or adjusting the classifier prototypes trained in the prior sessions would inevitably cause a misalignment between the feature and classifier of old classes, which explains the well-known catastrophic forgetting problem. In this paper, we deal with this misalignment dilemma in FSCIL inspired by the recently discovered phenomenon named neural collapse, which reveals that...

10.48550/arxiv.2302.03004 preprint EN other-oa arXiv (Cornell University) 2023-01-01

RAP-SAM: Towards Real-Time All-Purpose Segment Anything

OPENALEX - Publications

Shilin Xu Haobo Yuan Qingyu Shi Qi Lu Jingbo Wang and 7 more

Advanced by transformer architecture, vision foundation models (VFMs) achieve remarkable progress in performance and generalization ability. Segment Anything Model (SAM) is one model that can achieve generalized segmentation. However, most VFMs cannot run in real time, which makes it difficult to transfer them into several products. On the other hand, current real-time segmentation mainly has one purpose, such as semantic segmentation on the driving scene. We argue that diverse outputs are needed for real applications. Thus,...

10.48550/arxiv.2401.10228 preprint EN other-oa arXiv (Cornell University) 2024-01-01

Point Could Mamba: Point Cloud Learning via State Space Model

OPENALEX - Publications

Tao Zhang Xiangtai Li Haobo Yuan Shunping Ji Shuicheng Yan

In this work, for the first time, we demonstrate that Mamba-based point cloud methods can outperform point-based methods. Mamba exhibits strong global modeling capabilities and linear computational complexity, making it highly attractive for point cloud analysis. To enable more effective processing of 3-D point cloud data by Mamba, we propose a novel Consistent Traverse Serialization to convert point clouds into 1-D point sequences while ensuring that neighboring points in the sequence are also spatially adjacent. Consistent Traverse Serialization yields six variants...

10.48550/arxiv.2403.00762 preprint EN arXiv (Cornell University) 2024-03-01

Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos

OPENALEX - Publications

Haobo Yuan Xiangtai Li Tao Zhang Zilong Huang Shilin Xu and 5 more

This work presents Sa2VA, the first unified model for dense grounded understanding of both images and videos. Unlike existing multi-modal large language models, which are often limited to specific modalities and tasks, Sa2VA supports a wide range of image and video tasks, including referring segmentation and conversation, with minimal one-shot instruction tuning. Sa2VA combines SAM-2, a foundation video segmentation model, with LLaVA, an advanced vision-language model, and unifies text, image, and video into a shared LLM token space. Using the LLM, Sa2VA generates instruction tokens that...

10.48550/arxiv.2501.04001 preprint EN arXiv (Cornell University) 2025-01-07

BOSSA: A Decentralized System for Proofs of Data Retrievability and Replication

OPENALEX - Publications

Dian Chen Haobo Yuan Shengshan Hu Qian Wang Cong Wang

Proofs of retrievability and proofs of replication are two cryptographic tools that enable a remote server to prove that the users' data has been correctly stored. Nevertheless, the literature either requires the users themselves to perform expensive verification jobs, or relies on a “fully trustworthy” third party auditor (TPA) to execute the public verification. In addition, none of the existing solutions consider the underlying incentive issues behind a rational server who is motivated to collect but tries to evade checking in order to save...

10.1109/tpds.2020.3030063 article EN IEEE Transactions on Parallel and Distributed Systems 2020-10-12

Transformer-Based Visual Segmentation: A Survey

OPENALEX - Publications

Xiangtai Li Henghui Ding Wenwei Zhang Haobo Yuan Jiangmiao Pang and 4 more

Visual segmentation seeks to partition images, video frames, or point clouds into multiple segments or groups. This technique has numerous real-world applications, such as autonomous driving, image editing, robot sensing, and medical analysis. Over the past decade, deep learning-based methods have made remarkable strides in this area. Recently, transformers, a type of neural network based on self-attention originally designed for natural language processing, have considerably surpassed previous...

10.48550/arxiv.2304.09854 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively

OPENALEX - Publications

Haobo Yuan Xiangtai Li Chong Zhou Yining Li Kai Chen and 1 more

The CLIP and Segment Anything Model (SAM) are remarkable vision foundation models (VFMs). SAM excels in segmentation tasks across diverse domains, while CLIP is renowned for its zero-shot recognition capabilities. This paper presents an in-depth exploration of integrating these two models into a unified framework. Specifically, we introduce the Open-Vocabulary SAM, a SAM-inspired model designed for simultaneous interactive segmentation and recognition, leveraging two unique knowledge transfer modules: SAM2CLIP and CLIP2SAM. The former...

10.48550/arxiv.2401.02955 preprint EN other-oa arXiv (Cornell University) 2024-01-01

OMG-Seg: Is One Model Good Enough For All Segmentation?

OPENALEX - Publications

Xiangtai Li Haobo Yuan Wei Li Henghui Ding Size Wu and 4 more

In this work, we address various segmentation tasks, each traditionally tackled by distinct or partially unified models. We propose OMG-Seg, One Model that is Good enough to efficiently and effectively handle all the segmentation tasks, including image semantic, instance, and panoptic segmentation, as well as their video counterparts, open vocabulary settings, prompt-driven, interactive segmentation like SAM, and video object segmentation. To our knowledge, this is the first model to handle all these tasks in one model and achieve satisfactory performance. We show that OMG-Seg, a...

10.48550/arxiv.2401.10229 preprint EN other-oa arXiv (Cornell University) 2024-01-01

Mamba or RWKV: Exploring High-Quality and High-Efficiency Segment Anything Model

OPENALEX - Publications

Haobo Yuan Xiangtai Li Qi Lu Tao Zhang Ming-Hsuan Yang and 2 more

Transformer-based segmentation methods face the challenge of efficient inference when dealing with high-resolution images. Recently, several linear attention architectures, such as Mamba and RWKV, have attracted much attention as they can process long sequences efficiently. In this work, we focus on designing an efficient segment-anything model by exploring these different architectures. Specifically, we design a mixed backbone that contains convolution and RWKV operation, which achieves the best for both accuracy...

10.48550/arxiv.2406.19369 preprint EN arXiv (Cornell University) 2024-06-27

Fast Algorithm to Extract the Singularity of Higher Order Moment Method

OPENALEX - Publications

Haobo Yuan N. Wang Liang Chen

The higher order moment method has far fewer unknowns compared to the low order moment methods. However, the computation of the self-term matrix elements is extremely time-consuming. This paper presents an algorithm to extract the singularity by dividing the integrand into two parts on account of Taylor's formula. The first part, with a removable discontinuity, is easy to integrate. The rest consists of three principal singular functions. Their singularities are canceled by the Jacobian of a simple transformation. The extraction leads to rapid non-redundant...

10.1163/156939308784158904 article EN Journal of Electromagnetic Waves and Applications 2008-01-01

Machine learning analysis and inference of student performance and visualization of data results based on a small dataset of student information

OPENALEX - Publications

Haoyang Li Wenxuan Li Zihao Zhang Haobo Yuan Yunxiang Wan

Education is something that every country values, and education data is a very important resource for the country. With the increase in the proportion of education in the country, the size of the student body is getting bigger and bigger. Student performance is directly related to the core of the entire education. By analyzing students' information and predicting future performance, this prediction does not only mean the improvement of grades, but can also summarize methods to effectively help students avoid the above situation. In this study, some various algorithms of machine...

10.1109/mlbdbi54094.2021.00031 article EN 2021 3rd International Conference on Machine Learning, Big Data and Business Intelligence (MLBDBI) 2021-12-01

Multi-Task Learning with Multi-Query Transformer for Dense Prediction

OPENALEX - Publications

Yangyang Xu Xiangtai Li Haobo Yuan Yibo Yang Jing Zhang and 3 more

Previous multi-task dense prediction studies developed complex pipelines such as multi-modal distillations in multiple stages or searching for task relational contexts for each task. The core insight beyond these methods is to maximize the mutual effects of each task. Inspired by the recent query-based Transformers, we propose a simple pipeline named Multi-Query Transformer (MQTransformer) that is equipped with multiple queries from different tasks to facilitate the reasoning among multiple tasks and simplify the cross-task interaction pipeline....

10.48550/arxiv.2205.14354 preprint EN other-oa arXiv (Cornell University) 2022-01-01

OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding

OPENALEX - Publications

Tao Zhang Xiangtai Li Fei Hao Haobo Yuan Shengqiong Wu and 3 more

Current universal segmentation methods demonstrate strong capabilities in pixel-level image and video understanding. However, they lack reasoning abilities and cannot be controlled via text instructions. In contrast, large vision-language multimodal models exhibit powerful vision-based conversation and reasoning capabilities but lack pixel-level understanding and have difficulty accepting visual prompts for flexible user interaction. This paper proposes OMG-LLaVA, a new and elegant framework combining powerful pixel-level vision understanding with reasoning abilities. It can accept various...

10.48550/arxiv.2406.19389 preprint EN arXiv (Cornell University) 2024-06-27

Propagation Dynamics from Meteorological to Agricultural Drought in Northwestern China: Key Influencing Factors

OPENALEX - Publications

Kai Feng Haobo Yuan Yingying Wang Yanbin Li Xiaowan Wang and 3 more

Meteorological and agricultural droughts are inherently correlated, whereas the propagation mechanism between them remains unclear in Northwestern China. Investigating the linkages between these drought types and identifying the potential influencing factors is crucial for effective water resource management and drought mitigation. This study adopted the Standardized Precipitation Evapotranspiration Index (SPEI) and the Standardized Soil Moisture Index (SSMI) to characterize meteorological and agricultural droughts from 1960 to 2018. The propagation time was detected using the Pearson correlation...

10.3390/agronomy14091987 article EN cc-by Agronomy 2024-09-02

Monocular Road Planar Parallax Estimation

OPENALEX - Publications

Haobo Yuan Teng Chen Wei Sui Jiafeng Xie Lefei Zhang and 2 more

Estimating the 3D structure of the drivable surface and surrounding environment is a crucial task for assisted and autonomous driving. It is commonly solved either by using 3D sensors such as LiDAR or directly predicting the depth of points via deep learning. However, the former is expensive, and the latter lacks the use of geometry information for the scene. In this paper, instead of following existing methodologies, we propose Road Planar Parallax Attention Network (RPANet), a new deep neural network for 3D sensing from monocular image sequences based on...

10.1109/tip.2023.3289323 article EN IEEE Transactions on Image Processing 2023-01-01

PanopticPartFormer++: A Unified and Decoupled View for Panoptic Part Segmentation

OPENALEX - Publications

Xiangtai Li Shilin Xu Yibo Yang Haobo Yuan Guangliang Cheng and 3 more

Panoptic Part Segmentation (PPS) unifies panoptic and part segmentation into one task. Previous works utilize separate approaches to handle things, stuff, and part predictions without shared computation and task association. We aim to unify these tasks at the architectural level, designing the first end-to-end unified framework, Panoptic-PartFormer. Moreover, we find that the previous metric PartPQ is biased toward PQ. To handle both issues, we design a meta-architecture that decouples part features and things/stuff features, respectively. We model...

10.48550/arxiv.2301.00954 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Tube-Link: A Flexible Cross Tube Framework for Universal Video Segmentation

OPENALEX - Publications

Xiangtai Li Haobo Yuan Wenwei Zhang Guangliang Cheng Jiangmiao Pang and 1 more

Video segmentation aims to segment and track every pixel in diverse scenarios accurately. In this paper, we present Tube-Link, a versatile framework that addresses multiple core tasks of video segmentation with a unified architecture. Our framework is a near-online approach that takes a short subclip as input and outputs the corresponding spatial-temporal tube masks. To enhance the modeling of cross-tube relationships, we propose an effective way to perform tube-level linking via attention along the queries. In addition, we introduce temporal...

10.48550/arxiv.2303.12782 preprint EN other-oa arXiv (Cornell University) 2023-01-01