NFDI4DS | UHH-SEMS - Publication Details

Towards Unified INT8 Training for Convolutional Neural Network

OPENALEX - Publications

Feng Zhu Ruihao Gong Fengwei Yu Xianglong Liu Yanfei Wang and 3 more

Recently, low-bit (e.g., 8-bit) network quantization has been extensively studied to accelerate inference. Besides inference, low-bit training with quantized gradients can bring further considerable acceleration, since the backward process is often computation-intensive. Unfortunately, inappropriate quantization of backward propagation usually makes training unstable and can even cause it to crash. A successful unified low-bit training framework that supports diverse networks on various tasks is still lacking. In this paper, we attempt to build a unified 8-bit (INT8)...

10.1109/cvpr42600.2020.00204 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01

Revisiting the Transferability of Supervised Pretraining: an MLP Perspective

Yizhou Wang Shixiang Tang Feng Zhu Lei Bai Rui Zhao and 2 more

The pretrain-finetune paradigm is a classical pipeline in visual learning. Recent progress on unsupervised pretraining methods shows superior transfer performance to their supervised counterparts. This paper revisits this phenomenon and sheds new light on understanding the transferability gap between unsupervised and supervised pretraining from a multilayer perceptron (MLP) perspective. While previous works [6], [8], [17] focus on the effectiveness of MLP on unsupervised image classification, where pretraining and evaluation are conducted on the same dataset, we reveal that...

10.1109/cvpr52688.2022.00897 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

RelationLMM: Large Multimodal Model as Open and Versatile Visual Relationship Generalist

Chi Xie Shuang Liang Jianxin Li Zhao Zhang Feng Zhu and 2 more

Visual relationships are crucial for visual perception and reasoning, and cover tasks like Scene Graph Generation, Human-Object Interaction, and object affordance. Despite significant efforts, this field still suffers from the following limitations: specialists for a specific task without considering similar ones, strict and complex formulations with limited flexibility, and underexploited reasoning with language and knowledge. To solve these limitations, we seek to build a new framework, one model for all tasks, over Large...

10.1109/tpami.2025.3531452 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2025-01-01

Practical Continual Forgetting for Pre-trained Vision Models

Hongbo Zhao Fei Zhu Bolin Ni Feng Zhu Gaofeng Meng and 1 more

For privacy and security concerns, the need to erase unwanted information from pre-trained vision models is becoming evident nowadays. In real-world scenarios, erasure requests originate at any time from both users and model owners, and these requests usually form a sequence. Therefore, under such a setting, selective information is expected to be continuously removed while maintaining the rest. We define this problem as continual forgetting and identify three key challenges. (i) For unwanted knowledge, efficient and effective deleting is crucial. (ii)...

10.48550/arxiv.2501.09705 preprint EN arXiv (Cornell University) 2025-01-16

Instruct-ReID++: Towards Universal Purpose Instruction-Guided Person Re-identification

Weizhen He Yiheng Deng Yunfeng Yan Feng Zhu Yizhou Wang and 6 more

Recently, person re-identification (ReID) has witnessed fast development due to its broad practical applications, and various settings have been proposed, e.g., traditional ReID, clothes-changing ReID, and visible-infrared ReID. However, current studies primarily focus on single specific tasks, which limits model applicability in real-world scenarios. This paper aims to address this issue by introducing a novel instruct-ReID task that unifies 6 existing ReID tasks in one model and retrieves images based on provided visual or...

10.1109/tpami.2025.3538766 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2025-01-01

DASOT: A Unified Framework Integrating Data Association and Single Object Tracking for Online Multi-Object Tracking

Qi Chu Wanli Ouyang Bin Liu Feng Zhu Nenghai Yu

In this paper, we propose an online multi-object tracking (MOT) approach that integrates data association and single object tracking (SOT) with a unified convolutional network (ConvNet), named DASOTNet. The intuition behind integrating data association and SOT is that they can complement each other. Following the Siamese architecture, DASOTNet consists of the shared feature ConvNet, the data association branch, and the SOT branch. Data association is treated as a special re-identification task and solved by learning discriminative features for different targets in the data association branch. To handle the problem...

10.1609/aaai.v34i07.6694 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2020-04-03

COCAS+: Large-Scale Clothes-Changing Person Re-Identification With Clothes Templates

Shihua Li Haobin Chen Shijie Yu Zhiqun He Feng Zhu and 3 more

In recent years, person re-identification (ReID) has developed rapidly due to its broad practical applications. Most existing benchmarks assume that the same person wears the same clothes across captured images, while, in real-world scenarios, a person may change his/her clothes frequently. Thus the Clothes-Changing ReID (CC-ReID) problem is introduced and several related benchmarks are established. CC-ReID is a very difficult task, as the main visual characteristics of a human body, the clothes, are different between query and gallery, and clothes-irrelevant...

10.1109/tcsvt.2022.3216769 article EN IEEE Transactions on Circuits and Systems for Video Technology 2022-11-03

Improving Facial Attribute Recognition by Group and Graph Learning

Zhenghao Chen Shuhang Gu Feng Zhu Jing Xu Rui Zhao

Exploiting the relationships between attributes is a key challenge for improving multiple facial attribute recognition. In this work, we are concerned with two types of correlations: spatial and non-spatial relationships. For the spatial correlation, we aggregate attributes with spatial similarity into part-based groups and then introduce Group Attention Learning to generate the group attention feature. On the other hand, to discover the non-spatial relationship, we model group-based Graph Correlation Learning to explore affinities between predefined groups. We utilize such affinity...

10.1109/icme51207.2021.9428078 article EN 2021 IEEE International Conference on Multimedia and Expo (ICME) 2021-06-09

Towards Unified INT8 Training for Convolutional Neural Network

Feng Zhu Ruihao Gong Fengwei Yu Xianglong Liu Yanfei Wang and 3 more

Recently, low-bit (e.g., 8-bit) network quantization has been extensively studied to accelerate inference. Besides inference, low-bit training with quantized gradients can bring further considerable acceleration, since the backward process is often computation-intensive. Unfortunately, inappropriate quantization of backward propagation usually makes training unstable and can even cause it to crash. A successful unified low-bit training framework that supports diverse networks on various tasks is still lacking. In this paper, we attempt to build a unified 8-bit (INT8)...

10.48550/arxiv.1912.12607 preprint EN other-oa arXiv (Cornell University) 2019-01-01

Trust Your Partner’s Friends: Hierarchical Cross-Modal Contrastive Pre-Training for Video-Text Retrieval

Yuhan Xiang Kaijian Liu Shixiang Tang Lei Bai Feng Zhu and 2 more

Video-text retrieval has greatly benefited from massive web video in recent years, while performance is still limited by the weak supervision of uncurated data. In this work, we propose to leverage the well-represented information of each original modality and exploit the complementary information in two views of the same video, i.e., clips and captions, by using one view to obtain positive samples with the neighboring samples of the other. Respecting the hierarchical organization of real-world data, we further design a hierarchical cross-modal contrastive pre-training method (HCP)...

10.1109/icassp49357.2023.10097061 article EN ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023-05-05

Relation-Aware Distribution Representation Network for Person Clustering With Multiple Modalities

Kaijian Liu Shixiang Tang Ziyue Li Zhishuai Li Lei Bai and 2 more

Person clustering with multi-modal clues, including faces, bodies, and voices, is critical for various tasks, such as movie parsing and identity-based editing. Related methods for multi-view clustering mainly project features into a joint feature space. However, clue features are usually rather weakly correlated due to the semantic gap from modality-specific uniqueness. As a result, these methods are not suitable for person clustering. In this paper, we propose...

10.1109/tmm.2023.3304454 article EN IEEE Transactions on Multimedia 2023-08-22

Revisiting the Transferability of Supervised Pretraining: an MLP Perspective

Yizhou Wang Shixiang Tang Feng Zhu Lei Bai Rui Zhao and 2 more

The pretrain-finetune paradigm is a classical pipeline in visual learning. Recent progress on unsupervised pretraining methods shows superior transfer performance to their supervised counterparts. This paper revisits this phenomenon and sheds new light on understanding the transferability gap between unsupervised and supervised pretraining from a multilayer perceptron (MLP) perspective. While previous works focus on the effectiveness of MLP on unsupervised image classification, where pretraining and evaluation are conducted on the same dataset, we reveal that the MLP projector is also the key...

10.48550/arxiv.2112.00496 preprint EN other-oa arXiv (Cornell University) 2021-01-01