Feng Zhu

ORCID: 0000-0002-4661-0686
Research Areas
  • Video Surveillance and Tracking Methods
  • Advanced Neural Network Applications
  • Multimodal Machine Learning Applications
  • Human Pose and Action Recognition
  • Privacy-Preserving Technologies in Data
  • Face recognition and analysis
  • Domain Adaptation and Few-Shot Learning
  • Advanced Image and Video Retrieval Techniques
  • Stochastic Gradient Optimization Techniques
  • Advanced Graph Neural Networks
  • Advanced Vision and Imaging
  • Mechanical and Optical Resonators
  • Recommender Systems and Techniques
  • Robotics and Sensor-Based Localization
  • Cryptography and Data Security
  • Image Retrieval and Classification Techniques
  • Quantum Mechanics and Non-Hermitian Physics
  • Blockchain Technology Applications and Security
  • Gait Recognition and Analysis
  • Generative Adversarial Networks and Image Synthesis
  • Topic Modeling
  • Cryptographic Implementations and Security
  • Advanced Bandit Algorithms Research
  • Multi-Criteria Decision Making
  • Image and Video Quality Assessment

DHC Software (China)
2024-2025

Institute of Genetics and Developmental Biology
2024

Center for Agricultural Resources Research
2024

Fudan University
2022-2024

State Grid Corporation of China (China)
2024

Chinese Academy of Sciences
2024

Harbin Engineering University
2024

National University of Defense Technology
2021-2024

Zhengzhou University
2023-2024

China Telecom (China)
2023

This paper tackles the cross-modality person re-identification (re-ID) problem by suppressing the modality discrepancy. In cross-modality re-ID, the query and gallery images are in different modalities. Given a training identity, the popular deep classification baseline shares the same proxy (i.e., a weight vector in the last layer) for the two modalities. We find that it has considerable tolerance for the modality gap, because the shared proxy acts as an intermediate relay between the two modalities. In response, we propose a Memory-Augmented Unidirectional Metric (MAUM) learning method...

10.1109/cvpr52688.2022.01876 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

Open-vocabulary detection (OVD) is an object detection task aiming at detecting objects from novel categories beyond the base categories on which the detector is trained. Recent OVD methods rely on large-scale visual-language pre-trained models, such as CLIP, for recognizing novel objects. We identify two core obstacles that need to be tackled when incorporating these models into detector training: (1) the distribution mismatch that happens when applying a VL-model trained on whole images to region recognition tasks; (2) the difficulty of localizing unseen...

10.1109/cvpr52729.2023.00679 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

Human-centric perceptions (e.g., pose estimation, human parsing, pedestrian detection, person re-identification, etc.) play a key role in industrial applications of visual models. While specific human-centric tasks have their own relevant semantic aspect to focus on, they also share the same underlying semantic structure of the human body. However, few works have attempted to exploit such homogeneity and design a general-purpose model for human-centric tasks. In this work, we revisit a broad range of human-centric tasks and unify them in a minimalist manner. We propose...

10.1109/cvpr52729.2023.01711 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

Recent years have witnessed a rapid growth of deep generative models, with text-to-image models gaining significant attention from the public. However, existing models often generate images that do not align well with human preferences, such as awkward combinations of limbs and facial expressions. To address this issue, we collect a dataset of human choices on generated images from the Stable Foundation Discord channel. Our experiments demonstrate that current evaluation metrics for generative models do not correlate well with human choices. Thus, we train a human preference classifier...

10.1109/iccv51070.2023.00200 article EN 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 2023-10-01

Human-centric perceptions include a variety of vision tasks, which have widespread industrial applications, including surveillance, autonomous driving, and the metaverse. It is desirable to have a general pretrain model for versatile human-centric downstream tasks. This paper forges ahead along this path from the aspects of both benchmark and pretraining methods. Specifically, we propose HumanBench, based on existing datasets, to comprehensively evaluate on common ground the generalization abilities of different pretraining methods on 19...

10.1109/cvpr52729.2023.02104 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

Referring Expression Segmentation (RES) is a widely explored multi-modal task, which endeavors to segment the pre-existing object within a single image with a given linguistic expression. However, in broader real-world scenarios, it is not always possible to determine if the described object exists in a specific image. Generally, a collection of images is available, some of which potentially contain the target objects. To this end, we propose a more realistic setting, named Group-wise Referring Expression Segmentation (GRES), which expands RES to a group of related images, allowing...

10.1109/iccv51070.2023.00248 article EN 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 2023-10-01

With the development and application of the Internet of Things (IoT), the volume of data generated daily by IoT devices is growing exponentially. These devices, such as smart wearable devices, produce data containing sensitive personal information. However, since users often operate in untrusted external environments, their encrypted data remain vulnerable to potential privacy leaks and security threats from malicious coercion. Additionally, access control and management of these data are critical issues. To address these challenges, this paper...

10.3390/e27010032 article EN cc-by Entropy 2025-01-02

10.1109/icassp49660.2025.10889769 article ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

Recent text-to-image generative models can generate high-fidelity images from text inputs, but the quality of these generated images cannot be accurately evaluated by existing evaluation metrics. To address this issue, we introduce Human Preference Dataset v2 (HPD v2), a large-scale dataset that captures human preferences on images from a wide range of sources. HPD v2 comprises 798,090 human preference choices on 433,760 pairs of images, making it the largest dataset of its kind. The text prompts and images are deliberately collected to eliminate potential...

10.48550/arxiv.2306.09341 preprint EN cc-by arXiv (Cornell University) 2023-01-01

Backdoors on federated learning will be diluted by subsequent benign updates. This is reflected in the significant reduction of the attack success rate as iterations increase, with the attack ultimately failing. We use a new metric, called attack persistence, to quantify the degree of this weakened backdoor effect. Given that research to improve this performance has not been widely noted, we propose a Full Combination Backdoor Attack (FCBA) method. It aggregates more combined trigger information for a more complete backdoor pattern in the global model....

10.1609/aaai.v38i19.30131 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2024-03-24

We propose ADCLR: Accurate and Dense Contrastive Representation Learning, a novel self-supervised learning framework for learning accurate and dense vision representation. To extract spatial-sensitive information, ADCLR introduces query patches for contrasting in addition to global contrasting. Compared with previous dense contrasting methods, ADCLR mainly enjoys three merits: i) achieving both global-discriminative and spatial-sensitive representation, ii) model-efficient (no extra parameters in addition to the global contrasting baseline), and iii) correspondence-free and thus simpler...

10.48550/arxiv.2306.13337 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Blockchains offer strong security guarantees, but they cannot protect the ordering of transactions. Powerful players, such as miners, sequencers, and sophisticated bots, can reap significant profits by selectively including, excluding, or re-ordering user transactions. Such profits are called Miner/Maximal Extractable Value or MEV. MEV bears profound implications for blockchain security and decentralization. While numerous countermeasures have been proposed, there is no agreement on the best solution. Moreover, solutions developed...

10.48550/arxiv.2212.05111 preprint EN other-oa arXiv (Cornell University) 2022-01-01

Recent years have witnessed a rapid growth of deep generative models, with text-to-image models gaining significant attention from the public. However, existing models often generate images that do not align well with human preferences, such as awkward combinations of limbs and facial expressions. To address this issue, we collect a dataset of human choices on generated images from the Stable Foundation Discord channel. Our experiments demonstrate that current evaluation metrics for generative models do not correlate well with human choices. Thus, we train a human preference classifier...

10.48550/arxiv.2303.14420 preprint EN cc-by arXiv (Cornell University) 2023-01-01

In recommendation scenarios, there are two long-standing challenges, i.e., selection bias and data sparsity, which lead to a significant drop in prediction accuracy for both Click-Through Rate (CTR) and post-click Conversion Rate (CVR) tasks. To cope with these issues, existing works emphasize leveraging Multi-Task Learning (MTL) frameworks (Category 1) or causal debiasing frameworks (Category 2) to incorporate more auxiliary data in the entire exposure/inference space $\mathcal{D}$ or debias the selection bias in the click/training space $\mathcal{O}$....

10.1109/icde55515.2023.00239 article EN 2023 IEEE 39th International Conference on Data Engineering (ICDE) 2023-04-01

Human-centric perception tasks, e.g., pedestrian detection, skeleton-based action recognition, and pose estimation, have wide industrial applications, such as metaverse and sports analysis. There is a recent surge to develop human-centric foundation models that can benefit a broad range of human-centric perception tasks. While many human-centric foundation models have achieved success, they did not explore 3D and vision-language tasks and required task-specific finetuning. These limitations restrict their application to more downstream tasks and situations. To tackle these...

10.48550/arxiv.2312.01697 preprint EN other-oa arXiv (Cornell University) 2023-01-01

10.1109/tsp.2024.3452035 article EN IEEE Transactions on Signal Processing 2024-01-01

Unsupervised domain adaptation (UDA) aims at adapting the model trained on a labeled source-domain dataset to an unlabeled target-domain dataset. The task of UDA on open-set person re-identification (re-ID) is even more challenging as the identities (classes) do not have overlap between the two domains. One major research direction was based on domain translation, which, however, has fallen out of favor in recent years due to inferior performance compared to pseudo-label-based methods. We argue that domain translation has great...

10.48550/arxiv.2003.06650 preprint EN other-oa arXiv (Cornell University) 2020-01-01

A new approach to relative orientation based on dual quaternions is proposed. Dual quaternions are used to express a unified description of the position and orientation of two images in a stereopair. The coplanarity condition equation and its linearised model are established. According to the principle of least squares adjustment with constraints, the dual quaternion can be identified and the relative‐orientation parameters then obtained by a non‐linear transformation of the components of the dual quaternion. Experimental results show that the proposed approach is feasible and more...

10.1111/phor.12111 article EN The Photogrammetric Record 2015-09-01

Scalability is an important consideration for deep graph neural networks. Inspired by the conventional pooling layers in CNNs, many recent graph learning approaches have introduced the pooling strategy to reduce the size of graphs for learning, such that scalability and efficiency can be improved. However, these pooling-based methods are mainly tailored to a single graph-level task and pay more attention to local information, limiting their performance in multi-task settings, which often require task-specific global information....

10.48550/arxiv.2204.13429 preprint EN cc-by arXiv (Cornell University) 2022-01-01

Human intelligence can retrieve any person according to both visual and language descriptions. However, the current computer vision community studies specific person re-identification (ReID) tasks in different scenarios separately, which limits the applications in the real world. This paper strives to resolve this problem by proposing a novel instruct-ReID task that requires the model to retrieve images according to the given image or language instructions. Instruct-ReID is the first exploration of a general ReID setting, where the existing 6 ReID tasks can be viewed as special...

10.48550/arxiv.2405.17790 preprint EN arXiv (Cornell University) 2024-05-27