NFDI4DS | UHH-SEMS - Publication Details

Deep Template-Based Watermarking

OPENALEX - Publications

Han Fang Dongdong Chen Qidong Huang Jie Zhang Zehua Ma and 2 more

Traditional watermarking algorithms have been extensively studied. As an important type of watermarking scheme, template-based approaches maintain a very high embedding rate. In such a scheme, the message is often represented by some specially designed templates, and the embedding process is then carried out by an additive operation on the templates and the host image. To resist potential distortions, these templates need to contain special statistical features so that they can be successfully recovered at the extracting side. But in existing...

10.1109/tcsvt.2020.3009349 article EN IEEE Transactions on Circuits and Systems for Video Technology 2020-07-15

Poison Ink: Robust and Invisible Backdoor Attack

OPENALEX - Publications

Jie Zhang Dongdong Chen Qidong Huang Jing Liao Weiming Zhang and 3 more

Recent research shows that deep neural networks are vulnerable to different types of attacks, such as adversarial attack, data poisoning attack and backdoor attack. Among them, backdoor attack is the most cunning one and can occur in almost every stage of the deep learning pipeline. Therefore, backdoor attack has attracted a lot of interest from both academia and industry. However, existing backdoor attack methods are either visible or fragile to some effortless pre-processing such as common data transformations. To address these limitations, we propose a robust and invisible backdoor attack called...

10.1109/tip.2022.3201472 article EN IEEE Transactions on Image Processing 2022-01-01

Shape-invariant 3D Adversarial Point Clouds

OPENALEX - Publications

Qidong Huang Xiaoyi Dong Dongdong Chen Hang Zhou Weiming Zhang and 1 more

Adversary and invisibility are two fundamental but conflicting characteristics of adversarial perturbations. Previous adversarial attacks on 3D point cloud recognition have often been criticized for their noticeable point outliers, since they just involve an "implicit constraint" like a global distance loss in the time-consuming optimization to limit the generated noise. While point cloud is a highly structured data format, it is hard to constrain its perturbation properly with a simple loss or metric. In this paper, we propose a novel Point-Cloud...

10.1109/cvpr52688.2022.01490 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

Diversity-Aware Meta Visual Prompting

OPENALEX - Publications

Qidong Huang Xiaoyi Dong Dongdong Chen Weiming Zhang Feifei Wang and 2 more

We present Diversity-Aware Meta Visual Prompting (DAM-VP), an efficient and effective prompting method for transferring pre-trained models to downstream tasks with a frozen backbone. A challenging issue in visual prompting is that image datasets sometimes have a large data diversity, whereas a per-dataset generic prompt can hardly handle the complex distribution shift toward the original pretraining data distribution properly. To address this issue, we propose a dataset Diversity-Aware prompting strategy whose initialization is realized by a Meta-prompt....

10.1109/cvpr52729.2023.01047 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation

OPENALEX - Publications

Qidong Huang Xiaoyi Dong Pan Zhang Bin Wang Conghui He and 4 more

10.1109/cvpr52733.2024.01274 article EN 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024-06-16

PointCAT: Contrastive Adversarial Training for Robust Point Cloud Recognition

OPENALEX - Publications

Qidong Huang Xiaoyi Dong Dongdong Chen Hang Zhou Weiming Zhang and 4 more

Notwithstanding the prominent performance shown in various applications, point cloud recognition models have often suffered from natural corruptions and adversarial perturbations. In this paper, we delve into boosting the general robustness of point cloud recognition models, proposing Point-Cloud Contrastive Adversarial Training (PointCAT). The main intuition of PointCAT is encouraging the target recognition model to narrow the decision gap between clean point clouds and corrupted point clouds by devising feature-level constraints rather than logit-level...

10.1109/tip.2024.3372456 article EN IEEE Transactions on Image Processing 2024-01-01

Initiative Defense against Facial Manipulation

OPENALEX - Publications

Qidong Huang Jie Zhang Wenbo Zhou Weiming Zhang Nenghai Yu

Benefiting from the development of generative adversarial networks (GAN), facial manipulation has achieved significant progress in both academia and industry recently. It inspires an increasing number of entertainment applications but also incurs severe threats to individual privacy and even political security. To mitigate such risks, many countermeasures have been proposed. However, the great majority of methods are designed in a passive manner, which is to detect whether images or videos are tampered...

10.1609/aaai.v35i2.16254 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2021-05-18

Light-A-Video: Training-free Video Relighting via Progressive Light Fusion

OPENALEX - Publications

Yujie Zhou Jiazi Bu Pengyang Ling Pan Zhang Tong Wu and 8 more

Recent advancements in image relighting models, driven by large-scale datasets and pre-trained diffusion models, have enabled the imposition of consistent lighting. However, video relighting still lags, primarily due to the excessive training costs and the scarcity of diverse, high-quality video relighting datasets. A simple application of image relighting models on a frame-by-frame basis leads to several issues: lighting source inconsistency and relighted appearance inconsistency, resulting in flickers in the generated videos. In this work, we propose Light-A-Video,...

10.48550/arxiv.2502.08590 preprint EN arXiv (Cornell University) 2025-02-12

Efficient Fine-tuning Strategies for Enhancing Face Recognition Performance in Challenging Scenarios

OPENALEX - Publications

L. Yin Ziyang Wu Qidong Huang Xinran Liu Baocai Yin and 3 more

10.1109/icassp49660.2025.10887682 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

An Improved Seagull Optimization Algorithm for Multi-Center Maintenance Task Allocation of UAV Swarm Under Resource Constraints

OPENALEX - Publications

Qidong Huang Liang Xia Huan Wang Bin Zhang

10.1109/itoec63606.2025.10967898 article EN IEEE Information Technology and Mechatronics Engineering Conference (ITOEC) 2025-03-14

SimAC: A Simple Anti-Customization Method for Protecting Face Privacy Against Text-to-Image Synthesis of Diffusion Models

OPENALEX - Publications

Feifei Wang Zhentao Tan Tianyi Wei Yue Wu Qidong Huang

10.1109/cvpr52733.2024.01145 article EN 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024-06-16

MMRC: A Large-Scale Benchmark for Understanding Multimodal Large Language Model in Real-World Conversation

OPENALEX - Publications

Haochen Xue Feilong Tang Ming-Che Hu Yexin Liu Qidong Huang and 11 more

Recent multimodal large language models (MLLMs) have demonstrated significant potential in open-ended conversation, generating more accurate and personalized responses. However, their abilities to memorize, recall, and reason in sustained interactions within real-world scenarios remain underexplored. This paper introduces MMRC, a Multi-Modal Real-world Conversation benchmark for evaluating six core abilities of MLLMs: information extraction, multi-turn reasoning, information update, image management, memory recall, and answer...

10.48550/arxiv.2502.11903 preprint EN arXiv (Cornell University) 2025-02-17

SMAOA: An improved arithmetic optimization algorithm and its application to engineering optimization

OPENALEX - Publications

Qidong Huang Renhuan Yang Guilian Chen Wei Gao Chao Shen and 4 more

10.1142/s0129183125500780 article EN International Journal of Modern Physics C 2025-03-07

Improving Adversarial Robustness of Masked Autoencoders via Test-time Frequency-domain Prompting

OPENALEX - Publications

Qidong Huang Xiaoyi Dong Dongdong Chen Yinpeng Chen Lu Yuan and 3 more

In this paper, we investigate the adversarial robustness of vision transformers that are equipped with BERT pretraining (e.g., BEiT, MAE). A surprising observation is that MAE has significantly worse adversarial robustness than other BERT pretraining methods. This observation drives us to rethink the basic differences between these BERT pretraining methods and how these differences affect the robustness against adversarial perturbations. Our empirical analysis reveals that the adversarial robustness of BERT pretraining is highly related to the reconstruction target, i.e., predicting the raw pixels of masked image patches will degrade the adversarial robustness of the model more than predicting the semantic context, since it guides...

10.1109/iccv51070.2023.00154 article EN 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 2023-10-01

Ada3Diff: Defending against 3D Adversarial Point Clouds via Adaptive Diffusion

OPENALEX - Publications

Kui Zhang Hang Zhou Jie Zhang Qidong Huang Weiming Zhang and 1 more

Deep 3D point cloud models are sensitive to adversarial attacks, which poses threats to safety-critical applications such as autonomous driving. Robust training and defend-by-denoising are typical strategies for defending against adversarial perturbations. However, they either induce massive computational overhead or rely heavily upon specified priors, limiting generalized robustness against attacks of all kinds. To remedy it, this paper introduces a novel distortion-aware defense framework that can rebuild the...

10.1145/3581783.3612018 article EN 2023-10-26

The Control System Assessment Based on a Class of Disturbance Characteristics

OPENALEX - Publications

Yinsong Wang Ying Gao Wanwan Su Weijian Zheng Xiongjie Jiang and 1 more

This work discusses the performance assessment and optimization of a class of control systems under load disturbance. Different from the stochastic evaluation method, which requires relevant information to identify models, the paper summarizes and applies a deterministic evaluation method according to two indicators, namely the Idle index (II) and the Area index (AI). Applying the method to evaluate and optimize the control loop in real time, by collecting operation data of the system online, will guide and help factory operators to maintain the normal operation of the system. The method studied in this paper is applied to the project...

10.23919/chicc.2018.8483441 article EN 2018-07-01

Hierarchical Terrain Attention and Multi-Scale Rainfall Guidance for Flood Image Prediction

OPENALEX - Publications

Feifei Wang Yong Wang Bing Li Qidong Huang Shaoqing Chen

With the deterioration of the climate, rain-induced flooding has become frequent. To mitigate its impact, recent works adopt convolutional neural networks or their variants to predict floods. However, these methods directly force the model to reconstruct the raw pixels of flood images through a global constraint, overlooking the underlying information contained in terrain features and rainfall patterns. To address this, we present a novel framework for precise flood map prediction, which incorporates hierarchical...

10.1109/icip49359.2023.10222894 article EN 2023 IEEE International Conference on Image Processing (ICIP) 2023-09-11

OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation

OPENALEX - Publications

Qidong Huang Xiaoyi Dong Pan Zhang Bin Wang Conghui He and 4 more

Hallucination, posed as a pervasive challenge of multi-modal large language models (MLLMs), has significantly impeded their real-world usage that demands precise judgment. Existing methods mitigate this issue with either training with specific designed data or inferencing with external knowledge from other sources, incurring inevitable additional costs. In this paper, we present OPERA, a novel MLLM decoding method grounded in an Over-trust Penalty and a Retrospection-Allocation strategy, serving as a nearly free...

10.48550/arxiv.2311.17911 preprint EN other-oa arXiv (Cornell University) 2023-01-01

SimAC: A Simple Anti-Customization Method against Text-to-Image Synthesis of Diffusion Models

OPENALEX - Publications

Feifei Wang Zhentao Tan Tianyi Wei Yue Wu Qidong Huang

Despite the success of diffusion-based customization methods on visual content creation, increasing concerns have been raised about such techniques from both privacy and political perspectives. To tackle this issue, several anti-customization methods have been proposed in very recent months, predominantly grounded in adversarial attacks. Unfortunately, most of these methods adopt straightforward designs, such as end-to-end optimization with a focus on adversarially maximizing the original training loss, thereby neglecting nuanced...

10.48550/arxiv.2312.07865 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Parameter estimation of fractional-order system with improved Archimedes optimization algorithm

OPENALEX - Publications

Yinbin Chen Renhuan Yang Xiuzeng Yang Renyu Yang Qidong Huang and 4 more

In this paper, aiming at the problems of slow estimation speed and low estimation precision of the traditional fractional-order system (FOS) parameter estimation method, an improved Archimedes optimization algorithm (IAOA) is proposed to calculate the optimal value. By establishing the model of the cost function, the parameter estimation problem is formulated as an optimization problem. As opposed to the Archimedes optimization algorithm (AOA), IAOA introduces three improvements: leadership behavior, Lévy flight behavior, and a new adaptive strategy. This paper verifies the performance of IAOA by selecting 10 classic test functions....

10.1142/s0129183124501973 article EN International Journal of Modern Physics C 2024-06-29

Using dynamically changing step sizes to increase the success rate of adversarial attacks

OPENALEX - Publications

Qidong Huang Leiji Lu Jun Chen Lei Bao

10.1109/cvidl62147.2024.10604174 article EN 2024-04-19

Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate

OPENALEX - Publications

Qidong Huang Xiaoyi Dong Pan Zhang Yuhang Zang Yuhang Cao and 4 more

We present the Modality Integration Rate (MIR), an effective, robust, and generalized metric to indicate the multi-modal pre-training quality of Large Vision Language Models (LVLMs). Large-scale pre-training plays a critical role in building capable LVLMs, while evaluating its training quality without the costly supervised fine-tuning stage is under-explored. Loss, perplexity, and in-context evaluation results are commonly used pre-training metrics for Large Language Models (LLMs), while we observed that these metrics are less indicative when aligning a well-trained LLM with...

10.48550/arxiv.2410.07167 preprint EN arXiv (Cornell University) 2024-10-09

PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction

OPENALEX - Publications

Long Xing Qidong Huang Xiaoyi Dong Jiajie Lu Pan Zhang and 6 more

In large vision-language models (LVLMs), images serve as inputs that carry a wealth of information. As the idiom "A picture is worth a thousand words" implies, representing a single image in current LVLMs can require hundreds or even thousands of tokens. This results in significant computational costs, which grow quadratically as the input image resolution increases, thereby severely impacting the efficiency of both training and inference. Previous approaches have attempted to reduce the number of image tokens either before or within...

10.48550/arxiv.2410.17247 preprint EN arXiv (Cornell University) 2024-10-22

Initiative Defense against Facial Manipulation

OPENALEX - Publications

Qidong Huang Jie Zhang Wenbo Zhou Weiming Zhang Nenghai Yu

Benefiting from the development of generative adversarial networks (GAN), facial manipulation has achieved significant progress in both academia and industry recently. It inspires an increasing number of entertainment applications but also incurs severe threats to individual privacy and even political security. To mitigate such risks, many countermeasures have been proposed. However, the great majority of methods are designed in a passive manner, which is to detect whether images or videos are tampered...

10.48550/arxiv.2112.10098 preprint EN other-oa arXiv (Cornell University) 2021-01-01

Parameter estimation for fractional-order nonlinear systems based on improved sparrow search algorithm

OPENALEX - Publications

Yongqiang Zhou Renhuan Yang Yibin Chen Qidong Huang Chao Shen and 3 more

Parameter estimation is important in the study of control and synchronization of fractional-order nonlinear systems (FONSs). This paper proposes an improved Sparrow Search Algorithm (ISSA) for the parameter estimation problem of FONSs. The algorithm improves the population initialization and the position update method of the discoverers and warning sparrows of the Sparrow Search Algorithm (SSA), and a simulation experiment on financial system L is conducted to demonstrate this method. The experimental results show that the proposed ISSA is superior to SSA, Particle Swarm...

10.1142/s0129183124501316 article EN International Journal of Modern Physics C 2024-03-02