Qidong Huang

ORCID: 0000-0003-2702-8516
Research Areas
  • Adversarial Robustness in Machine Learning
  • Generative Adversarial Networks and Image Synthesis
  • Digital Media Forensic Detection
  • Multimodal Machine Learning Applications
  • High-Velocity Impact and Material Behavior
  • Advanced Neural Network Applications
  • 3D Shape Modeling and Analysis
  • Face Recognition and Analysis
  • Anomaly Detection Techniques and Applications
  • Advanced Steganography and Watermarking Techniques
  • Integrated Circuits and Semiconductor Failure Analysis
  • 3D Surveying and Cultural Heritage
  • Advanced Control Systems Optimization
  • Advanced Graph Neural Networks
  • Hydrology and Watershed Management Studies
  • Advanced Control Systems Design
  • Flood Risk Assessment and Management
  • Advanced Algorithms and Applications
  • Domain Adaptation and Few-Shot Learning
  • Metaheuristic Optimization Algorithms Research
  • Machine Learning in Healthcare
  • Advanced Image Processing Techniques
  • Biometric Identification and Security
  • Physical Unclonable Functions (PUFs) and Hardware Security
  • Acupuncture Treatment Research Studies

Second Artillery General Hospital of Chinese People's Liberation Army
2024-2025

University of Science and Technology of China
2020-2024

Jinan University
2024

Nanjing University of Chinese Medicine
2021

Suihua University
2018

Traditional watermarking algorithms have been extensively studied. As an important type of watermarking scheme, template-based approaches maintain a very high embedding rate. In such schemes, the message is often represented by specially designed templates, and the embedding process is then carried out by an additive operation between the templates and the host image. To resist potential distortions, these templates need to contain special statistical features so that they can be successfully recovered at the extracting side. But in existing...

10.1109/tcsvt.2020.3009349 article EN IEEE Transactions on Circuits and Systems for Video Technology 2020-07-15

Recent research shows that deep neural networks are vulnerable to different types of attacks, such as adversarial attacks, data poisoning attacks, and backdoor attacks. Among them, the backdoor attack is the most cunning one and can occur in almost every stage of the deep learning pipeline. Therefore, the backdoor attack has attracted a lot of interest from both academia and industry. However, existing backdoor attack methods are either visible or fragile to some effortless pre-processing such as common transformations. To address these limitations, we propose a robust and invisible backdoor attack called...

10.1109/tip.2022.3201472 article EN IEEE Transactions on Image Processing 2022-01-01

Adversarial effectiveness and invisibility are two fundamental but conflicting characteristics of adversarial perturbations. Previous adversarial attacks on 3D point cloud recognition have often been criticized for their noticeable point outliers, since they just involve an "implicit constraint" like a global distance loss in the time-consuming optimization to limit the generated noise. While the point cloud is a highly structured data format, it is hard to constrain its perturbation properly with a simple loss or metric. In this paper, we propose a novel Point-Cloud...

10.1109/cvpr52688.2022.01490 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

We present Diversity-Aware Meta Visual Prompting (DAM-VP), an efficient and effective prompting method for transferring pre-trained models to downstream tasks with a frozen backbone. A challenging issue in visual prompting is that image datasets sometimes have a large data diversity, whereas a per-dataset generic prompt can hardly handle the complex distribution shift toward the original pretraining data distribution properly. To address this issue, we propose a dataset Diversity-Aware prompting strategy whose initialization is realized by a Meta-prompt....

10.1109/cvpr52729.2023.01047 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

Notwithstanding the prominent performance shown in various applications, point cloud recognition models have often suffered from natural corruptions and adversarial perturbations. In this paper, we delve into boosting the general robustness of point cloud recognition, proposing Point-Cloud Contrastive Adversarial Training (PointCAT). The main intuition of PointCAT is encouraging the target recognition model to narrow the decision gap between clean point clouds and corrupted point clouds by devising feature-level constraints rather than logit-level...

10.1109/tip.2024.3372456 article EN IEEE Transactions on Image Processing 2024-01-01

Benefiting from the development of generative adversarial networks (GANs), facial manipulation has recently achieved significant progress in both academia and industry. It inspires an increasing number of entertainment applications but also incurs severe threats to individual privacy and even political security. To mitigate such risks, many countermeasures have been proposed. However, the great majority of methods are designed in a passive manner, which is to detect whether images or videos are tampered...

10.1609/aaai.v35i2.16254 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2021-05-18

Recent advancements in image relighting models, driven by large-scale datasets and pre-trained diffusion models, have enabled the imposition of consistent lighting. However, video relighting still lags, primarily due to the excessive training costs and the scarcity of diverse, high-quality video relighting datasets. A simple application of image relighting models on a frame-by-frame basis leads to several issues: lighting source inconsistency and relighted appearance inconsistency, resulting in flickers in the generated videos. In this work, we propose Light-A-Video,...

10.48550/arxiv.2502.08590 preprint EN arXiv (Cornell University) 2025-02-12

10.1109/icassp49660.2025.10887682 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

10.1109/itoec63606.2025.10967898 article EN 2018 IEEE 4th Information Technology and Mechatronics Engineering Conference (ITOEC) 2025-03-14

Recent multimodal large language models (MLLMs) have demonstrated significant potential in open-ended conversation, generating more accurate and personalized responses. However, their abilities to memorize, recall, and reason in sustained interactions within real-world scenarios remain underexplored. This paper introduces MMRC, a Multi-Modal Real-world Conversation benchmark for evaluating six core abilities of MLLMs: information extraction, multi-turn reasoning, information update, image management, memory recall, and answer...

10.48550/arxiv.2502.11903 preprint EN arXiv (Cornell University) 2025-02-17

In this paper, we investigate the adversarial robustness of vision transformers that are equipped with BERT pretraining (e.g., BEiT, MAE). A surprising observation is that MAE has significantly worse adversarial robustness than other BERT pretraining methods. This observation drives us to rethink the basic differences between these BERT pretraining methods and how these differences affect the robustness against adversarial perturbations. Our empirical analysis reveals that the adversarial robustness of BERT pretraining is highly related to the reconstruction target, i.e., predicting the raw pixels of masked image patches will degrade the adversarial robustness of the model more than predicting the semantic context, since it guides...

10.1109/iccv51070.2023.00154 article EN 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 2023-10-01

Deep 3D point cloud models are sensitive to adversarial attacks, which poses threats to safety-critical applications such as autonomous driving. Robust training and defend-by-denoising are typical strategies for defending against adversarial perturbations. However, they either induce massive computational overhead or rely heavily upon specified priors, limiting generalized robustness against attacks of all kinds. To remedy this, this paper introduces a novel distortion-aware defense framework that can rebuild the...

10.1145/3581783.3612018 article EN 2023-10-26

This work discusses the performance assessment and optimization of a class of control systems under load disturbance. Different from the stochastic performance evaluation method, which requires relevant information to identify models, this paper summarizes and applies the deterministic performance evaluation method according to two indicators, namely the Idle index (II) and the Area index (AI). Applying the method to evaluate and optimize the control loop in real time by collecting operation data of the system online will guide and help factory operators maintain the normal operation of the system. The method studied in this paper is applied to a project...

10.23919/chicc.2018.8483441 article EN 2018-07-01

With the deterioration of the climate, the phenomenon of rain-induced flooding has become frequent. To mitigate its impact, recent works adopt convolutional neural networks or their variants to predict floods. However, these methods directly force the model to reconstruct the raw pixels of flood images through a global constraint, overlooking the underlying information contained in terrain features and rainfall patterns. To address this, we present a novel framework for precise flood map prediction, which incorporates hierarchical...

10.1109/icip49359.2023.10222894 article EN 2023 IEEE International Conference on Image Processing (ICIP) 2023-09-11

Hallucination, posed as a pervasive challenge of multi-modal large language models (MLLMs), has significantly impeded their real-world usage that demands precise judgment. Existing methods mitigate this issue either by training with specifically designed data or by inferencing with external knowledge from other sources, incurring inevitable additional costs. In this paper, we present OPERA, a novel MLLM decoding method grounded in an Over-trust Penalty and a Retrospection-Allocation strategy, serving as a nearly free...

10.48550/arxiv.2311.17911 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Despite the success of diffusion-based customization methods on visual content creation, increasing concerns have been raised about such techniques from both privacy and political perspectives. To tackle this issue, several anti-customization methods have been proposed in very recent months, predominantly grounded in adversarial attacks. Unfortunately, most of these methods adopt straightforward designs, such as end-to-end optimization with a focus on adversarially maximizing the original training loss, thereby neglecting nuanced...

10.48550/arxiv.2312.07865 preprint EN other-oa arXiv (Cornell University) 2023-01-01

In this paper, aiming at the problems of slow estimation speed and low precision of the traditional fractional-order system (FOS) parameter estimation method, an improved Archimedes optimization algorithm (IAOA) is proposed to calculate the optimal value. By establishing the model cost function, the parameter estimation problem is formulated as an optimization problem. As opposed to the Archimedes optimization algorithm (AOA), IAOA introduces three improvements: leadership behavior, Lévy flight behavior, and a new adaptive strategy. This paper verifies the performance of IAOA by selecting 10 classic test functions....

10.1142/s0129183124501973 article EN International Journal of Modern Physics C 2024-06-29

We present the Modality Integration Rate (MIR), an effective, robust, and generalized metric to indicate the multi-modal pre-training quality of Large Vision Language Models (LVLMs). Large-scale pre-training plays a critical role in building capable LVLMs, while evaluating its training quality without the costly supervised fine-tuning stage is under-explored. Loss, perplexity, and in-context evaluation results are commonly used pre-training metrics for Large Language Models (LLMs), while we observed that these metrics are less indicative when aligning a well-trained LLM with...

10.48550/arxiv.2410.07167 preprint EN arXiv (Cornell University) 2024-10-09

In large vision-language models (LVLMs), images serve as inputs that carry a wealth of information. As the idiom "A picture is worth a thousand words" implies, representing a single image in current LVLMs can require hundreds or even thousands of tokens. This results in significant computational costs, which grow quadratically as the input image resolution increases, thereby severely impacting the efficiency of both training and inference. Previous approaches have attempted to reduce the number of image tokens either before or within...

10.48550/arxiv.2410.17247 preprint EN arXiv (Cornell University) 2024-10-22

Benefiting from the development of generative adversarial networks (GANs), facial manipulation has recently achieved significant progress in both academia and industry. It inspires an increasing number of entertainment applications but also incurs severe threats to individual privacy and even political security. To mitigate such risks, many countermeasures have been proposed. However, the great majority of methods are designed in a passive manner, which is to detect whether images or videos are tampered...

10.48550/arxiv.2112.10098 preprint EN other-oa arXiv (Cornell University) 2021-01-01

Parameter estimation is important in the study of control and synchronization of fractional-order nonlinear systems (FONSs). This paper proposes an improved Sparrow Search Algorithm (ISSA) for the parameter estimation problem of FONSs. The algorithm improves the population initialization and the position update methods of discoverers and warning sparrows based on the Sparrow Search Algorithm (SSA), and a simulation experiment on the financial system L is conducted to demonstrate this method. The experimental results show that the proposed ISSA is superior to SSA, Particle Swarm...

10.1142/s0129183124501316 article EN International Journal of Modern Physics C 2024-03-02