- Adversarial Robustness in Machine Learning
- Multimodal Machine Learning Applications
- Advanced Neural Network Applications
- Domain Adaptation and Few-Shot Learning
- Anomaly Detection Techniques and Applications
- Explainable Artificial Intelligence (XAI)
- Topic Modeling
- Natural Language Processing Techniques
- Machine Learning and Data Classification
- Advanced Image and Video Retrieval Techniques
- Generative Adversarial Networks and Image Synthesis
- Video Analysis and Summarization
- Privacy-Preserving Technologies in Data
- COVID-19 diagnosis using AI
- Music and Audio Processing
- Digital Media Forensic Detection
- AI in cancer detection
- Industrial Vision Systems and Defect Detection
- Advanced Steganography and Watermarking Techniques
- Human Pose and Action Recognition
- Speech and dialogue systems
- Advanced Malware Detection Techniques
- Hate Speech and Cyberbullying Detection
- Misinformation and Its Impacts
- Digital and Cyber Forensics
University of Oxford
2023-2024
Ludwig-Maximilians-Universität München
2018-2022
Institut für Urheber- und Medienrecht
2021-2022
Siemens (Germany)
2019-2021
Prompt engineering is a technique that involves augmenting a large pre-trained model with task-specific hints, known as prompts, to adapt the model to new tasks. Prompts can be created manually as natural language instructions or generated automatically as either natural language instructions or vector representations. Prompt engineering enables predictions based solely on prompts without updating model parameters, and eases the application of large pre-trained models in real-world tasks. In past years, prompt engineering has been well studied in natural language processing. Recently, it has also been intensively studied...
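A minimal sketch of the two prompt styles mentioned above: a hand-written natural-language instruction and a learnable "soft" prompt prepended to the token embeddings of a frozen model. The template text, dimensions, and the stand-in embeddings are illustrative assumptions, not the setup of any specific paper.

```python
import torch
import torch.nn as nn

# (1) Manual prompt: a natural-language instruction wrapped around the input text.
template = "Classify the sentiment of the following review as positive or negative: {text}"
prompt_text = template.format(text="The movie was surprisingly good.")

# (2) Automatic / soft prompt: learnable vectors prepended to the token embeddings,
#     trained while the pre-trained model's own parameters stay frozen.
class SoftPrompt(nn.Module):
    def __init__(self, prompt_len: int = 10, embed_dim: int = 768):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(prompt_len, embed_dim) * 0.02)

    def forward(self, token_embeds: torch.Tensor) -> torch.Tensor:
        # token_embeds: (batch, seq_len, embed_dim) from the frozen model's embedding table
        batch = token_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prompt, token_embeds], dim=1)

soft_prompt = SoftPrompt()
dummy_embeds = torch.randn(2, 16, 768)   # stand-in for real token embeddings (assumption)
augmented = soft_prompt(dummy_embeds)    # (2, 26, 768), fed to the frozen model
```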
Backdoor defenses have been studied to alleviate the threat of deep neural networks (DNNs) being backdoor attacked and thus maliciously altered. Since DNNs usually adopt some external training data from an untrusted third party, a robust backdoor defense strategy during the training stage is of importance. We argue that the core of training-time defense is to select poisoned samples and to handle them properly. In this work, we summarize training-time defenses under a unified framework as splitting the poisoned dataset into two data pools. Under our framework, we propose an adaptively...
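A generic sketch of the two-pool view described above: assign each training sample a suspiciousness score, then split the dataset into a clean pool that keeps its labels and a polluted pool that is handled separately (e.g., without labels in a semi-supervised fashion). The scoring rule and split ratio are left abstract and are assumptions, not the paper's concrete criterion.

```python
from typing import Callable, List, Tuple

def split_into_pools(
    num_samples: int,
    score_fn: Callable[[int], float],   # higher score = more suspicious (illustrative)
    clean_ratio: float = 0.5,
) -> Tuple[List[int], List[int]]:
    order = sorted(range(num_samples), key=score_fn)   # least suspicious first
    k = int(clean_ratio * num_samples)
    clean_pool = order[:k]       # trained on with their labels
    polluted_pool = order[k:]    # trained on without labels / down-weighted
    return clean_pool, polluted_pool

# Example usage with a dummy per-sample score (hypothetical):
clean, polluted = split_into_pools(10, score_fn=lambda i: (i * 7) % 10 / 10.0)
```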
Recently, foundation models have exhibited remarkable advancements in multi-modal learning. These models, equipped with millions (or billions) of parameters, typically require a substantial amount of data for fine-tuning. However, collecting and centralizing training data from diverse sectors becomes challenging due to distinct privacy regulations. Federated Learning (FL) emerges as a promising solution, enabling multiple clients to collaboratively train neural networks without centralizing their local data. To...
Convolutional neural networks (CNNs) achieve translational invariance by using pooling operations. However, the operations do not preserve spatial relationships in the learned representations. Hence, CNNs cannot extrapolate to various geometric transformations of inputs. Recently, Capsule Networks (CapsNets) have been proposed to tackle this problem. In CapsNets, each entity is represented by a vector and routed to high-level entity representations by a dynamic routing algorithm. CapsNets have been shown to be more robust than...
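A minimal sketch of the capsule mechanics mentioned above: entities are vectors, a squashing non-linearity keeps their length in [0, 1), and routing-by-agreement iteratively decides how much each low-level capsule contributes to each high-level capsule. The dimensions and iteration count are illustrative.

```python
import torch
import torch.nn.functional as F

def squash(s, dim=-1, eps=1e-8):
    # Shrinks short vectors towards zero and long vectors towards unit length.
    norm_sq = (s ** 2).sum(dim=dim, keepdim=True)
    return (norm_sq / (1.0 + norm_sq)) * s / torch.sqrt(norm_sq + eps)

def dynamic_routing(u_hat, num_iters=3):
    # u_hat: (batch, num_in, num_out, out_dim) prediction vectors from lower capsules
    b = torch.zeros(u_hat.shape[:3], device=u_hat.device)       # routing logits
    for _ in range(num_iters):
        c = F.softmax(b, dim=2)                                  # coupling coefficients
        s = (c.unsqueeze(-1) * u_hat).sum(dim=1)                 # weighted sum over inputs
        v = squash(s)                                            # (batch, num_out, out_dim)
        b = b + (u_hat * v.unsqueeze(1)).sum(dim=-1)             # agreement update
    return v

v = dynamic_routing(torch.randn(2, 1152, 10, 16))                # CapsNet-on-MNIST-like shapes
```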
Federated Learning (FL) is a decentralized machine learning paradigm in which multiple clients collaboratively train neural networks without centralizing their local data, and hence preserve data privacy. However, real-world FL applications usually encounter challenges arising from distribution shifts across the local datasets of individual clients. These shifts may drift the global model aggregation or result in convergence to a deflected global optimum. While existing efforts have addressed distribution shifts in the label space, an equally...
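A bare-bones FedAvg-style sketch of the collaborative training described above: each client updates a copy of the global model on its private data, and only the parameters, never the data, are averaged on the server. The model, loaders, and hyper-parameters are placeholders under assumed names.

```python
import copy
import torch

def local_update(global_model, loader, epochs=1, lr=0.01):
    # One client's local training on its private data.
    model = copy.deepcopy(global_model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            torch.nn.functional.cross_entropy(model(x), y).backward()
            opt.step()
    return model.state_dict()

def federated_average(client_states):
    # Server-side aggregation: element-wise average of client parameters.
    avg = copy.deepcopy(client_states[0])
    for key in avg:
        avg[key] = torch.stack([s[key].float() for s in client_states]).mean(dim=0)
    return avg
```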
Adversarial training has shown promise in building robust models against adversarial examples. A major drawback of adversarial training is the computational overhead introduced by the generation of adversarial examples. To overcome this limitation, adversarial training based on single-step attacks has been explored. Previous work improves single-step adversarial training from different perspectives, e.g., sample initialization, loss regularization, and training strategy. Almost all of them treat the underlying model as a black box. In this work, we propose to exploit the interior building blocks of the model to improve efficiency. Specifically,...
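A minimal sketch of the single-step recipe referenced above: each batch is perturbed with one FGSM step before the parameter update. The model, optimizer, and hyper-parameters are placeholders; this is the generic baseline, not the paper's block-level method.

```python
import torch
import torch.nn.functional as F

def fgsm_adversarial_step(model, optimizer, x, y, eps=8 / 255):
    # Single-step attack: one gradient sign step on the input.
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad = torch.autograd.grad(loss, x_adv)[0]
    x_adv = (x + eps * grad.sign()).clamp(0, 1).detach()

    # Parameter update on the adversarial batch.
    optimizer.zero_grad()
    F.cross_entropy(model(x_adv), y).backward()
    optimizer.step()
```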
Large Language Models (LLMs) demonstrate remarkable zero-shot performance across various natural language processing tasks. The integration of multimodal encoders extends their capabilities, enabling the development of Multimodal LLMs that process vision, audio, and text. However, these capabilities also raise significant security concerns, as these models can be manipulated to generate harmful or inappropriate content through jailbreak attacks. While extensive research explores the impact of modality-specific input...
The rapid advancement of large models, driven by their exceptional abilities in learning and generalization through large-scale pre-training, has reshaped the landscape of Artificial Intelligence (AI). These models are now foundational to a wide range of applications, including conversational AI, recommendation systems, autonomous driving, content generation, medical diagnostics, and scientific discovery. However, their widespread deployment also exposes them to significant safety risks, raising concerns about...
The Capsule Network (CapsNet) is widely believed to be more robust than Convolutional Networks (ConvNets). However, there are no comprehensive comparisons between these two networks, and it is also unknown which components in the CapsNet affect its robustness. In this paper, we first carefully examine the special designs in the CapsNet that differ from those of a ConvNet commonly used for image classification. The examination reveals five major new/different components in the CapsNet: a transformation process, a dynamic routing layer, a squashing function, a marginal...
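As a worked example of one of the components listed above, the margin ("marginal") loss used in place of cross-entropy pushes the length of the true class capsule above an upper margin and the other capsules below a lower margin. The constants follow the commonly used CapsNet setting and are assumptions here.

```python
import torch
import torch.nn.functional as F

def margin_loss(v, targets, m_plus=0.9, m_minus=0.1, lam=0.5):
    # v: (batch, num_classes, capsule_dim) output capsules; targets: (batch,) class indices
    lengths = v.norm(dim=-1)                                     # capsule lengths per class
    t = F.one_hot(targets, num_classes=lengths.size(1)).float()
    pos = t * F.relu(m_plus - lengths) ** 2                      # true class pushed above m_plus
    neg = lam * (1.0 - t) * F.relu(lengths - m_minus) ** 2       # other classes pushed below m_minus
    return (pos + neg).sum(dim=1).mean()

loss = margin_loss(torch.randn(4, 10, 16), torch.tensor([0, 3, 7, 9]))
```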
Capsule Networks, as alternatives to Convolutional Neural Networks, have been proposed to recognize objects from images. The current literature demonstrates many advantages of CapsNets over CNNs. However, how to create explanations for individual classifications of a CapsNet has not been well explored. The widely used saliency methods are mainly proposed for explaining CNN-based classifications; they create saliency maps by combining activation values and the corresponding gradients, e.g., Grad-CAM. These methods require a specific architecture of the underlying...
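A minimal Grad-CAM-style sketch of the "activations combined with gradients" idea referenced above, written for a generic CNN; the backbone and the chosen layer are placeholders, not the architecture used in the paper.

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

model = resnet18(weights=None).eval()        # stand-in backbone (assumption)
feats = {}

def keep_activation(module, inputs, output):
    output.retain_grad()                      # keep the gradient of this feature map
    feats["act"] = output

model.layer4.register_forward_hook(keep_activation)   # last conv block (illustrative choice)

x = torch.randn(1, 3, 224, 224)
logits = model(x)
logits[0, logits.argmax()].backward()                  # gradient of the top class score

act, grad = feats["act"], feats["act"].grad
weights = grad.mean(dim=(2, 3), keepdim=True)           # global-average-pooled gradients
cam = F.relu((weights * act).sum(dim=1, keepdim=True))  # weighted sum of activation maps
cam = F.interpolate(cam, size=x.shape[-2:], mode="bilinear", align_corners=False)
```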
The field of few-shot learning (FSL) has shown promising results in scenarios where training data is limited, but its vulnerability to backdoor attacks remains largely unexplored. We first explore this topic by evaluating the performance of existing attack methods in FSL scenarios. Unlike in standard supervised learning, these methods failed to perform an effective attack in FSL due to two main issues. Firstly, the model tends to overfit to either benign features or trigger features, causing a tough trade-off between attack success rate and...
The classification decisions of neural networks can be misled by small imperceptible perturbations. This work aims to explain such classifications using saliency methods. The idea behind saliency methods is to explain a classification by creating so-called saliency maps. Unfortunately, a number of recent publications have shown that many of the proposed saliency methods do not provide insightful explanations. A prominent example is Guided Backpropagation (GuidedBP), which simply performs (partial) image recovery. However, our numerical analysis shows that the maps created by GuidedBP...
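A short sketch of the Guided Backpropagation rule characterised above: during the backward pass, a ReLU passes a gradient only where its forward input was positive and the incoming gradient is positive. Applying it to a real network would mean swapping its ReLUs for this variant; the tiny model here is a toy stand-in.

```python
import torch
import torch.nn as nn

class GuidedReLU(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        # Pass only positive gradients at positions with positive forward input.
        return grad_out * (x > 0) * (grad_out > 0)

model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.Flatten(), nn.Linear(8 * 8 * 8, 10))
x = torch.randn(1, 3, 8, 8, requires_grad=True)
h = GuidedReLU.apply(model[0](x))                  # guided ReLU after the conv layer
score = model[2](model[1](h))[0, 0]
score.backward()
saliency = x.grad                                   # the resulting "saliency map"
```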
Knowledge Distillation, as a model compression technique, has received great attention. The knowledge of a well-performing teacher is distilled to a student with a smaller architecture. The architecture of the student is often chosen to be similar to the teacher's, with fewer layers or fewer channels, or both. However, even with the same number of FLOPs or parameters, students with different architectures can achieve different generalization ability. The configuration of a student architecture requires intensive network engineering. In this work, instead of designing a good student architecture manually, we propose to search for the optimal...
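A minimal sketch of the distillation objective implied above: the student matches the teacher's softened output distribution while also fitting the ground-truth labels. The temperature and weighting are common defaults, used here as assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.9):
    # Soft target term: KL between temperature-softened teacher and student distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)                                        # rescale gradient magnitude
    # Hard target term: standard cross-entropy on the labels.
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1.0 - alpha) * hard

loss = distillation_loss(torch.randn(8, 100), torch.randn(8, 100), torch.randint(0, 100, (8,)))
```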
Large vision-language models (VLMs) such as GPT-4 have achieved exceptional performance across various multi-modal tasks. However, the deployment of VLMs necessitates substantial energy consumption and computational resources. Once attackers maliciously induce high energy consumption and latency time (energy-latency cost) during inference of VLMs, it will exhaust computational resources. In this paper, we explore this attack surface regarding the availability of VLMs and aim to induce high energy-latency cost during inference of VLMs. We find that the energy-latency cost can be manipulated by maximizing the length of the generated...
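A toy sketch of why output length drives the cost referred to above: an adversarial objective can push down the end-of-sequence probability at every decoding step so that autoregressive generation runs longer. The model, tokenizer, and the way a perturbation would be optimized are all placeholders, not the paper's full method.

```python
import torch
import torch.nn.functional as F

def delayed_eos_loss(step_logits, eos_token_id):
    # step_logits: (num_steps, vocab_size) logits produced during decoding.
    eos_prob = F.softmax(step_logits, dim=-1)[:, eos_token_id]
    return eos_prob.mean()          # minimizing this discourages early termination

loss = delayed_eos_loss(torch.randn(20, 32000), eos_token_id=2)   # vocab size and id are illustrative
```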