- Adversarial Robustness in Machine Learning
- Multimodal Machine Learning Applications
- Advanced Neural Network Applications
- Domain Adaptation and Few-Shot Learning
- Anomaly Detection Techniques and Applications
- Explainable Artificial Intelligence (XAI)
- Topic Modeling
- Natural Language Processing Techniques
- Machine Learning and Data Classification
- Advanced Image and Video Retrieval Techniques
- Generative Adversarial Networks and Image Synthesis
- Video Analysis and Summarization
- Privacy-Preserving Technologies in Data
- COVID-19 diagnosis using AI
- Music and Audio Processing
- Digital Media Forensic Detection
- AI in cancer detection
- Industrial Vision Systems and Defect Detection
- Advanced Steganography and Watermarking Techniques
- Human Pose and Action Recognition
- Speech and dialogue systems
- Advanced Malware Detection Techniques
- Hate Speech and Cyberbullying Detection
- Misinformation and Its Impacts
- Digital and Cyber Forensics
University of Oxford
2023-2024
Ludwig-Maximilians-Universität München
2018-2022
Institut für Urheber- und Medienrecht
2021-2022
Siemens (Germany)
2019-2021
Prompt engineering is a technique that involves augmenting a large pre-trained model with task-specific hints, known as prompts, to adapt the model to new tasks. Prompts can be created manually as natural language instructions or generated automatically as either natural language instructions or vector representations. Prompt engineering enables predictions based solely on prompts without updating model parameters, and eases the application of large pre-trained models in real-world tasks. In past years, prompt engineering has been well studied in natural language processing. Recently, it has also been intensively studied...
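A minimal sketch of the two prompt styles mentioned above: a hand-written natural-language instruction and a learnable "soft" prompt prepended to the token embeddings of a frozen model. The template text, dimensions, and the stand-in embeddings are illustrative assumptions, not the setup of any specific paper.

```python
import torch
import torch.nn as nn

# (1) Manual prompt: a natural-language instruction wrapped around the input text.
template = "Classify the sentiment of the following review as positive or negative: {text}"
prompt_text = template.format(text="The movie was surprisingly good.")

# (2) Automatic / soft prompt: learnable vectors prepended to the token embeddings,
#     trained while the pre-trained model's own parameters stay frozen.
class SoftPrompt(nn.Module):
    def __init__(self, prompt_len: int = 10, embed_dim: int = 768):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(prompt_len, embed_dim) * 0.02)

    def forward(self, token_embeds: torch.Tensor) -> torch.Tensor:
        # token_embeds: (batch, seq_len, embed_dim) from the frozen model's embedding table
        batch = token_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prompt, token_embeds], dim=1)

soft_prompt = SoftPrompt()
dummy_embeds = torch.randn(2, 16, 768)   # stand-in for real token embeddings (assumption)
augmented = soft_prompt(dummy_embeds)    # (2, 26, 768), fed to the frozen model
```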
Backdoor defenses have been studied to alleviate the threat of deep neural networks (DNNs) being backdoor attacked and thus maliciously altered. Since DNNs usually adopt some external training data from an untrusted third party, a robust backdoor defense strategy during the training stage is of importance. We argue that the core of training-time defense is to select poisoned samples and to handle them properly. In this work, we summarize training-time defenses under a unified framework as splitting the poisoned dataset into two data pools. Under our framework, we propose an adaptively...
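A generic sketch of the two-pool view described above: assign each training sample a suspiciousness score, then split the dataset into a clean pool that keeps its labels and a polluted pool that is handled separately (e.g., without labels in a semi-supervised fashion). The scoring rule and split ratio are left abstract and are assumptions, not the paper's concrete criterion.

```python
from typing import Callable, List, Tuple

def split_into_pools(
    num_samples: int,
    score_fn: Callable[[int], float],   # higher score = more suspicious (illustrative)
    clean_ratio: float = 0.5,
) -> Tuple[List[int], List[int]]:
    order = sorted(range(num_samples), key=score_fn)   # least suspicious first
    k = int(clean_ratio * num_samples)
    clean_pool = order[:k]       # trained on with their labels
    polluted_pool = order[k:]    # trained on without labels / down-weighted
    return clean_pool, polluted_pool

# Example usage with a dummy per-sample score (hypothetical):
clean, polluted = split_into_pools(10, score_fn=lambda i: (i * 7) % 10 / 10.0)
```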
Recently, foundation models have exhibited remarkable advancements in multi-modal learning. These models, equipped with millions (or billions) of parameters, typically require a substantial amount of data for fine-tuning. However, collecting and centralizing training data from diverse sectors becomes challenging due to distinct privacy regulations. Federated Learning (FL) emerges as a promising solution, enabling multiple clients to collaboratively train neural networks without centralizing their local data. To...
Convolutional neural networks (CNNs) achieve translational invariance by using pooling operations. However, the operations do not preserve spatial relationships in the learned representations. Hence, CNNs cannot extrapolate to various geometric transformations of inputs. Recently, Capsule Networks (CapsNets) have been proposed to tackle this problem. In CapsNets, each entity is represented by a vector and routed to high-level entity representations by a dynamic routing algorithm. CapsNets have been shown to be more robust than...
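A minimal sketch of the capsule mechanics mentioned above: entities are vectors, a squashing non-linearity keeps their length in [0, 1), and routing-by-agreement iteratively decides how much each low-level capsule contributes to each high-level capsule. The dimensions and iteration count are illustrative.

```python
import torch
import torch.nn.functional as F

def squash(s, dim=-1, eps=1e-8):
    # Shrinks short vectors towards zero and long vectors towards unit length.
    norm_sq = (s ** 2).sum(dim=dim, keepdim=True)
    return (norm_sq / (1.0 + norm_sq)) * s / torch.sqrt(norm_sq + eps)

def dynamic_routing(u_hat, num_iters=3):
    # u_hat: (batch, num_in, num_out, out_dim) prediction vectors from lower capsules
    b = torch.zeros(u_hat.shape[:3], device=u_hat.device)       # routing logits
    for _ in range(num_iters):
        c = F.softmax(b, dim=2)                                  # coupling coefficients
        s = (c.unsqueeze(-1) * u_hat).sum(dim=1)                 # weighted sum over inputs
        v = squash(s)                                            # (batch, num_out, out_dim)
        b = b + (u_hat * v.unsqueeze(1)).sum(dim=-1)             # agreement update
    return v

v = dynamic_routing(torch.randn(2, 1152, 10, 16))                # CapsNet-on-MNIST-like shapes
```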
Federated Learning (FL) is a decentralized machine learning paradigm in which multiple clients collaboratively train neural networks without centralizing their local data, and hence preserve data privacy. However, real-world FL applications usually encounter challenges arising from distribution shifts across the local datasets of individual clients. These shifts may drift the global model aggregation or result in convergence to a deflected global optimum. While existing efforts have addressed distribution shifts in the label space, an equally...
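A bare-bones FedAvg-style sketch of the collaborative training described above: each client updates a copy of the global model on its private data, and only the parameters, never the data, are averaged on the server. The model, loaders, and hyper-parameters are placeholders under assumed names.

```python
import copy
import torch

def local_update(global_model, loader, epochs=1, lr=0.01):
    # One client's local training on its private data.
    model = copy.deepcopy(global_model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            torch.nn.functional.cross_entropy(model(x), y).backward()
            opt.step()
    return model.state_dict()

def federated_average(client_states):
    # Server-side aggregation: element-wise average of client parameters.
    avg = copy.deepcopy(client_states[0])
    for key in avg:
        avg[key] = torch.stack([s[key].float() for s in client_states]).mean(dim=0)
    return avg
```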
Adversarial training has shown promise in building robust models against adversarial examples. A major drawback of adversarial training is the computational overhead introduced by the generation of adversarial examples. To overcome this limitation, adversarial training based on single-step attacks has been explored. Previous work improves single-step adversarial training from different perspectives, e.g., sample initialization, loss regularization, and training strategy. Almost all of them treat the underlying model as a black box. In this work, we propose to exploit the interior building blocks of the model to improve efficiency. Specifically,...
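A minimal sketch of the single-step recipe referenced above: each batch is perturbed with one FGSM step before the parameter update. The model, optimizer, and hyper-parameters are placeholders; this is the generic baseline, not the paper's block-level method.

```python
import torch
import torch.nn.functional as F

def fgsm_adversarial_step(model, optimizer, x, y, eps=8 / 255):
    # Single-step attack: one gradient sign step on the input.
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad = torch.autograd.grad(loss, x_adv)[0]
    x_adv = (x + eps * grad.sign()).clamp(0, 1).detach()

    # Parameter update on the adversarial batch.
    optimizer.zero_grad()
    F.cross_entropy(model(x_adv), y).backward()
    optimizer.step()
```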
Large Language Models (LLMs) demonstrate remarkable zero-shot performance across various natural language processing tasks. The integration of multimodal encoders extends their capabilities, enabling the development of Multimodal LLMs that process vision, audio, and text. However, these capabilities also raise significant security concerns, as these models can be manipulated to generate harmful or inappropriate content through jailbreak attacks. While extensive research explores the impact of modality-specific input...
The rapid advancement of large models, driven by their exceptional abilities in learning and generalization through large-scale pre-training, has reshaped the landscape of Artificial Intelligence (AI). These models are now foundational to a wide range of applications, including conversational AI, recommendation systems, autonomous driving, content generation, medical diagnostics, and scientific discovery. However, their widespread deployment also exposes them to significant safety risks, raising concerns about...
The Capsule Network (CapsNet) is widely believed to be more robust than Convolutional Networks (ConvNets). However, there are no comprehensive comparisons between these two networks, and it is also unknown which components in the CapsNet affect its robustness. In this paper, we first carefully examine the special designs in the CapsNet that differ from those of a ConvNet commonly used for image classification. The examination reveals five major new/different components in the CapsNet: a transformation process, a dynamic routing layer, a squashing function, a marginal...
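As a worked example of one of the components listed above, the margin ("marginal") loss used in place of cross-entropy pushes the length of the true class capsule above an upper margin and the other capsules below a lower margin. The constants follow the commonly used CapsNet setting and are assumptions here.

```python
import torch
import torch.nn.functional as F

def margin_loss(v, targets, m_plus=0.9, m_minus=0.1, lam=0.5):
    # v: (batch, num_classes, capsule_dim) output capsules; targets: (batch,) class indices
    lengths = v.norm(dim=-1)                                     # capsule lengths per class
    t = F.one_hot(targets, num_classes=lengths.size(1)).float()
    pos = t * F.relu(m_plus - lengths) ** 2                      # true class pushed above m_plus
    neg = lam * (1.0 - t) * F.relu(lengths - m_minus) ** 2       # other classes pushed below m_minus
    return (pos + neg).sum(dim=1).mean()

loss = margin_loss(torch.randn(4, 10, 16), torch.tensor([0, 3, 7, 9]))
```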
Capsule Networks, as alternatives to Convolutional Neural Networks, have been proposed to recognize objects from images. The current literature demonstrates many advantages of CapsNets over CNNs. However, how to create explanations for individual classifications of a CapsNet has not been well explored. The widely used saliency methods are mainly proposed for explaining CNN-based classifications; they create saliency maps by combining activation values and the corresponding gradients, e.g., Grad-CAM. These methods require a specific architecture of the underlying...
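A minimal Grad-CAM-style sketch of the "activations combined with gradients" idea referenced above, written for a generic CNN; the backbone and the chosen layer are placeholders, not the architecture used in the paper.

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

model = resnet18(weights=None).eval()        # stand-in backbone (assumption)
feats = {}

def keep_activation(module, inputs, output):
    output.retain_grad()                      # keep the gradient of this feature map
    feats["act"] = output

model.layer4.register_forward_hook(keep_activation)   # last conv block (illustrative choice)

x = torch.randn(1, 3, 224, 224)
logits = model(x)
logits[0, logits.argmax()].backward()                  # gradient of the top class score

act, grad = feats["act"], feats["act"].grad
weights = grad.mean(dim=(2, 3), keepdim=True)           # global-average-pooled gradients
cam = F.relu((weights * act).sum(dim=1, keepdim=True))  # weighted sum of activation maps
cam = F.interpolate(cam, size=x.shape[-2:], mode="bilinear", align_corners=False)
```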
The field of few-shot learning (FSL) has shown promising results in scenarios where training data is limited, but its vulnerability to backdoor attacks remains largely unexplored. We first explore this topic by evaluating the performance of existing attack methods in FSL scenarios. Unlike in standard supervised learning, these methods failed to perform an effective attack in FSL due to two main issues. Firstly, the model tends to overfit to either benign features or trigger features, causing a tough trade-off between attack success rate and...
The classification decisions of neural networks can be misled by small imperceptible perturbations. This work aims to explain such classifications using saliency methods. The idea behind saliency methods is to explain a classification by creating so-called saliency maps. Unfortunately, a number of recent publications have shown that many of the proposed saliency methods do not provide insightful explanations. A prominent example is Guided Backpropagation (GuidedBP), which simply performs (partial) image recovery. However, our numerical analysis shows that the maps created by GuidedBP...
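A short sketch of the Guided Backpropagation rule characterised above: during the backward pass, a ReLU passes a gradient only where its forward input was positive and the incoming gradient is positive. Applying it to a real network would mean swapping its ReLUs for this variant; the tiny model here is a toy stand-in.

```python
import torch
import torch.nn as nn

class GuidedReLU(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        # Pass only positive gradients at positions with positive forward input.
        return grad_out * (x > 0) * (grad_out > 0)

model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.Flatten(), nn.Linear(8 * 8 * 8, 10))
x = torch.randn(1, 3, 8, 8, requires_grad=True)
h = GuidedReLU.apply(model[0](x))                  # guided ReLU after the conv layer
score = model[2](model[1](h))[0, 0]
score.backward()
saliency = x.grad                                   # the resulting "saliency map"
```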
Knowledge Distillation, as a model compression technique, has received great attention. The knowledge of a well-performing teacher is distilled to a student with a smaller architecture. The architecture of the student is often chosen to be similar to the teacher's, with fewer layers or fewer channels, or both. However, even with the same number of FLOPs or parameters, students with different architectures can achieve different generalization ability. The configuration of a student architecture requires intensive network engineering. In this work, instead of designing a good student architecture manually, we propose to search for the optimal...
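A minimal sketch of the distillation objective implied above: the student matches the teacher's softened output distribution while also fitting the ground-truth labels. The temperature and weighting are common defaults, used here as assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.9):
    # Soft target term: KL between temperature-softened teacher and student distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)                                        # rescale gradient magnitude
    # Hard target term: standard cross-entropy on the labels.
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1.0 - alpha) * hard

loss = distillation_loss(torch.randn(8, 100), torch.randn(8, 100), torch.randint(0, 100, (8,)))
```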
Large vision-language models (VLMs) such as GPT-4 have achieved exceptional performance across various multi-modal tasks. However, the deployment of VLMs necessitates substantial energy consumption and computational resources. Once attackers maliciously induce high energy consumption and latency time (energy-latency cost) during inference of VLMs, it will exhaust computational resources. In this paper, we explore this attack surface regarding the availability of VLMs and aim to induce high energy-latency cost during inference of VLMs. We find that the energy-latency cost can be manipulated by maximizing the length of the generated...
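A toy sketch of why output length drives the cost referred to above: an adversarial objective can push down the end-of-sequence probability at every decoding step so that autoregressive generation runs longer. The model, tokenizer, and the way a perturbation would be optimized are all placeholders, not the paper's full method.

```python
import torch
import torch.nn.functional as F

def delayed_eos_loss(step_logits, eos_token_id):
    # step_logits: (num_steps, vocab_size) logits produced during decoding.
    eos_prob = F.softmax(step_logits, dim=-1)[:, eos_token_id]
    return eos_prob.mean()          # minimizing this discourages early termination

loss = delayed_eos_loss(torch.randn(20, 32000), eos_token_id=2)   # vocab size and id are illustrative
```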