Yang Zhang

ORCID: 0000-0003-3612-7348
Research Areas
  • Adversarial Robustness in Machine Learning
  • Generative Adversarial Networks and Image Synthesis
  • Hate Speech and Cyberbullying Detection
  • Digital Media Forensic Detection
  • Anomaly Detection Techniques and Applications
  • Topic Modeling
  • Smart Grid Security and Resilience
  • Privacy-Preserving Technologies in Data
  • Domain Adaptation and Few-Shot Learning
  • Ethics and Social Impacts of AI
  • Explainable Artificial Intelligence (XAI)
  • Advanced Malware Detection Techniques
  • Auction Theory and Applications
  • Natural Language Processing Techniques
  • Multimodal Machine Learning Applications
  • Network Security and Intrusion Detection
  • Information and Cyber Security
  • Blockchain Technology Applications and Security
  • Advanced Neural Network Applications
  • Cryptography and Data Security
  • Cinema and Media Studies
  • Spam and Phishing Detection

Helmholtz Center for Information Security
2023-2024

Text-to-image generation models that generate images based on prompt descriptions have attracted an increasing amount of attention during the past few months. Despite their encouraging performance, these models raise concerns about the misuse of their generated fake images. To tackle this problem, we pioneer a systematic study of the detection and attribution of fake images generated by text-to-image models. Concretely, we first build a machine learning classifier to detect the fake images generated by various text-to-image models. We then attribute these fake images to their source models, such that model owners can be held...

10.1145/3576915.3616588 article EN Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security 2023-11-15
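
The detect-then-attribute pipeline described in the abstract above can be illustrated with a minimal sketch: one binary classifier separates real from fake images, and a second multi-class classifier assigns fakes to a candidate source model. The scikit-learn setup, embedding dimensionality, and synthetic features below are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch: detect fake images (binary), then attribute them to a source
# model (multi-class). Features are assumed to be precomputed image embeddings
# (e.g., from a CLIP-style encoder); the synthetic data is purely illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical 512-dim embeddings: real images, plus fakes from three models.
real = rng.normal(0.0, 1.0, size=(300, 512))
fake_by_model = [rng.normal(mu, 1.0, size=(300, 512)) for mu in (0.5, 1.0, 1.5)]

# Stage 1: real-vs-fake detector.
X_det = np.vstack([real] + fake_by_model)
y_det = np.array([0] * 300 + [1] * 900)
detector = LogisticRegression(max_iter=1000).fit(X_det, y_det)

# Stage 2: attribution among fakes only (labels 0..2 identify the source model).
X_attr = np.vstack(fake_by_model)
y_attr = np.repeat(np.arange(3), 300)
attributor = LogisticRegression(max_iter=1000).fit(X_attr, y_attr)

# Inference: flag an image as fake first, then name its likely source model.
query = fake_by_model[1][:5]
print(detector.predict(query), attributor.predict(query))
```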

The misuse of large language models (LLMs) has drawn significant attention from the general public and LLM vendors. One particular type of adversarial prompt, known as a jailbreak prompt, has emerged as the main attack vector to bypass safeguards and elicit harmful content from LLMs. In this paper, employing our new framework JailbreakHub, we conduct a comprehensive analysis of 1,405 jailbreak prompts spanning from December 2022 to December 2023. We identify 131 jailbreak communities and discover unique characteristics of jailbreak prompts and their major attack strategies, such as prompt injection...

10.1145/3658644.3670388 article EN 2024-12-02

State-of-the-art Text-to-Image models like Stable Diffusion and DALL·E 2 are revolutionizing how people generate visual content. At the same time, society has serious concerns about how adversaries can exploit such models to generate problematic or unsafe images. In this work, we focus on demystifying the generation of unsafe images and hateful memes from Text-to-Image models. We first construct a typology of unsafe images consisting of five categories (sexually explicit, violent, disturbing, hateful, and political). Then, we assess the proportion of unsafe images generated by four...

10.1145/3576915.3616679 article EN Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security 2023-11-15

Nowadays, powerful large language models (LLMs) such as ChatGPT have demonstrated revolutionary power in a variety of natural language processing (NLP) tasks such as text classification, sentiment analysis, translation, and question answering. Consequently, the detection of machine-generated texts (MGTs) is becoming increasingly crucial as LLMs become more advanced and prevalent. These models have the ability to generate human-like language, making it challenging to discern whether a text is authored by a human or a machine. This raises concerns...

10.1145/3658644.3670344 article EN 2024-12-02
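
A common baseline for machine-generated text detection of the kind benchmarked above is metric-based: score a text by the perplexity a reference language model assigns to it and apply a threshold. The sketch below illustrates that general idea with GPT-2 via Hugging Face transformers; the threshold value is an arbitrary assumption, and this is not the benchmark's own code.

```python
# Generic metric-based MGT detection sketch: texts that a reference language
# model finds unusually predictable (low perplexity) are flagged as likely
# machine-generated. Illustrative only; the threshold is a hypothetical value
# that would normally be calibrated on labeled human/machine samples.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    return torch.exp(out.loss).item()

THRESHOLD = 40.0  # hypothetical cut-off

def looks_machine_generated(text: str) -> bool:
    return perplexity(text) < THRESHOLD

print(looks_machine_generated("The quick brown fox jumps over the lazy dog."))
```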

Task arithmetic in large-scale pre-trained models enables flexible adaptation to diverse downstream tasks without extensive re-training. By leveraging task vectors (TVs), users can perform modular updates through simple arithmetic operations like addition and subtraction. However, this flexibility introduces new security vulnerabilities. In this paper, we identify and evaluate the susceptibility of TVs to backdoor attacks, demonstrating how malicious actors can exploit TVs to compromise model integrity. By developing composite...

10.48550/arxiv.2501.02373 preprint EN arXiv (Cornell University) 2025-01-04
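
For context, a task vector is simply the element-wise difference between fine-tuned and pre-trained weights, and task arithmetic adapts a model by adding scaled task vectors back to the pre-trained weights. The sketch below is a toy illustration of that arithmetic (not the paper's attack code); the tensors stand in for real model state dicts.

```python
# Task arithmetic sketch: TV = theta_finetuned - theta_pretrained, and a model
# is adapted as theta_new = theta_pretrained + sum_i lambda_i * TV_i.
import torch

def task_vector(pretrained: dict, finetuned: dict) -> dict:
    return {k: finetuned[k] - pretrained[k] for k in pretrained}

def apply_task_vectors(pretrained: dict, tvs: list, coeffs: list) -> dict:
    merged = {k: v.clone() for k, v in pretrained.items()}
    for tv, lam in zip(tvs, coeffs):
        for k in merged:
            merged[k] += lam * tv[k]
    return merged

# Toy "models": a single weight matrix each.
pre = {"w": torch.zeros(2, 2)}
ft_a = {"w": torch.ones(2, 2)}        # fine-tuned on task A
ft_b = {"w": 2 * torch.ones(2, 2)}    # fine-tuned on task B

tv_a, tv_b = task_vector(pre, ft_a), task_vector(pre, ft_b)
# Addition composes skills; a negative coefficient "forgets" a task.
merged = apply_task_vectors(pre, [tv_a, tv_b], [0.5, 0.5])
print(merged["w"])
```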

Large Language Models (LLMs) have raised increasing concerns about their misuse in generating hate speech. Among all the efforts to address this issue, hate speech detectors play a crucial role. However, the effectiveness of different detectors against LLM-generated hate speech remains largely unknown. In this paper, we propose HateBench, a framework for benchmarking hate speech detectors on LLM-generated hate speech. We first construct a dataset of 7,838 samples generated by six widely-used LLMs covering 34 identity groups, with meticulous annotations by three labelers. We then assess...

10.48550/arxiv.2501.16750 preprint EN arXiv (Cornell University) 2025-01-28

Most existing membership inference attacks (MIAs) utilize metrics (e.g., loss) calculated on the model's final state, while recent advanced attacks leverage metrics computed at various stages, including both intermediate and final states, throughout model training. Nevertheless, these attacks often process multiple states of the metric independently, ignoring their time-dependent patterns. Consequently, they struggle to effectively distinguish between members and non-members who exhibit similar metric values, particularly resulting in a high...

10.1145/3658644.3690335 article EN cc-by 2024-12-02
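
The intuition is that the per-epoch loss trajectory can separate members from non-members even when their final losses coincide. The synthetic-data sketch below contrasts an attack model that sees the whole trajectory with one that sees only the final loss; the trajectory shapes and attack classifier are illustrative assumptions, not measurements or code from the paper.

```python
# Synthetic illustration: members' losses drop early and flatten, non-members'
# losses drop steadily to the same final value. A trajectory-aware attack
# separates them; a final-loss-only attack cannot.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
epochs = 10
t = np.arange(epochs)

def trajectories(n, member: bool):
    if member:
        base = 0.1 + 1.9 * np.exp(-t)            # fast early drop, then flat
    else:
        base = 2.0 - (1.9 / (epochs - 1)) * t    # linear drop to the same final loss
    return base + rng.normal(0, 0.05, size=(n, epochs))

X = np.vstack([trajectories(500, True), trajectories(500, False)])
y = np.array([1] * 500 + [0] * 500)

seq_attack = LogisticRegression(max_iter=1000).fit(X, y)        # all epochs
final_attack = LogisticRegression(max_iter=1000).fit(X[:, -1:], y)  # final loss only

print("sequence-based accuracy:", seq_attack.score(X, y))
print("final-loss-only accuracy:", final_attack.score(X[:, -1:], y))
```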

Image safety classifiers play an important role in identifying and mitigating the spread of unsafe images online (e.g., images including violence, hateful rhetoric, etc.). At the same time, with the advent of text-to-image models and increasing concerns about the safety of AI models, developers are increasingly relying on image safety classifiers to safeguard their models. Yet, the performance of current image safety classifiers remains unknown for real-world and AI-generated images. To bridge this research gap, in this work, we propose UnsafeBench, a benchmarking framework that evaluates...

10.48550/arxiv.2405.03486 preprint EN arXiv (Cornell University) 2024-05-06

Adapting Large Language Models (LLMs) to specific tasks introduces concerns about computational efficiency, prompting an exploration of efficient methods such as In-Context Learning (ICL). However, the vulnerability of ICL to privacy attacks under realistic assumptions remains largely unexplored. In this work, we present the first membership inference attack tailored for ICL, relying solely on generated texts without their associated probabilities. We propose four attack strategies tailored to various constrained...

10.1145/3658644.3690306 article EN 2024-12-02
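
One plausible text-only membership signal (a sketch of the general idea, not necessarily one of the paper's four strategies) is to query the ICL-served model with a candidate's input and check whether the generated text echoes the candidate's output. The `generate` callable, the similarity measure, and the threshold below are hypothetical placeholders.

```python
# Text-only membership inference sketch for in-context learning: if a candidate
# (input, output) pair was among the prompt demonstrations, the completion for
# that input tends to closely match the demonstrated output.
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a, b).ratio()

def infer_membership(generate, candidate_input: str, candidate_output: str,
                     threshold: float = 0.8) -> bool:
    """Query the ICL-served model and compare its generated text against the
    candidate's output; no token probabilities are needed."""
    completion = generate(candidate_input)
    return similarity(completion, candidate_output) >= threshold

# Toy stand-in for an LLM endpoint whose prompt happened to contain this pair.
demo = ("Translate to French: cheese", "fromage")
fake_llm = lambda text: "fromage" if text == demo[0] else "du fromage, s'il vous plait"

print(infer_membership(fake_llm, *demo))  # True -> inferred member
```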

Text-to-image models, such as Stable Diffusion (SD), undergo iterative updates to improve image quality and address safety concerns. Improvements in image quality are straightforward to assess. However, how model updates resolve existing concerns and whether they raise new questions remain unexplored. This study takes an initial step in investigating the evolution of text-to-image models from the perspectives of safety, bias, and authenticity. Our findings, centered on Stable Diffusion, indicate that model updates paint a mixed picture. While updates progressively reduce...

10.1145/3658644.3690288 article EN 2024-12-02

Fine-tuning pre-trained models for downstream tasks has led to a proliferation of open-sourced task-specific models. Recently, Model Merging (MM) has emerged as an effective approach to facilitate knowledge transfer among these independently fine-tuned models. MM directly combines multiple fine-tuned models into a merged model without additional training, and the resulting model shows enhanced capabilities in multiple tasks. Although MM provides great utility, it may come with security risks because an adversary can exploit MM to affect multiple downstream tasks. However, the security risks of MM have...

10.1145/3658644.3690284 article EN cc-by 2024-12-02
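
As background, the simplest form of Model Merging is element-wise parameter averaging of models fine-tuned from the same base, with no further training. The sketch below shows that operation on toy state dicts; real MM methods (and the attacks studied above) involve more refined weighting schemes, so this is only an illustration of the core mechanism.

```python
# Model Merging sketch: average the parameters of independently fine-tuned
# models that share one architecture, producing a merged model without training.
import torch

def merge(state_dicts: list) -> dict:
    keys = state_dicts[0].keys()
    return {k: torch.stack([sd[k] for sd in state_dicts]).mean(dim=0) for k in keys}

ft_a = {"w": torch.tensor([1.0, 2.0])}  # fine-tuned for task A
ft_b = {"w": torch.tensor([3.0, 4.0])}  # fine-tuned for task B

merged = merge([ft_a, ft_b])
print(merged["w"])  # tensor([2., 3.]) -- element-wise average of the two models
```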

With large AI systems and models (LAMs) playing an ever-growing role across diverse applications, their impact on the privacy and cybersecurity of critical infrastructure has become a pressing concern. The LAMPS workshop is dedicated to tackling these emerging challenges, promoting dialogue on cutting-edge developments and ethical issues in safeguarding LAMs within critical-infrastructure contexts. Bringing together leading experts from around the world, this workshop will delve into the complex risks posed by LAMs in these sectors. Attendees will explore...

10.1145/3658644.3691335 article EN 2024-12-02

The text-to-image generation model has attracted significant interest from both academic and industrial communities. These models can generate images based on given prompt descriptions. Their potent capabilities, while beneficial, also present risks. Previous efforts relied on the approach of training binary classifiers to detect generated fake images, which is inefficient, lacking in generalizability, and non-robust. In this paper, we propose a novel zero-shot detection method, called ZeroFake,...

10.1145/3658644.3690297 article EN 2024-12-02

Recent studies have shown that systems with limited resources, such as Metadata-private Messengers (MPMs), suffer from side-channel attacks under resource allocation (RA). In the case of MPM, which is designed to keep the identities and activities of both callers and callees private from network adversaries, an attacker can compromise a victim's friends and, by calling the victim, infer whether the victim is busy, which breaks the privacy guarantee of MPM.

10.1145/3627106.3627181 article EN cc-by Annual Computer Security Applications Conference 2023-12-02
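
A toy simulation can make the resource-allocation side channel above concrete: if the victim's client can hold only a limited number of conversations, an attacker-controlled contact who calls the victim learns from the accept/reject outcome whether the victim is already busy. The slot count and probe logic below are illustrative assumptions, not the MPM protocol itself.

```python
# Toy RA side-channel simulation: the probe's accept/reject outcome leaks
# whether the victim is already engaged in a (supposedly private) conversation.
SLOTS = 1  # hypothetical: the victim's client accepts one conversation at a time

def victim_accepts(busy_with_secret_contact: bool) -> bool:
    # A new call is accepted only if a conversation slot is free.
    used = 1 if busy_with_secret_contact else 0
    return used < SLOTS

def attacker_probe(busy: bool) -> bool:
    # An attacker-controlled "friend" calls the victim and observes the outcome.
    return victim_accepts(busy)

for secret_state in (False, True):
    accepted = attacker_probe(secret_state)
    inferred_busy = not accepted
    print(f"victim busy={secret_state} -> probe accepted={accepted}, inferred busy={inferred_busy}")
```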