Brendan Dolan-Gavitt

ORCID: 0000-0002-8867-4282
Research Areas
  • Advanced Malware Detection Techniques
  • Security and Verification in Computing
  • Software Testing and Debugging Techniques
  • Software Engineering Research
  • Software Reliability and Analysis Research
  • Adversarial Robustness in Machine Learning
  • Anomaly Detection Techniques and Applications
  • Digital and Cyber Forensics
  • Information and Cyber Security
  • Advanced Data Storage Technologies
  • Parallel Computing and Optimization Techniques
  • Privacy, Security, and Data Protection
  • Network Security and Intrusion Detection
  • Ferroelectric and Negative Capacitance Devices
  • Digital Media Forensic Detection
  • Topic Modeling
  • BIM and Construction Integration
  • Advanced Data Processing Techniques
  • Model-Driven Software Engineering Techniques
  • Manufacturing Process and Optimization
  • Natural Language Processing Techniques
  • Web Application Security Vulnerabilities
  • Machine Learning in Materials Science
  • Domain Adaptation and Few-Shot Learning
  • Embedded Systems Design Techniques

New York University
2016-2025

Institute of Electrical and Electronics Engineers
2019

Regional Municipality of Niagara
2019

IEEE Computer Society
2019

Georgia Institute of Technology
2009-2014

Mitre (United States)
2007-2008

Deep learning-based techniques have achieved state-of-the-art performance on a wide variety of recognition and classification tasks. However, these networks are typically computationally expensive to train, requiring weeks of computation on many GPUs; as a result, many users outsource the training procedure to the cloud or rely on pre-trained models that are then fine-tuned for a specific task. In this paper we show that outsourced training introduces new security risks: an adversary can create a maliciously trained network (a...

10.48550/arxiv.1708.06733 preprint EN other-oa arXiv (Cornell University) 2017-01-01
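
A minimal sketch of the kind of training-set poisoning this line of work describes, assuming an image classification setting; the trigger pattern, target label, and poisoning rate below are illustrative choices rather than the paper's exact parameters.

```python
import numpy as np

def poison_dataset(images, labels, target_label=7, poison_frac=0.1, seed=0):
    """Stamp a small trigger patch onto a fraction of training images and
    relabel them, so a model trained on the result associates the trigger
    with the attacker's target class while behaving normally on clean inputs.

    images: float array of shape (N, H, W), values in [0, 1]
    labels: int array of shape (N,)
    """
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    idx = rng.choice(len(images), size=int(poison_frac * len(images)), replace=False)
    # Illustrative trigger: a 3x3 white square in the bottom-right corner.
    images[idx, -3:, -3:] = 1.0
    labels[idx] = target_label
    return images, labels, idx

# Toy usage: poison 10% of a random stand-in "dataset".
x = np.random.rand(100, 28, 28)
y = np.random.randint(0, 10, size=100)
x_p, y_p, poisoned_idx = poison_dataset(x, y)
print(f"poisoned {len(poisoned_idx)} of {len(x)} samples")
```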

Deep learning-based techniques have achieved state-of-the-art performance on a wide variety of recognition and classification tasks. However, these networks are typically computationally expensive to train, requiring weeks of computation on many GPUs; as a result, many users outsource the training procedure to the cloud or rely on pre-trained models that are then fine-tuned for a specific task. In this paper, we show that outsourced training introduces new security risks: an adversary can create a maliciously trained network (a...

10.1109/access.2019.2909068 article EN cc-by-nc-nd IEEE Access 2019-01-01

Introspection has featured prominently in many recent security solutions, such as virtual machine-based intrusion detection, forensic memory analysis, and low-artifact malware analysis. Widespread adoption of these approaches, however, has been hampered by the semantic gap: in order to extract meaningful information about the current state of a machine, detailed knowledge of the guest operating system's inner workings is required. In this paper, we present a novel approach for automatically creating introspection...

10.1109/sp.2011.11 article EN IEEE Symposium on Security and Privacy 2011-05-01
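
A toy illustration of why the semantic gap matters: even a simple "list the running processes" introspection query needs hard-coded knowledge of guest structure layouts. The offsets, field sizes, and descriptor layout below are hypothetical, not those of any particular kernel build.

```python
import struct

# Hypothetical guest layout: a process descriptor with a name field and a
# pointer to the next descriptor. Real offsets vary by OS version and build,
# which is exactly the semantic-gap problem this work targets.
OFF_NAME, NAME_LEN, OFF_NEXT, PTR_SIZE = 0x10, 16, 0x30, 8

def read(mem, addr, size):
    """Read `size` bytes at offset `addr` from a raw memory image."""
    return mem[addr:addr + size]

def list_processes(mem, head_addr):
    """Walk a singly linked list of process descriptors in a memory dump."""
    procs, addr, seen = [], head_addr, set()
    while addr and addr not in seen:
        seen.add(addr)
        raw = read(mem, addr + OFF_NAME, NAME_LEN)
        name = raw.split(b"\0")[0].decode(errors="replace")
        procs.append((hex(addr), name))
        addr = struct.unpack("<Q", read(mem, addr + OFF_NEXT, PTR_SIZE))[0]
    return procs

# Toy demo: two descriptors at 0x40 and 0x100 in a 512-byte synthetic "dump".
mem = bytearray(512)
mem[0x50:0x55] = b"init\0"
mem[0x70:0x78] = struct.pack("<Q", 0x100)
mem[0x110:0x115] = b"sshd\0"
print(list_processes(bytes(mem), 0x40))
```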

Work on automating vulnerability discovery has long been hampered by a shortage of ground-truth corpora with which to evaluate tools and techniques. This lack of ground truth prevents authors and users alike from being able to measure such fundamental quantities as miss and false alarm rates. In this paper, we present LAVA, a novel dynamic taint analysis-based technique for producing ground-truth corpora by quickly and automatically injecting large numbers of realistic bugs into program source code. Every LAVA bug is accompanied by an input...

10.1109/sp.2016.15 article EN 2022 IEEE Symposium on Security and Privacy (SP) 2016-05-01
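
A hedged sketch of the core idea: a taint-identified, attacker-controlled but otherwise unused value (a "DUA") is compared against a magic constant at an attack point, so the injected bug only fires for inputs that carry the magic bytes, and a triggering input can be produced mechanically. The helper names and guard template below are illustrative, not LAVA's actual implementation.

```python
import struct

MAGIC = 0x6c617661  # "lava": the trigger value an input must carry

def make_guard(dua_expr: str, attack_expr: str) -> str:
    """Emit a C snippet of the shape such systems splice into source: dead-but-
    available data (the DUA) gates a corrupted operation at the attack point."""
    return (f"if (({dua_expr}) == 0x{MAGIC:x}) {{\n"
            f"    {attack_expr};  /* out-of-bounds only when the magic is present */\n"
            f"}}\n")

def make_triggering_input(base_input: bytes, dua_offset: int) -> bytes:
    """Build a proof-of-vulnerability input by planting the magic bytes at the
    file offset that dynamic taint analysis mapped to the DUA."""
    magic = struct.pack("<I", MAGIC)
    return base_input[:dua_offset] + magic + base_input[dua_offset + 4:]

print(make_guard("*(unsigned int *)(hdr->reserved)", "buf[idx + 0x1000] = 0"))
print(make_triggering_input(b"\x00" * 16, dua_offset=8).hex())
```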

There is burgeoning interest in designing AI-based systems to assist humans in designing computing systems, including tools that automatically generate computer code. The most notable of these comes in the form of the first self-described 'AI pair programmer', GitHub Copilot, which is a language model trained over open-source GitHub code. However, code often contains bugs—and so, given the vast quantity of unvetted code that Copilot has processed, it is certain that the language model will have learned from exploitable, buggy code. This raises concerns on the security of Copilot's...

10.1109/sp46214.2022.9833571 article EN 2022 IEEE Symposium on Security and Privacy (SP) 2022-05-01
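
The study's idea in miniature: generate many completions for a security-relevant prompt and measure how often the suggested code is vulnerable. In the sketch below the list of completions stands in for a hypothetical code model's output, and a naive pattern check stands in for the static analyzer (the actual evaluation in this line of work used CodeQL); none of this is the paper's exact harness.

```python
import re

# Naive stand-in checks for two classic C weaknesses.
RISKY_PATTERNS = {
    "CWE-787 (strcpy into fixed buffer)": re.compile(r"\bstrcpy\s*\("),
    "CWE-242 (use of gets)": re.compile(r"\bgets\s*\("),
}

def scan_completion(code: str) -> list[str]:
    """Return the names of the risky patterns found in one completion."""
    return [name for name, pat in RISKY_PATTERNS.items() if pat.search(code)]

completions = [
    "strcpy(buf, user_input);",                                   # vulnerable
    "strncpy(buf, user_input, sizeof(buf) - 1);\nbuf[63] = 0;",   # safer variant
]
flagged = [c for c in completions if scan_completion(c)]
print(f"{len(flagged)}/{len(completions)} completions flagged as potentially vulnerable")
```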

The BigCode community, an open-scientific collaboration working on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder and StarCoderBase: 15.5B parameter models with 8K context length, infilling capabilities, and fast large-batch inference enabled by multi-query attention. StarCoderBase is trained on 1 trillion tokens sourced from The Stack, a large collection of permissively licensed GitHub repositories with inspection tools and an opt-out process. We fine-tuned StarCoderBase on 35B Python...

10.48550/arxiv.2305.06161 preprint EN cc-by-sa arXiv (Cornell University) 2023-01-01
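
A minimal sketch of the infilling capability the abstract mentions, assuming the Hugging Face `transformers` library, the `bigcode/starcoder` checkpoint name, and the `<fim_prefix>`/`<fim_suffix>`/`<fim_middle>` sentinel tokens used by this model family; the model card is the authoritative source for the prompt format.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starcoder"  # assumed checkpoint name; gated on the Hub
tok = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto")

# Fill-in-the-middle: the model is asked to produce the code between a prefix
# and a suffix rather than a pure left-to-right continuation.
prefix = "def fibonacci(n):\n    "
suffix = "\n    return a\n"
prompt = f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```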

Human developers can produce code with cybersecurity bugs. Can emerging 'smart' code completion tools help repair those bugs? In this work, we examine the use of large language models (LLMs) for code (such as OpenAI's Codex and AI21's Jurassic J-1) for zero-shot vulnerability repair. We investigate challenges in the design of prompts that coax LLMs into generating repaired versions of insecure code. This is difficult due to the numerous ways to phrase key information—both semantically and syntactically—with natural...

10.1109/sp46215.2023.10179420 article EN 2022 IEEE Symposium on Security and Privacy (SP) 2023-05-01
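
A hedged sketch of the prompt-engineering idea: show the code up to the flaw, comment out the vulnerable region together with a hint describing the weakness, and leave the cursor where a code model should emit the fixed lines. The template wording and the helper name below are illustrative; the work studies many such prompt variants, and the model call is left as a hypothetical stand-in.

```python
def build_repair_prompt(code_lines, vuln_start, vuln_end, hint):
    """Keep everything before the vulnerable region, present the flawed lines as
    commented-out 'bug' context, and end where the model should write the fix."""
    before = code_lines[:vuln_start]
    flawed = code_lines[vuln_start:vuln_end]
    prompt = before + [f"    /* BUG: {hint} */"]
    prompt += [f"    /* {line.strip()} */" for line in flawed]
    prompt += ["    /* FIXED VERSION: */"]
    return "\n".join(prompt) + "\n"

code = [
    "void greet(const char *user_input) {",
    "    char buf[64];",
    "    strcpy(buf, user_input);",
]
prompt = build_repair_prompt(code, vuln_start=2, vuln_end=3,
                             hint="strcpy may overflow buf (CWE-787)")
print(prompt)
# The resulting string would be sent to a code-completion model (e.g. a
# Codex-style endpoint); the model's continuation is the candidate repair.
```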

Human developers can produce code with cybersecurity bugs. Can emerging 'smart' code completion tools help repair those bugs? In this work, we examine the use of large language models (LLMs) for code (such as OpenAI's Codex and AI21's Jurassic J-1) for zero-shot vulnerability repair. We investigate challenges in the design of prompts that coax LLMs into generating repaired versions of insecure code. This is difficult due to the numerous ways to phrase key information—both semantically and syntactically—with natural...

10.1109/sp46215.2023.10179324 article EN 2022 IEEE Symposium on Security and Privacy (SP) 2023-05-01

Automating hardware design could obviate a significant amount of human error from the engineering process and lead to fewer errors. Verilog is a popular hardware description language used to model digital systems, thus generating Verilog code is a critical first step. Emerging large language models (LLMs) are able to write high-quality code in other programming languages. In this paper, we characterize the ability of LLMs to generate useful Verilog. For this, we fine-tune pre-trained LLMs on Verilog datasets collected from GitHub and textbooks. We construct an evaluation...

10.23919/date56975.2023.10137086 article EN Design, Automation & Test in Europe Conference & Exhibition (DATE), 2015 2023-04-01

In this study, we explore the capability of Large Language Models (LLMs) to automate hardware design by automatically completing partial Verilog code, a common language for designing and modeling digital systems. We fine-tune pre-existing LLMs on Verilog datasets compiled from GitHub and textbooks. We evaluate the functional correctness of the generated code using a specially designed test suite, featuring a custom problem set and testing benches. Here, our fine-tuned open-source CodeGen-16B model outperforms the commercial...

10.1145/3643681 article EN mit ACM Transactions on Design Automation of Electronic Systems 2024-02-09
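
A minimal sketch of what "evaluating functional correctness of generated Verilog" can look like in practice, assuming Icarus Verilog is installed; the pass marker printed by the testbench is an illustrative convention, not the paper's exact harness.

```python
import os
import subprocess
import tempfile

def passes_testbench(generated_verilog: str, testbench: str) -> bool:
    """Compile a candidate module with its testbench (Icarus Verilog assumed
    available as `iverilog`/`vvp`) and treat a printed marker as success."""
    with tempfile.TemporaryDirectory() as d:
        dut, tb, sim = (os.path.join(d, n) for n in ("dut.v", "tb.v", "sim.vvp"))
        with open(dut, "w") as f:
            f.write(generated_verilog)
        with open(tb, "w") as f:
            f.write(testbench)
        if subprocess.run(["iverilog", "-o", sim, dut, tb]).returncode != 0:
            return False  # candidate does not even compile
        run = subprocess.run(["vvp", sim], capture_output=True, text=True)
        return "ALL TESTS PASSED" in run.stdout

# A pass@1-style score over a problem set is then just the mean of these booleans.
```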

There is burgeoning interest in designing AI-based systems to assist humans in designing computing systems, including tools that automatically generate computer code. The most notable of these comes in the form of the first self-described “AI pair programmer,” GitHub Copilot, which is a language model trained over open-source GitHub code. However, code often contains bugs—and so, given the vast quantity of unvetted code that Copilot has processed, it is certain that the language model will have learned from exploitable, buggy code. This raises concerns on the security of Copilot’s...

10.1145/3610721 article EN Communications of the ACM 2025-01-20

Kernel-mode rootkits hide objects such as processes and threads using a technique known as Direct Kernel Object Manipulation (DKOM). Many forensic analysis tools attempt to detect these hidden objects by scanning kernel memory with handmade signatures; however, such signatures are brittle and rely on non-essential features of these data structures, making them easy to evade. In this paper, we present an automated mechanism for generating signatures for kernel data structures and show that these signatures are robust: attempts to evade the signature by modifying the structure...

10.1145/1653662.1653730 article EN 2009-11-09
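
A toy scanner in the spirit of the paper: the signature constrains only fields the OS itself depends on (so a rootkit cannot alter them without crashing the system), and every candidate offset in the physical memory dump is checked, which exposes objects that DKOM has unlinked from kernel lists. The offsets, formats, and predicate values below are hypothetical.

```python
import struct

# Each constraint: (field offset, struct format, predicate on the decoded value).
ROBUST_CONSTRAINTS = [
    (0x00, "<I", lambda v: v == 0x03),        # e.g. an object-type tag the kernel checks
    (0x28, "<Q", lambda v: v % 0x10 == 0),    # e.g. an aligned pointer the scheduler uses
]

def matches(mem: bytes, base: int) -> bool:
    """True if the bytes at `base` satisfy every robust-signature constraint."""
    for off, fmt, pred in ROBUST_CONSTRAINTS:
        size = struct.calcsize(fmt)
        if base + off + size > len(mem):
            return False
        if not pred(struct.unpack_from(fmt, mem, base + off)[0]):
            return False
    return True

def scan(mem: bytes, stride: int = 0x8):
    """Return candidate object addresses across the whole dump."""
    return [addr for addr in range(0, len(mem), stride) if matches(mem, addr)]
```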

We present PANDA, an open-source tool that has been purpose-built to support whole system reverse engineering. It is built upon the QEMU whole system emulator, and so analyses have access to all code executing in the guest and all data. PANDA adds the ability to record and replay executions, enabling iterative, deep, whole system analyses. Further, the replay log files are compact and shareable, allowing for repeatable experiments. A nine billion instruction boot of FreeBSD, e.g., is represented by only a few hundred MB. PANDA leverages QEMU's support of thirteen different CPU...

10.1145/2843859.2843867 article EN 2015-12-08
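
A hedged sketch of replay-driven analysis using PANDA's Python interface (the `pandare` package); the constructor arguments, decorator, and method names below are recalled from the project's examples and may differ between releases, so treat them as assumptions rather than the authoritative API.

```python
from pandare import Panda

panda = Panda(generic="x86_64")   # a stock QEMU guest configuration
executed_blocks = 0

@panda.cb_before_block_exec
def count_blocks(cpu, tb):
    # Runs for every translated block during the deterministic replay.
    global executed_blocks
    executed_blocks += 1

# Replay a previously captured recording; the same compact log can be
# re-analyzed repeatedly with different, arbitrarily heavyweight plugins.
panda.run_replay("freebsd_boot")
print(f"{executed_blocks} basic blocks executed during replay")
```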

The security of computer systems typically relies on a hardware root of trust. As vulnerabilities in hardware can have severe implications on a system, there is a need for techniques to support security verification activities. Assertion-based verification is a popular technique that involves capturing design intent in a set of assertions that can be used in formal verification or testing-based checking. However, writing security-centric assertions is a challenging task. In this work, we investigate the use of emerging large language models (LLMs) for code generation in hardware assertion generation for security,...

10.1109/tifs.2024.3372809 article EN IEEE Transactions on Information Forensics and Security 2024-01-01

The security of computer systems typically relies on a hardware root of trust. As vulnerabilities in hardware can have severe implications on a system, there is a need for techniques to support security verification activities. Assertion-based verification is a popular technique that involves capturing design intent in a set of assertions that can be used in formal verification or testing-based checking. However, writing security-centric assertions is a challenging task. In this work, we investigate the use of emerging large language models (LLMs) for code generation in hardware assertion generation for security,...

10.48550/arxiv.2306.14027 preprint EN cc-by-sa arXiv (Cornell University) 2023-01-01

This paper describes the structure of the Windows registry as it is stored in physical memory. We present tools and techniques that can be used to extract this data directly from memory dumps. We also provide guidelines to aid investigators and experimentally demonstrate the value of our techniques. Finally, we describe a compelling attack that modifies the cached version of the registry without altering the on-disk version. While this attack would be undetectable with conventional analysis techniques, such malicious modifications are easily detectable by...

10.1016/j.diin.2008.05.003 article EN cc-by-nc-nd Digital Investigation 2008-05-27

This paper describes the use of the Virtual Address Descriptor (VAD) tree structure in Windows memory dumps to help guide forensic analysis of Windows memory. We describe how to locate and parse the structure, and show its value in breaking up physical memory into more manageable and semantically meaningful units than can be obtained by simply walking the page directory for the process. Several tools to display information about the VAD tree and dump the memory regions it describes will also be presented.

10.1016/j.diin.2007.06.008 article EN cc-by-nc-nd Digital Investigation 2007-06-18
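
A toy walk of a VAD-like binary tree in a raw memory dump, in the spirit of the paper: each node covers a virtual address range and holds pointers to left and right children. The offsets and field sizes below are hypothetical and do not correspond to any specific Windows build's layout.

```python
import struct

OFF_LEFT, OFF_RIGHT, OFF_START, OFF_END = 0x00, 0x08, 0x10, 0x18

def read_u64(mem, addr):
    """Decode a little-endian 64-bit value at `addr` in the dump."""
    return struct.unpack_from("<Q", mem, addr)[0]

def walk_vad(mem, node_addr, regions=None):
    """In-order traversal yielding (start_va, end_va) ranges: the semantically
    meaningful units used instead of a raw page-directory walk."""
    if regions is None:
        regions = []
    if node_addr == 0:
        return regions
    walk_vad(mem, read_u64(mem, node_addr + OFF_LEFT), regions)
    regions.append((read_u64(mem, node_addr + OFF_START),
                    read_u64(mem, node_addr + OFF_END)))
    walk_vad(mem, read_u64(mem, node_addr + OFF_RIGHT), regions)
    return regions
```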

The ability to introspect into the behavior of software at runtime is crucial for many security-related tasks, such as virtual machine-based intrusion detection and low-artifact malware analysis. Although some progress has been made in this task by automatically creating programs that can passively retrieve kernel-level information, two key challenges remain. First, it is currently difficult to extract useful information from user-level applications, such as web browsers. Second, discovering points within...

10.1145/2508859.2516697 article EN 2013-01-01

In spite of decades of research in bug detection tools, there is a surprising dearth of ground-truth corpora that can be used to evaluate the efficacy of such tools. Recently, systems such as LAVA and EvilCoder have been proposed to automatically inject bugs into software to quickly generate large bug corpora, but the bugs created so far differ from naturally occurring bugs in a number of ways. In this work, we propose a new automated bug injection system, Apocalypse, that uses formal techniques—symbolic execution, constraint-based program synthesis...

10.1145/3236024.3236084 article EN 2018-10-26

Large Language Models (LLMs) such as OpenAI Codex are increasingly being used as AI-based coding assistants. Understanding the impact of these tools on developers' code is paramount, especially as recent work showed that LLMs may suggest cybersecurity vulnerabilities. We conduct a security-driven user study (N=58) to assess code written by student programmers when assisted by LLMs. Given the potential severity of low-level bugs as well as their relative frequency in real-world projects, we tasked participants with...

10.48550/arxiv.2208.09727 preprint EN other-oa arXiv (Cornell University) 2022-01-01

Deep neural networks (DNNs) provide excellent performance across a wide range of classification tasks, but their training requires high computational resources and is often outsourced to third parties. Recent work has shown that outsourced training introduces the risk that a malicious trainer will return a backdoored DNN that behaves normally on most inputs but causes targeted misclassifications or degrades the accuracy of the network when a trigger known only to the attacker is present. In this paper, we provide the first effective defenses against backdoor...

10.48550/arxiv.1805.12185 preprint EN other-oa arXiv (Cornell University) 2018-01-01
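
A minimal sketch of the pruning half of a fine-pruning-style defense: channels in a late convolutional layer that stay dormant on clean validation data are candidates for being backdoor-dedicated, so they are zeroed out before fine-tuning on clean data. The layer choice and pruning fraction below are illustrative, not the paper's exact settings.

```python
import torch
import torch.nn as nn

def prune_dormant_channels(conv: nn.Conv2d, activations: torch.Tensor, frac: float = 0.2):
    """activations: (N, C, H, W) outputs of `conv` recorded on clean inputs.
    Zero the weights of the least-active output channels and return their indices."""
    mean_act = activations.abs().mean(dim=(0, 2, 3))       # per-channel activity
    n_prune = int(frac * conv.out_channels)
    dormant = torch.argsort(mean_act)[:n_prune]            # least-active channels
    with torch.no_grad():
        conv.weight[dormant] = 0
        if conv.bias is not None:
            conv.bias[dormant] = 0
    return dormant

# Toy usage on a randomly initialized layer and random "clean" inputs.
conv = nn.Conv2d(3, 16, 3)
clean_acts = conv(torch.randn(8, 3, 32, 32)).detach()
pruned = prune_dormant_channels(conv, clean_acts)
print("pruned channels:", pruned.tolist())
# Fine-tuning on clean data would follow, recovering accuracy lost to pruning.
```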

Closely monitoring the behavior of a software system during its execution enables developers and analysts to observe, and ultimately understand, how it works. This kind of dynamic analysis can be instrumental to reverse engineering, vulnerability discovery, exploit development, and debugging. While these analyses are typically well-supported for homogeneous desktop platforms (e.g., x86 PCs), they can rarely be applied in the heterogeneous world of embedded systems. One approach to enable these analyses for embedded systems is to move the software stacks from...

10.1145/3433210.3453093 article EN 2021-05-24

Large Language Models (LLMs) have been used in cybersecurity in many ways, including their recent use as intelligent agent systems for autonomous security analysis. Capture the Flag (CTF) challenges serve as benchmarks for assessing the automated task-planning abilities of LLM agents across various cybersecurity skill sets. Early attempts to apply LLMs to solving CTF challenges relied on single-agent systems, where feedback was restricted to a single reasoning-action loop. This approach proved inadequate for handling complex CTF tasks. Drawing...

10.48550/arxiv.2502.10931 preprint EN arXiv (Cornell University) 2025-02-15