- Software Testing and Debugging Techniques
- Software Engineering Research
- Software Reliability and Analysis Research
- Advanced Malware Detection Techniques
- Security and Verification in Computing
- Software System Performance and Reliability
- Manufacturing Process and Optimization
- Adversarial Robustness in Machine Learning
- Formal Methods in Verification
- Web Application Security Vulnerabilities
- Advanced Software Engineering Methodologies
- Machine Learning and Algorithms
- Information and Cyber Security
- Logic, programming, and type systems
- Cryptographic Implementations and Security
- Regional Development and Environment
- Medical Research and Treatments
- Evolution and Genetic Dynamics
- Peer-to-Peer Network Technologies
- Music and Audio Processing
- Species Distribution and Climate Change
- Machine Learning and Data Classification
- Real-Time Systems Scheduling
- Risk and Safety Analysis
- Teaching and Learning Programming
Max Planck Institute for Security and Privacy
2021-2025
Monash University
2018-2024
Association for Computing Machinery
2019-2021
Australian Regenerative Medicine Institute
2019-2020
National University of Singapore
2012-2018
Saarland University
2014-2015
Software (Germany)
2015
TU Dresden
2009
Existing Greybox Fuzzers (GF) cannot be effectively directed, for instance, towards problematic changes or patches, towards critical system calls or dangerous locations, or towards functions in the stack trace of a reported vulnerability that we wish to reproduce. In this paper, we introduce Directed Greybox Fuzzing (DGF), which generates inputs with the objective of efficiently reaching a given set of target program locations. We develop and evaluate a simulated annealing-based power schedule that gradually assigns more energy to seeds that are...
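As an illustration of the idea, here is a minimal sketch of an annealing-based power schedule that gives more energy to seeds close to the target locations as the campaign moves from exploration to exploitation. The function and parameter names are illustrative assumptions, not AFLGo's exact formula.

```python
import math

def power_schedule(seed_distance, min_d, max_d, t_elapsed, t_exploitation, base_energy):
    """Illustrative annealing-based power schedule (not AFLGo's exact one)."""
    # Normalize the seed's distance to the targets into [0, 1].
    if max_d == min_d:
        d_norm = 0.5
    else:
        d_norm = (seed_distance - min_d) / (max_d - min_d)

    # Cooling: temperature falls from 1 (pure exploration) towards 0
    # (pure exploitation) as the time budget for exploration is used up.
    progress = min(1.0, t_elapsed / t_exploitation)
    temperature = math.pow(20, -progress)

    # Early on, distance barely matters (weight ~0.5 for every seed);
    # later, seeds with small distance dominate the energy assignment.
    weight = (1 - d_norm) * (1 - temperature) + 0.5 * temperature
    return int(base_energy * 2 * weight)
```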
Coverage-based Greybox Fuzzing (CGF) is a random testing approach that requires no program analysis. A new test is generated by slightly mutating a seed input. If the test exercises a new and interesting path, it is added to the set of seeds; otherwise, it is discarded. We observe that most tests exercise the same few "high-frequency" paths and develop strategies to explore significantly more paths with the same number of tests by gravitating towards low-frequency paths. We explain the challenges and opportunities of CGF using a Markov chain model which specifies the probability...
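The following sketch illustrates the core intuition of gravitating towards low-frequency paths: seeds whose paths have been exercised by many generated tests receive less energy, rarely exercised paths receive more. Constants and names are illustrative, not AFL's or the paper's exact schedule.

```python
from collections import defaultdict

path_frequency = defaultdict(int)   # f(i): number of tests that exercised path i
seed_selected = defaultdict(int)    # s(i): number of times the seed for path i was picked

def record_execution(path_id):
    """Called once per generated test with the id of the path it exercised."""
    path_frequency[path_id] += 1

def energy(path_id, base_energy=16, max_energy=1024):
    """More energy for seeds on low-frequency paths; ramp up with repeated selection."""
    seed_selected[path_id] += 1
    e = base_energy * (2 ** min(seed_selected[path_id], 10)) / max(1, path_frequency[path_id])
    return min(int(e), max_energy)
```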
Server fuzzing is difficult. Unlike simple command-line tools, servers feature a massive state space that can be traversed effectively only with well-defined sequences of input messages. Valid sequences are specified in a protocol. In this paper, we present AFLNET, the first greybox fuzzer for protocol implementations. Unlike existing fuzzers, AFLNET takes a mutational approach and uses state-feedback to guide the fuzzing process. It is seeded with a corpus of recorded message exchanges between the server and an actual client. No protocol specification or...
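A minimal sketch of the message-sequence mutation idea described here: replay a prefix of a recorded exchange to drive the server into a protocol state, then mutate the remaining messages and observe the resulting state sequence. The helper names and the state-observation callback are illustrative assumptions, not AFLNET's API.

```python
import random

def mutate_message(msg: bytes) -> bytes:
    """Flip one byte of a protocol message (illustrative havoc-style mutation)."""
    if not msg:
        return msg
    i = random.randrange(len(msg))
    return msg[:i] + bytes([random.randrange(256)]) + msg[i + 1:]

def fuzz_one(sequence, send_and_observe_state):
    """sequence: list of recorded protocol messages (bytes).
    send_and_observe_state: hypothetical callback that sends one message and
    returns the state/response code observed from the server."""
    split = random.randrange(len(sequence))
    observed_states = []
    # Replay the prefix unchanged to reach the target protocol state.
    for msg in sequence[:split]:
        observed_states.append(send_and_observe_state(msg))
    # Mutate the rest of the sequence.
    for msg in sequence[split:]:
        observed_states.append(send_and_observe_state(mutate_message(msg)))
    return observed_states   # a new state sequence marks the input as interesting
```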
How do we find security flaws in a protocol implementation without a machine-readable specification of the protocol? Facing the internet, protocol implementations are particularly security-critical software systems, where inputs must adhere to a specific structure and order that is often only informally specified in hundreds of pages of natural language (RFC). Without some machine-readable version of the protocol, it is difficult to automatically generate valid test inputs for an implementation that follow the required structure and order. It is possible to partially alleviate this challenge using mutational...
Research has produced many approaches to automatically locate, explain, and repair software bugs. But how do these approaches relate to the way practitioners actually understand and fix bugs? To help answer this question, we have collected a dataset named DBGBENCH --- the correct fault locations, bug diagnoses, and patches of 27 real errors in open-source C projects, consolidated from hundreds of debugging sessions of professional engineers. Moreover, we shed light on the entire debugging process, from constructing a hypothesis to submitting...
Many real-world programs take highly structured and very complex inputs. The automated testing of such programs is non-trivial. If a test input does not adhere to a specific file format, the program returns a parser error. For symbolic execution-based whitebox fuzzing, the corresponding error-handling code becomes a significant time sink: too much time is spent exploring too many paths that lead to trivial parser errors. Naturally, the time is better spent in the functional part of the program, where failure with a valid input exposes deep and real bugs. In this paper, we...
Android testing tools generate sequences of input events to exercise the state space of the app under test. Existing search-based techniques systematically evolve a population of event sequences so as to achieve certain objectives, such as maximal code coverage. The hope is that the mutation of fit sequences leads to the generation of even fitter sequences. However, this evolution may be ineffective. Our key insight is that pertinent app states which contributed to the original sequence's fitness may not be reached by a mutated sequence: the path through the state space is truncated at the point of mutation.
In this paper, we take the fundamental perspective of fuzzing as a learning process. Suppose that, before fuzzing, we know nothing about the behaviors of a program P: What does it do? Executing the first test input, we learn how P behaves for this input. With the next input, we either observe the same behavior or discover a new one. As such, each execution reveals "some amount" of information about P's behaviors. A classic measure of information is Shannon's entropy. Measuring entropy allows us to quantify how much is learned from each generated input about the behaviors of the program. Within a probabilistic...
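To make the information-theoretic measure concrete, here is a small sketch that computes Shannon's entropy over the empirical distribution of behaviors (e.g., paths) exercised by the generated tests. It illustrates the measure itself, not the estimator used in the paper.

```python
import math
from collections import Counter

def empirical_entropy(path_ids):
    """Shannon entropy (in bits) of the observed behavior distribution."""
    counts = Counter(path_ids)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# A campaign stuck on one high-frequency path reveals little ...
print(empirical_entropy(["p1"] * 99 + ["p2"]))           # ~0.08 bits
# ... while one that spreads tests across behaviors reveals more.
print(empirical_entropy(["p1", "p2", "p3", "p4"] * 25))  # 2.0 bits
```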
Given a program in which none of our fuzzers finds any bugs, how do we know which fuzzer is better? In practice, we often look to code coverage as a proxy measure of fuzzer effectiveness and consider the fuzzer that achieves more coverage as the better one.
Intuitively, we know that some software errors are more complex than others. If an error can be fixed by changing one faulty statement, it is a simple error. The more substantial the fix must be, the more complex we consider the error...
We present counterintuitive results for the scalability of fuzzing. Given the same non-deterministic fuzzer, finding the same bugs linearly faster requires linearly more machines. For instance, with twice the machines, we can find all known bugs in half the time. Yet, finding linearly more bugs in the same time requires exponentially more machines: for every new bug we want to find within 24 hours, we might need twice as many machines. Similarly for coverage: with more machines, we cover the same code faster, but uncovered code only linearly faster. In other words, re-discovering the same vulnerabilities is cheap, but finding new vulnerabilities is expensive. This holds even under the simplifying assumption of no...
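A back-of-the-envelope illustration of the asymmetry described above, under a naive independent-campaign model: the time to re-find a known bug drops linearly with the number of machines, while each additional distinct bug within a fixed budget multiplies the required machines. All numbers are made up for illustration.

```python
def time_to_find(known_bug_rate_per_hour, machines):
    """Expected time to re-discover a known bug scales as 1/machines."""
    return 1.0 / (known_bug_rate_per_hour * machines)

def machines_for_new_bugs(extra_bugs, growth_factor=2):
    """If each extra bug within 24 hours needs ~growth_factor x machines,
    the cost of finding `extra_bugs` new bugs grows exponentially."""
    return growth_factor ** extra_bugs

print(time_to_find(0.5, 1), time_to_find(0.5, 2))     # 2.0h -> 1.0h (linear speed-up)
print([machines_for_new_bugs(k) for k in range(5)])   # [1, 2, 4, 8, 16]
```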
What you change is what you fuzz! In an empirical study of all fuzzer-generated bug reports in OSS-Fuzz, we found that four in every five bugs have been introduced by recent code changes. That is, 77% of the 23k reports are regressions. For a newly added project, there is usually an initial burst of new reports, at 2-3 per day. However, after that burst, and after weeding out most of the existing bugs, we still get a constant rate of 3-4 reports per week. The constant rate can only be explained by an increasing regression rate. Indeed, the probability that a reported bug is a regression (i.e., we could identify...
We study the relative efficiencies of the random and systematic approaches to automated software testing. Using a simple but realistic set of assumptions, we propose a general model for software testing and define sampling strategies for random (R) and systematic (S₀) testing, where each sampling is associated with a cost: 1 and c units of time, respectively. The two most important goals of software testing are: (i) achieving in minimal time a given degree of confidence x...
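A toy illustration of the cost model stated here, under assumptions of my own: each random test costs 1 time unit but may repeat inputs, each systematic test costs c units but always exercises a new input. To exercise k of D equally likely inputs, systematic testing needs c*k units, while random testing needs roughly D*ln(D/(D-k)) units (partial coupon collector). This is only an illustration of the trade-off, not the paper's analysis.

```python
import math

def systematic_time(k, c):
    """Cost of exercising k distinct inputs systematically at c units each."""
    return c * k

def expected_random_time(k, D):
    """Expected cost of hitting k distinct inputs out of D by uniform sampling."""
    return D * math.log(D / (D - k))

D, c = 10_000, 5
for k in (1_000, 5_000, 9_990):
    print(k, systematic_time(k, c), round(expected_random_time(k, D)))
# For small k, random testing is cheaper despite repeated inputs; as k
# approaches D, repeats dominate and systematic testing wins even at cost c.
```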
A fundamental challenge of software testing is the statistically well-grounded extrapolation from program behaviors observed during testing. For instance, a security researcher who has run a fuzzer for a week currently has no means (1) to estimate the total number of feasible branches, given that only a fraction has been covered so far; (2) to estimate the additional time required to cover 10% more branches (or the coverage achieved in one more day, respectively); or (3) to assess the residual risk that a vulnerability exists when none has been discovered. Failing...
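One way to make such extrapolation concrete is the species-discovery analogy: treating each covered branch (or path) as a "species", the Good-Turing estimator f1/n (number of singletons over number of tests) estimates the probability that the next generated test exercises something new. The sketch below illustrates that estimator only; it is not the paper's full methodology.

```python
from collections import Counter

def discovery_probability(species_of_each_test):
    """Good-Turing estimate of the chance that the next test finds a new species."""
    counts = Counter(species_of_each_test)
    n = len(species_of_each_test)
    f1 = sum(1 for c in counts.values() if c == 1)  # species seen exactly once
    return f1 / n

# 10,000 tests, 3 branches seen exactly once -> ~0.0003 chance the next test
# covers a new branch (and, dually, a small estimated residual risk).
print(discovery_probability(["b0"] * 9_997 + ["b1", "b2", "b3"]))
```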
While fuzzing can be very costly, it has proven to be a fundamental technique for uncovering bugs (often security-related) in many applications. A recent study of bug reports from OSS-Fuzz observed that code changes are responsible for 77% of all reported bugs, stressing the importance of continuous testing. With the increased adoption of CI/CD practices in software development, it is only natural to look for effective ways of incorporating fuzzing into CI/CD pipelines. In this paper, we study the effectiveness of fuzz testing in such pipelines, with a focus on...
Large Language Models (LLMs) have shown tremendous promise in automated software engineering. In this paper, we investigate the opportunities of LLMs for automatic regression test generation for programs that take highly structured, human-readable inputs, such as XML parsers or JavaScript interpreters. Concretely, we explore the following scenarios that have so far been difficult to automate in the absence of corresponding input grammars: • Bug finding. Given a code change (e.g., a commit or pull request), our...
Changes often introduce program errors, and hence recent software testing literature has focused on generating tests which stress changes. In this paper, we argue that changes cannot be treated as isolated program artifacts that are stressed via testing. Instead, it is the complex dependency across multiple changes that introduces subtle errors. Furthermore, the complex dependence structures that need to be exercised to expose such errors ensure that they remain undiscovered even in well-tested and deployed software. We motivate our work based on empirical evidence...
Detecting regression bugs in software evolution, analyzing side-channels in programs, and evaluating the robustness of deep neural networks (DNNs) can all be seen as instances of differential software analysis, where the goal is to generate diverging executions of program paths. Two executions are said to be diverging if their observable behavior differs, e.g., in terms of output, execution time, or (DNN) classification. The key challenge of differential analysis is to simultaneously reason about multiple program paths, often across program variants.
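A minimal sketch of the common core of such a differential analysis: run two variants (old/new version, two secrets, or a model under two perturbations) on the same input and flag the input when an observable behavior diverges. The chosen observables (output and wall-clock time) are illustrative assumptions.

```python
import time

def diverges(variant_a, variant_b, x, time_tolerance=0.05):
    """Return a divergence label if the two variants behave observably differently on x."""
    t0 = time.perf_counter(); out_a = variant_a(x); ta = time.perf_counter() - t0
    t0 = time.perf_counter(); out_b = variant_b(x); tb = time.perf_counter() - t0
    if out_a != out_b:
        return "output divergence"     # e.g., a regression bug
    if abs(ta - tb) > time_tolerance:
        return "timing divergence"     # e.g., a side-channel indicator
    return None
```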
Many protocol implementations are reactive systems, where the protocol process is in continuous interaction with other processes and the environment. If a bug can be exposed only in a certain state, a fuzzer needs to provide a specific sequence of events as inputs that take the protocol into this state before the bug is manifested. We call these bugs "stateful" bugs. Usually, when we are testing a protocol implementation, we do not have a detailed formal specification to rely upon. Without knowledge of the protocol, it is inherently difficult for a fuzzer to discover such...
Regression verification (RV) seeks to guarantee the absence of regression errors in a changed program version. This paper presents Partition-based Regression Verification (PRV): an approach to RV based on the gradual exploration of differential input partitions. A differential input partition is a subset of the common input space of the two program versions that serves as a unit of verification. Instead of proving the absence of regression for the complete input space at once, PRV verifies partitions in a gradual manner. If interrupted, PRV retains partial guarantees at least for the explored partitions, which is crucial in practice as verifying can be...
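The following toy sketch conveys the gradual, per-partition idea: each partition of the common input space is checked separately, so an interrupted run still yields a guarantee for the partitions explored so far. Real PRV derives partitions symbolically; here, as an assumption for illustration, we simply enumerate small finite partitions.

```python
def verify_partitions(old_version, new_version, partitions):
    """Verify equivalence partition by partition; stop at the first regression."""
    verified = []
    for name, inputs in partitions:
        if all(old_version(x) == new_version(x) for x in inputs):
            verified.append(name)       # no regression within this partition
        else:
            return verified, name       # regression found: partial guarantee kept
    return verified, None

old = lambda x: x * 2
new = lambda x: x * 2 if x >= 0 else x * 3   # behavior changed for negative inputs
parts = [("non-negative", range(0, 100)), ("negative", range(-100, 0))]
print(verify_partitions(old, new, parts))    # (['non-negative'], 'negative')
```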