NFDI4DS | UHH-SEMS - Publication Details

Directed Greybox Fuzzing

OPENALEX - Publications

Marcel Böhme, Van-Thuan Pham, Manh-Dung Nguyen, Abhik Roychoudhury

Existing Greybox Fuzzers (GF) cannot be effectively directed, for instance, towards problematic changes or patches, towards critical system calls or dangerous locations, or towards functions in the stack-trace of a reported vulnerability that we wish to reproduce. In this paper, we introduce Directed Greybox Fuzzing (DGF), which generates inputs with the objective of reaching a given set of target program locations efficiently. We develop and evaluate a simulated annealing-based power schedule that gradually assigns more energy to seeds that are...

10.1145/3133956.3134020 | article | EN | Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security | 2017-10-27
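A simulated annealing-based power schedule can be sketched as follows. This is a minimal illustration, not the paper's implementation. It assumes each seed already has a distance to the targets normalized to [0, 1]. The cooling base of 20, the hypothetical `exploit_after` parameter and the 2^-5 to 2^5 energy range are illustrative assumptions.

```python
def temperature(elapsed: float, exploit_after: float) -> float:
    """Exponential cooling: 1.0 at the start, decaying towards 0.

    `exploit_after` (hypothetical parameter) is the time at which the
    temperature has dropped to 1/20 = 0.05, i.e. mostly exploitation.
    """
    return 20.0 ** (-elapsed / exploit_after)


def annealing_energy(norm_distance: float, elapsed: float,
                     exploit_after: float, base_energy: float) -> float:
    """Energy for a seed given its normalized distance to the targets.

    norm_distance: 0.0 = closest seed seen so far, 1.0 = farthest.
    While the temperature is high, every seed gets about `base_energy`
    (exploration). As it cools, seeds closer to the targets receive
    exponentially more energy (exploitation).
    """
    if not 0.0 <= norm_distance <= 1.0:
        raise ValueError("norm_distance must be in [0, 1]")
    t = temperature(elapsed, exploit_after)
    p = (1.0 - norm_distance) * (1.0 - t) + 0.5 * t  # p stays in [0, 1]
    return base_energy * 2.0 ** (10.0 * p - 5.0)     # factor in [2^-5, 2^5]
```

At the start of a campaign every seed receives `base_energy`. Once the temperature has dropped, the closest seeds receive up to 32 times as much and the farthest as little as 1/32.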

Coverage-based Greybox Fuzzing as Markov Chain

Marcel Böhme, Van-Thuan Pham, Abhik Roychoudhury

Coverage-based Greybox Fuzzing (CGF) is a random testing approach that requires no program analysis. A new test is generated by slightly mutating a seed input. If the test exercises a new and interesting path, it is added to the set of seeds; otherwise, it is discarded. We observe that most tests exercise the same few "high-frequency" paths and develop strategies to explore significantly more paths with the same number of tests by gravitating towards low-frequency paths. We explain the challenges and opportunities of CGF using a Markov chain model which specifies the probability...

10.1145/2976749.2978428 | article | EN | Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security | 2016-10-24
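The strategy of gravitating towards low-frequency paths can be illustrated with a power schedule. In this sketch, energy grows exponentially with how often a seed has been chosen and shrinks with how many generated tests exercised the seed's path. The formula, the constants `alpha` and `max_energy`, and the ordering used in `pick_next` are assumptions for illustration, not the schedules defined in the paper.

```python
def frequency_aware_energy(times_chosen: int, path_frequency: int,
                           alpha: float = 1.0,
                           max_energy: float = 1024.0) -> float:
    """Energy doubles each time the seed is chosen, is divided by the
    number of generated tests that exercised the seed's path, and is
    capped at `max_energy`."""
    if path_frequency < 1:
        raise ValueError("path_frequency must be at least 1 (the seed itself)")
    return min(alpha * 2 ** times_chosen / path_frequency, max_energy)


def pick_next(seeds: list[tuple[str, int, int]]) -> str:
    """seeds: (name, path_frequency, times_chosen).

    Prefer seeds on rarely exercised paths, then seeds chosen less often.
    """
    return min(seeds, key=lambda s: (s[1], s[2]))[0]
```

A seed on a path hit by 500 tests gets far less energy than one whose path has been hit only once, so the budget shifts towards low-frequency paths.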

Coverage-Based Greybox Fuzzing as Markov Chain

Marcel Böhme, Van-Thuan Pham, Abhik Roychoudhury

Coverage-based Greybox Fuzzing (CGF) is a random testing approach that requires no program analysis. A new test is generated by slightly mutating a seed input. If the test exercises a new and interesting path, it is added to the set of seeds; otherwise, it is discarded. We observe that most tests exercise the same few "high-frequency" paths and develop strategies to explore significantly more paths with the same number of tests by gravitating towards low-frequency paths. We explain the challenges and opportunities of CGF using a Markov chain model which specifies the probability...

10.1109/tse.2017.2785841 | article | EN | IEEE Transactions on Software Engineering | 2017-12-21

AFLNET: A Greybox Fuzzer for Network Protocols

Van-Thuan Pham, Marcel Böhme, Abhik Roychoudhury

Server fuzzing is difficult. Unlike simple command-line tools, servers feature a massive state space that can be traversed effectively only with well-defined sequences of input messages. Valid sequences are specified in a protocol. In this paper, we present AFLNET, the first greybox fuzzer for protocol implementations. Unlike existing protocol fuzzers, AFLNET takes a mutational approach and uses state-feedback to guide the fuzzing process. AFLNET is seeded with a corpus of recorded message exchanges between the server and an actual client. No protocol specification or...

10.1109/icst46399.2020.00062 | article | EN | 2020-08-05
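State feedback for a text-based protocol can be sketched by reducing each server response to its status code and treating the sequence of codes as the visited states. New states or transitions then grow an inferred state machine. Using a leading three-digit status code as the state identifier is an assumption that suits protocols such as FTP or SMTP. `response_codes` and `StateMachine` are hypothetical names.

```python
import re

# Assumption: responses start with a three-digit status code (FTP/SMTP style).
STATUS_CODE = re.compile(rb"^(\d{3})")


def response_codes(responses: list[bytes]) -> list[int]:
    """Extract the leading status code of each response, skipping others."""
    codes = []
    for r in responses:
        m = STATUS_CODE.match(r)
        if m:
            codes.append(int(m.group(1)))
    return codes


class StateMachine:
    """Incrementally inferred state machine: nodes are status codes,
    edges are consecutive pairs of observed codes."""

    def __init__(self) -> None:
        self.states: set[int] = set()
        self.transitions: set[tuple[int, int]] = set()

    def update(self, codes: list[int]) -> bool:
        """Record an observed state sequence; return True if it added
        a new state or a new transition (useful as fuzzing feedback)."""
        new = False
        for c in codes:
            if c not in self.states:
                self.states.add(c)
                new = True
        for a, b in zip(codes, codes[1:]):
            if (a, b) not in self.transitions:
                self.transitions.add((a, b))
                new = True
        return new
```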

Model-based whitebox fuzzing for program binaries

Van-Thuan Pham, Marcel Böhme, Abhik Roychoudhury

Many real-world programs take highly structured and very complex inputs. The automated testing of such programs is non-trivial. If the test input does not adhere to a specific file format, the program returns a parser error. For symbolic execution-based whitebox fuzzing, the corresponding error handling code becomes a significant time sink: too much time is spent exploring too many paths that lead to trivial parser errors. Naturally, it is better to explore the functional part of the program, where a failure with valid input exposes deep and real bugs in the program. In this paper, we...

10.1145/2970276.2970316 | article | EN | 2016-08-25

Smart Greybox Fuzzing

Van-Thuan Pham, Marcel Boehme, Andrew E. Santosa, Alexandru Răzvan Căciulescu, Abhik Roychoudhury

Coverage-based greybox fuzzing (CGF) is one of the most successful approaches for automated vulnerability detection. Given a seed file (as a sequence of bits), CGF randomly flips, deletes or copies some bits to generate new files. CGF iteratively constructs (and fuzzes) a seed corpus by retaining those generated files which enhance coverage. However, random bitflips are unlikely to produce valid files (or valid chunks in files) for applications processing complex file formats. In this work, we introduce smart greybox fuzzing (SGF), which leverages...

10.1109/tse.2019.2941681 | article | EN | IEEE Transactions on Software Engineering | 2019-09-17
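Random bitflips tend to break file structure, so a structure-aware fuzzer can mutate whole chunks instead. This is a minimal sketch. It assumes the seed has already been parsed into typed chunks, and the `Chunk` type and both operators are hypothetical. Deleting a chunk, or replacing one with a same-typed chunk from another seed, keeps every remaining chunk individually well-formed.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    kind: str    # chunk type tag, e.g. "IHDR" or "IDAT" in a PNG-like format
    data: bytes  # raw bytes of the chunk


def delete_chunk(chunks: list[Chunk], rng: random.Random) -> list[Chunk]:
    """Remove one randomly chosen chunk (no-op on an empty file)."""
    if not chunks:
        return list(chunks)
    i = rng.randrange(len(chunks))
    return chunks[:i] + chunks[i + 1:]


def splice_chunk(chunks: list[Chunk], donor: list[Chunk],
                 rng: random.Random) -> list[Chunk]:
    """Replace one chunk with a different chunk of the same kind from a donor seed."""
    candidates = [(i, d) for i, c in enumerate(chunks)
                  for d in donor if d.kind == c.kind and d != c]
    if not candidates:
        return list(chunks)
    i, d = rng.choice(candidates)
    return chunks[:i] + [d] + chunks[i + 1:]


def serialize(chunks: list[Chunk]) -> bytes:
    """Concatenate chunks back into a file."""
    return b"".join(c.data for c in chunks)
```

Real formats usually also carry length fields or checksums that `serialize` would need to recompute; that step is left out here.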

ProFuzzBench: a benchmark for stateful protocol fuzzing

Roberto Natella, Van-Thuan Pham

We present a new benchmark (ProFuzzBench) for stateful fuzzing of network protocols. The benchmark includes a suite of representative open-source network servers for popular protocols, and tools to automate experimentation. We discuss challenges and potential directions for future research based on this benchmark.

10.1145/3460319.3469077 | article | EN | 2021-07-08

AFLNet Five Years Later: On Coverage-Guided Protocol Fuzzing

Ruijie Meng, Van-Thuan Pham, Marcel Böhme, Abhik Roychoudhury

10.1109/tse.2025.3535925 | article | EN | cc-by | IEEE Transactions on Software Engineering | 2025-01-01

EDEFuzz: A Web API Fuzzer for Excessive Data Exposures

Lianglu Pan, Shaanan Cohney, Toby Murray, Van-Thuan Pham

APIs often transmit far more data to client applications than they need, and in the context of web applications, do so over public channels. This issue, termed Excessive Data Exposure (EDE), was OWASP's third most significant API vulnerability of 2019. However, there are few automated tools, either in research or in industry, that effectively find and remediate such issues. This is unsurprising, as the problem lacks an explicit test oracle: the vulnerability does not manifest through abnormal behaviours (e.g., program crashes...

10.1145/3597503.3608133 | article | EN | 2024-02-06

Human-In-The-Loop Automatic Program Repair

Marcel Böhme, Charaka Geethal, Van-Thuan Pham

We introduce LEARN2FIX, the first human-in-the-loop, semi-automatic repair technique for when no bug oracle (except for the user who is reporting the bug) is available. Our approach negotiates with the user the condition under which the bug is observed. Only when a budget of queries to the user is exhausted does it attempt to repair the bug. A query can be thought of as the following question: "When executing this alternative test input, the program produces the following output; is the bug observed?" Through systematic queries, LEARN2FIX trains an automatic bug oracle that becomes increasingly more...

10.1109/icst46399.2020.00036 | article | EN | 2020-08-05
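The negotiation can be pictured as active learning under a query budget. This sketch simplifies heavily, and every choice in it is an assumption. Inputs are numbers, and the user is a callable that returns whether the bug is observed. Alternative inputs come from a caller-supplied `mutate` function. The learned oracle is a 1-nearest-neighbour classifier. Repair itself is not shown.

```python
import random
from typing import Callable


def learn_oracle(failing_input: float,
                 ask_user: Callable[[float], bool],
                 mutate: Callable[[float, random.Random], float],
                 budget: int,
                 rng: random.Random) -> Callable[[float], bool]:
    """Query the (simulated) user on `budget` alternative inputs, then
    return an automatic oracle predicting whether the bug is observed."""
    labelled = [(failing_input, True)]  # the reported input shows the bug
    for _ in range(budget):
        candidate = mutate(failing_input, rng)
        labelled.append((candidate, ask_user(candidate)))

    def oracle(x: float) -> bool:
        # 1-nearest-neighbour prediction over all labelled inputs
        nearest = min(labelled, key=lambda pair: abs(pair[0] - x))
        return nearest[1]

    return oracle
```

Once the budget is spent, the returned oracle answers without involving the user, which is the point at which an automated repair step could take over.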

An Empirical Study of Static Analysis Tools for Secure Code Review

Wachiraphan Charoenwet, Patanamon Thongtanunam, Van-Thuan Pham, Christoph Treude

10.1145/3650212.3680313 | article | EN | 2024-09-11

Hercules: reproducing crashes in real-world application binaries

Van-Thuan Pham, Wei Boon Ng, Konstantin Rubinov, Abhik Roychoudhury

Binary analysis is a well-investigated area in software engineering and security. Given real-world program binaries, generating test inputs which cause the binaries to crash is crucial. Generation of crashing inputs has many applications, including off-line analysis of software prior to deployment, or online analysis of software patches as they are inserted. In this work, we present a method for generating inputs which reach a given "potentially crashing" location. Such potentially crashing locations can be found by a separate static analysis (or by gleaning crash reports submitted by internal / external users) and serve as the input to our...

10.5555/2818754.2818862 | article | EN | International Conference on Software Engineering | 2015-05-16

State Selection Algorithms and Their Impact on The Performance of Stateful Network Protocol Fuzzing

Dongge Liu, Van-Thuan Pham, Gidon Ernst, Toby Murray, Benjamin I. P. Rubinstein

The statefulness property of network protocol implementations poses a unique challenge for testing and verification techniques, including fuzzing. Stateful fuzzers tackle this challenge by leveraging state models to partition the state space and assist the test generation process. Since not all states are equally important and fuzzing campaigns have time limits, fuzzers need effective state selection algorithms to prioritize progressive states over others. Several state selection algorithms have been proposed, but they were implemented and evaluated separately on different...

10.1109/saner53432.2022.00089 | article | EN | 2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER) | 2022-03-01
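Three simple ways a stateful fuzzer might choose the next state to target are sketched below. The first picks uniformly at random. The second cycles round-robin. The third uses a score that favours states which were selected rarely but have yielded new coverage. The score formula and the `StateInfo` fields are hypothetical; they are not the algorithms evaluated in the paper.

```python
import itertools
import random
from dataclasses import dataclass
from typing import Iterator


@dataclass
class StateInfo:
    state_id: int
    times_selected: int = 0
    paths_discovered: int = 0  # new coverage found while targeting this state


def select_random(states: list[StateInfo], rng: random.Random) -> StateInfo:
    """Uniform random choice over all known states."""
    return rng.choice(states)


def round_robin(states: list[StateInfo]) -> Iterator[StateInfo]:
    """Endless iterator visiting the states in a fixed order."""
    return itertools.cycle(states)


def select_by_score(states: list[StateInfo]) -> StateInfo:
    """Favour productive, rarely selected states (hypothetical score)."""
    def score(s: StateInfo) -> float:
        return (1 + s.paths_discovered) / (1 + s.times_selected)
    return max(states, key=score)
```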

Hercules: Reproducing Crashes in Real-World Application Binaries

Van-Thuan Pham, Wei Boon Ng, Konstantin Rubinov, Abhik Roychoudhury

Binary analysis is a well-investigated area in software engineering and security. Given real-world program binaries, generating test inputs which cause the binaries to crash is crucial. Generation of crashing inputs has many applications, including off-line analysis of software prior to deployment, or online analysis of software patches as they are inserted. In this work, we present a method for generating inputs which reach a given "potentially crashing" location. Such potentially crashing locations can be found by a separate static analysis (or by gleaning crash reports submitted by internal / external...

10.1109/icse.2015.99 | article | EN | 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering | 2015-05-01

Human-in-the-Loop Automatic Program Repair

Charaka Geethal, Marcel Böhme, Van-Thuan Pham

LEARN2FIX is a human-in-the-loop interactive program repair technique, which can be applied when no bug oracle (except the user who is reporting the bug) is available. This approach incrementally learns the condition under which the bug is observed by systematic negotiation with the user. In this process, LEARN2FIX generates...

10.1109/tse.2023.3305052 | article | EN | cc-by-nc-nd | IEEE Transactions on Software Engineering | 2023-08-21

Towards Systematic and Dynamic Task Allocation for Collaborative Parallel Fuzzing

Van-Thuan Pham, Manh-Dung Nguyen, Quang-Trung Ta, Toby Murray, Benjamin I. P. Rubinstein

Parallel coverage-guided greybox fuzzing is the most common setup for vulnerability discovery at scale. However, so far it has received little attention from the research community compared to single-mode fuzzing, leaving open several problems, particularly in its task allocation strategies. Current approaches focus on managing micro tasks at the seed input level, and their task division algorithms are either ad-hoc or static. In this paper, we leverage graph partitioning and search algorithms to propose a systematic and dynamic...

10.1109/ase51524.2021.9678810 | article | EN | 2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE) | 2021-11-01
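Allocating work above the seed level can be illustrated by splitting a call graph into groups of functions and giving each parallel fuzzer instance one group to focus on. This is a hypothetical and deliberately simple partitioning that takes contiguous slices of a breadth-first traversal. It is not the graph partitioning or search algorithm used in the paper. It is also static, whereas the abstract proposes a dynamic approach.

```python
from collections import deque


def partition_call_graph(graph: dict[str, list[str]], entry: str,
                         k: int) -> list[set[str]]:
    """Split the functions reachable from `entry` into at most k groups of
    near-equal size, taken as contiguous slices of a BFS order."""
    if k < 1:
        raise ValueError("k must be at least 1")
    order, seen, queue = [], {entry}, deque([entry])
    while queue:
        fn = queue.popleft()
        order.append(fn)
        for callee in graph.get(fn, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    size = -(-len(order) // k)  # ceiling division
    return [set(order[i:i + size]) for i in range(0, len(order), size)]
```

Each returned group would be handed to one fuzzer instance as its focus; a dynamic scheme would recompute the groups as coverage evolves.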

Toward effective secure code reviews: an empirical study of security-related coding weaknesses

Wachiraphan Charoenwet, Patanamon Thongtanunam, Van-Thuan Pham, Christoph Treude

Identifying security issues early is encouraged to reduce the latent negative impacts on software systems. Code review is a widely-used method that allows developers to manually inspect modified code, catching security issues during a software development cycle. However, existing code review studies often focus on known vulnerabilities, neglecting coding weaknesses, which can introduce real-world security issues that are more visible through code review. The practices of code reviews in identifying such coding weaknesses are not yet fully investigated. To better...

10.1007/s10664-024-10496-y | article | EN | cc-by | Empirical Software Engineering | 2024-06-08

Human-in-the-loop oracle learning for semantic bugs in string processing programs

Charaka Geethal Kapugama, Van-Thuan Pham, Aldeida Aleti, Marcel Böhme

How can we automatically repair semantic bugs in string-processing programs? A semantic bug is an unexpected program state: the program does not crash (which can be easily detected). Instead, the program processes the input incorrectly. It produces an output which users identify as unexpected. We envision a fully automated debugging process for semantic bugs, where a user reports the unexpected behavior for a given input and the machine negotiates the condition under which the program fails. During the negotiation, the machine learns to predict the user's response; this serves as an oracle for semantic bugs.

10.1145/3533767.3534406 | article | EN | 2022-07-15

Integrated Timing Analysis of Application and Operating Systems Code

Lee Kee Chong, Clément Ballabriga, Van-Thuan Pham, Sudipta Chattopadhyay, Abhik Roychoudhury

Real-time embedded software often runs on a supervisory operating system layer on top of a modern processor. Thus, to give timing guarantees on the execution time and response time of such applications, one needs to consider the timing effects of the operating system, such as system calls and interrupts, over and above the modeling of micro-architectural features such as the pipeline and cache. Previous works on Worst-case Execution Time (WCET) analysis have focused on micro-architectural modeling while ignoring the operating system's timing effects. As a result, WCET analyzers only estimate the maximum un-interrupted execution time of a program. In this...

10.1109/rtss.2013.21 | preprint | EN | 2013-12-01
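The effect of interrupts on response time can be illustrated with the standard fixed-priority response-time recurrence. It is used here as a textbook example, not the paper's analysis. A task with WCET C is delayed by each higher-priority interrupt j with handler cost C_j and minimum inter-arrival time T_j. Its response time R is therefore the smallest fixed point of R = C + sum_j ceil(R / T_j) * C_j.

```python
import math
from typing import Optional


def response_time(wcet: float, interferers: list[tuple[float, float]],
                  limit: float) -> Optional[float]:
    """Iterate R = C + sum(ceil(R / T_j) * C_j) until it reaches a fixed point.

    interferers: (C_j, T_j) pairs for higher-priority interrupts or OS activity.
    Returns the response time, or None once the estimate exceeds `limit`
    (for example a deadline), which also guards against non-convergence.
    """
    r = wcet
    while True:
        nxt = wcet + sum(math.ceil(r / t) * c for c, t in interferers)
        if nxt > limit:
            return None
        if nxt == r:
            return r
        r = nxt
```

Consider a made-up task with a WCET of 10 time units and one interrupt costing 1 unit every 5 units. For it, `response_time(10, [(1, 5)], 100)` returns 13 rather than 10, so an analysis that ignores the interrupt would underestimate the bound.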

An Empirical Study of Static Analysis Tools for Secure Code Review

Wachiraphan Charoenwet, Patanamon Thongtanunam, Van-Thuan Pham, Christoph Treude

Early identification of security issues in software development is vital to minimize their unanticipated impacts. Code review is a widely used manual analysis method that aims to uncover security issues along with other coding issues in software projects. While some studies suggest that automated static application security testing tools (SASTs) could enhance security issue identification, there is limited understanding of SAST's practical effectiveness in supporting secure code review. Moreover, most SAST studies rely on synthetic or fully vulnerable versions of the...

10.48550/arxiv.2407.12241 | preprint | EN | arXiv (Cornell University) | 2024-07-16

AFLNet Five Years Later: On Coverage-Guided Protocol Fuzzing

Ruijie Meng, Van-Thuan Pham, Marcel Böhme, Abhik Roychoudhury

Protocol implementations are stateful, which makes them difficult to test: sending the same test input message twice might yield a different response every time. Our proposal to consider a sequence of messages as a seed for coverage-directed greybox fuzzing, to associate each message with the corresponding protocol state, and to maximize coverage of both the state space and the code was first published in 2020 in a short tool demonstration paper. AFLNet was the first code- and state-coverage-guided protocol fuzzer; it used the response code as an indicator of the current protocol state. Over the past...

10.48550/arxiv.2412.20324 | preprint | EN | arXiv (Cornell University) | 2024-12-28
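The goal of maximizing both code and state-space coverage suggests a simple retention rule: keep a message sequence as a seed if it reaches a new code branch or a new state transition. This is a minimal sketch. It assumes branch IDs and the per-message state sequence have already been collected for each execution. `Seed` and `CorpusFeedback` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Seed:
    messages: list[bytes]  # the message sequence sent to the server
    states: list[int]      # protocol state observed after each message


@dataclass
class CorpusFeedback:
    branches: set[int] = field(default_factory=set)
    transitions: set[tuple[int, int]] = field(default_factory=set)
    corpus: list[Seed] = field(default_factory=list)

    def consider(self, seed: Seed, covered_branches: set[int]) -> bool:
        """Keep the seed if it adds code coverage or state coverage."""
        new_branches = covered_branches - self.branches
        new_transitions = set(zip(seed.states, seed.states[1:])) - self.transitions
        if not new_branches and not new_transitions:
            return False
        self.branches |= new_branches
        self.transitions |= new_transitions
        self.corpus.append(seed)
        return True
```

A sequence that covers no new code can still be retained if it drives the server through a transition not seen before, which is what distinguishes this rule from purely code-coverage feedback.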

Detecting Excessive Data Exposures in Web Server Responses with Metamorphic Fuzzing

Lianglu Pan, Shaanan Cohney, Toby Murray, Van-Thuan Pham

APIs often transmit far more data to client applications than they need, and in the context of web applications, do so over public channels. This issue, termed Excessive Data Exposure (EDE), was OWASP's third most significant API vulnerability of 2019. However, there are few automated tools, either in research or in industry, that effectively find and remediate such issues. This is unsurprising, as the problem lacks an explicit test oracle: the vulnerability does not manifest through abnormal behaviours (e.g., program crashes or memory...

10.48550/arxiv.2301.09258 | preprint | EN | cc-by-nc-sa | arXiv (Cornell University) | 2023-01-01
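A metamorphic oracle for excessive data exposure might be phrased as follows. This is an assumption for illustration, not a description of the tool. The idea is to perturb one field of an API response at a time and re-render the client view. Any field whose perturbation leaves the rendered output unchanged is flagged. The `render` callable and the flat dictionary response are hypothetical stand-ins for a real browser and JSON payload.

```python
from typing import Any, Callable


def unused_fields(response: dict[str, Any],
                  render: Callable[[dict[str, Any]], str]) -> list[str]:
    """Return the keys whose value can change without changing the view.

    Metamorphic relation (assumed): if the client needs a field, changing
    that field's value should change what the client renders.
    """
    baseline = render(response)
    candidates = []
    for key, value in response.items():
        mutated = dict(response)
        mutated[key] = f"__mutated__{value!r}"  # differs from the original value
        if render(mutated) == baseline:
            candidates.append(key)
    return candidates
```

A flagged field is only a candidate. A field that affects the page only under some condition may need more than one perturbation to reveal its use.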