- Software Engineering Research
- Software Testing and Debugging Techniques
- Software Reliability and Analysis Research
- Advanced Malware Detection Techniques
- Software System Performance and Reliability
- Security and Verification in Computing
- Geological and Geochemical Analysis
- Logic, programming, and type systems
- High-pressure geophysics and materials
- Formal Methods in Verification
- Earthquake and Tectonic Studies
- Web Data Mining and Analysis
- Network Security and Intrusion Detection
- Geochemistry and Geochronology of Asian Mineral Deposits
- Geomechanics and Mining Engineering
- Web Application Security Vulnerabilities
- Topic Modeling
- Geological Modeling and Analysis
- Iterative Methods for Nonlinear Equations
- Geological and Geophysical Studies
- Advanced Data Storage Technologies
- Cloud Computing and Resource Management
- Paleontology and Stratigraphy of Fossils
- Adversarial Robustness in Machine Learning
- Engineering Education and Curriculum Development
Xiamen University
2019-2024
Hong Kong University of Science and Technology
2014-2019
University of Hong Kong
2014-2019
Ningbo Polytechnic
2015
Tsinghua University
2010-2012
China University of Mining and Technology
2012
University of Science and Technology of China
2006-2012
Anhui University of Science and Technology
2008-2010
Northwest University
2006
Software defect information, including links between bugs and committed changes, plays an important role in software maintenance, such as measuring quality and predicting defects. Usually, the links are automatically mined from change logs and bug reports using heuristics such as searching for specific keywords and bug IDs in change logs. However, the accuracy of these heuristics depends on the quality of change logs. Bird et al. found that there are many missing links due to the absence of bug references in change logs. They also found that the missing links lead to biased defect information, which affects defect prediction performance. We manually inspected the explicit...
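The keyword-and-ID heuristic described above can be sketched in a few lines. This is a minimal illustration, not the method from the paper: the regular expressions, the function name `link_commits_to_bugs`, and the bug-ID formats (`#123`, `bug 123`, `issue-123`) are all hypothetical assumptions. It also shows how a fix whose log carries no bug reference silently produces a missing link.

```python
import re

# Hypothetical conventions: bug IDs written as "#123", "bug 123" or "issue-123",
# plus a few common fix keywords. Real projects use their own conventions.
BUG_ID = re.compile(r"(?:#|\bbug[\s-]?|\bissue[\s-]?)(\d+)", re.IGNORECASE)
FIX_KEYWORD = re.compile(r"\b(?:fix(?:e[sd])?|bug|defect|patch)\b", re.IGNORECASE)


def link_commits_to_bugs(commits, known_bug_ids):
    """Return {commit_id: set_of_bug_ids} found by the keyword/ID heuristic.

    commits: iterable of (commit_id, log_message) pairs.
    known_bug_ids: set of integer bug IDs that exist in the bug tracker.
    """
    links = {}
    for commit_id, message in commits:
        # Skip logs that do not look like bug fixes at all.
        if not FIX_KEYWORD.search(message):
            continue
        ids = {int(m) for m in BUG_ID.findall(message)}
        # Keep only IDs that refer to real reports in the tracker.
        ids &= known_bug_ids
        if ids:
            links[commit_id] = ids
    return links


commits = [
    ("c1", "Fixed #42 crash on startup"),
    ("c2", "fix bug 7 in parser"),
    ("c3", "Fix crash when file is empty"),  # a real fix, but no bug reference
    ("c4", "refactor parser"),
]
print(link_commits_to_bugs(commits, {7, 42}))
```

Commit `c3` is a fix that the heuristic cannot link, which is exactly the kind of missing link the abstract attributes to the absence of bug references in change logs.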
Many software defect prediction models have been built using historical defect data obtained by mining software repositories (MSR). Recent studies have discovered that data so collected contain noises because current defect collection practices are based on optional bug fix keywords or bug report links in change logs. Defect data collected automatically from the change logs could therefore include noises.
The effectiveness of search-based automated program repair is limited in the number of correct patches that can be successfully generated. There are two causes of such limitation. First, the search space does not contain the correct patch. Second, the search space is huge and therefore the correct patch cannot be generated (i.e., correct patches are either generated after incorrect plausible ones or not generated within the time budget).
Various information retrieval (IR) based techniques have been proposed recently to locate bugs automatically at the file level. However, their usefulness is often compromised by the coarse granularity of files and the lack of contextual information. To address this, we propose to locate bugs using software changes, which offer finer granularity than files and provide important contextual clues for bug-fixing. We observe that bug inducing changes can facilitate the bug fixing process. For example, they help triage the bug fixing task to the developers who committed the bug inducing changes, or enable developers to fix...
Software crash is common. When a crash occurs, software developers can receive a crash report upon user permission. A crash report typically includes the call stack at the time of the crash. An important step in debugging a crash is to identify the faulty functions, which is often a tedious and labor-intensive task. In this paper, we propose CrashLocator, a method to locate faulty functions using the information in crash reports. It deduces possible crash traces (the failing execution traces that lead to the crash) by expanding the crash stack with functions in the static call graph. It then calculates the suspiciousness of each...
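The stack-expansion step can be pictured as a bounded breadth-first search over the static call graph, starting from every frame on the crash stack. This is only a loose sketch of the idea in the abstract: the function name `expand_crash_stack`, the `max_depth` cutoff, and returning hop distances are assumptions for illustration, not details taken from CrashLocator, and the suspiciousness computation that follows is not shown.

```python
from collections import deque


def expand_crash_stack(stack, call_graph, max_depth=2):
    """Approximate the functions a crashing execution may have run.

    stack: function names from the crash report (top frame first).
    call_graph: {caller: set_of_callees} from a static analysis (assumed given).
    Returns {function: distance}: 0 for functions on the stack, k for
    functions reachable in k calls from the nearest stack frame.
    """
    distance = {f: 0 for f in stack}
    queue = deque(stack)
    while queue:
        fn = queue.popleft()
        # Hypothetical cutoff: do not expand beyond max_depth calls.
        if distance[fn] >= max_depth:
            continue
        for callee in call_graph.get(fn, ()):
            if callee not in distance:  # BFS keeps the shortest distance
                distance[callee] = distance[fn] + 1
                queue.append(callee)
    return distance


graph = {"main": {"init", "handler"}, "handler": {"parse", "crash_fn"},
         "parse": {"lex"}, "lex": {"read"}}
print(expand_crash_stack(["crash_fn", "handler", "main"], graph))
```

Functions closer to the stack (smaller distance) are more likely to have executed before the crash, which is one plausible input to a suspiciousness score.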
When dealing with millions of lines of code, we still cannot have the cake and eat it: sparse value-flow analysis is powerful in checking source-sink problems, but existing work cannot escape from the “pointer trap” – a precise points-to analysis limits its scalability and an imprecise one seriously undermines its precision. We present Pinpoint, a holistic approach that decomposes the cost of high-precision points-to analysis by precisely discovering local data dependence and delaying the expensive inter-procedural analysis through memorization. Such memorization...
Unlike coverage-based fuzzing that gives equal attention to every part of a code, directed fuzzing aims to direct a fuzzer to a specific target in the code, e.g., code with potential vulnerabilities. Despite much progress, we observe that existing directed fuzzers are still not efficient, as they often symbolically or concretely execute a lot of program paths that cannot reach the target code. They thus waste a lot of computational resources. This paper presents BEACON, which can effectively direct a grey-box fuzzer in the sea of paths in a provable manner. That is, assisted by a lightweight...
Memory-related vulnerabilities constitute severe threats to the security of modern software. Despite the success of deep learning-based approaches for generic vulnerability detection, they are still limited by the underutilization of flow information when applied to detecting memory-related vulnerabilities, leading to high false positives. In this paper, we propose MVD, a statement-level Memory-related Vulnerability Detection approach based on flow-sensitive graph neural networks (FS-GNN). FS-GNN is employed to jointly embed both...
Software often crashes. Once a crash happens, a crash report could be sent to software developers for investigation upon user permission. To facilitate efficient handling of crashes, crash reports received by Microsoft's Windows Error Reporting (WER) system are organized into a set of "buckets". Each bucket contains duplicate crash reports that are deemed to be manifestations of the same bug. The bucket information is important for prioritizing efforts to resolve crashing bugs. To improve the accuracy of bucketing, we propose ReBucket, a method for clustering crash reports based...
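To make the idea of bucketing by call-stack similarity concrete, here is a deliberately simplified stand-in. It is not ReBucket's similarity model or clustering algorithm: the position-decayed frame matching, the `decay` and `threshold` values, and the greedy single-pass grouping are all hypothetical choices for illustration. The only idea kept from the abstract is that stacks judged similar enough are placed in the same bucket.

```python
def stack_similarity(a, b, decay=0.9):
    """Position-weighted similarity of two call stacks (top frame first).

    Frames that match at the same depth count more the closer they are
    to the top of the stack. Returns a value in [0, 1].
    """
    depth = max(len(a), len(b))
    if depth == 0:
        return 1.0
    weights = [decay ** i for i in range(depth)]
    matched = sum(w for i, w in enumerate(weights)
                  if i < len(a) and i < len(b) and a[i] == b[i])
    return matched / sum(weights)


def bucket_reports(stacks, threshold=0.6):
    """Greedy single pass: each stack joins the first bucket whose
    representative (its first member) is similar enough, else starts a new one."""
    buckets = []
    for stack in stacks:
        for bucket in buckets:
            if stack_similarity(stack, bucket[0]) >= threshold:
                bucket.append(stack)
                break
        else:
            buckets.append([stack])
    return buckets


reports = [["memcpy", "parse", "main"],
           ["memcpy", "parse", "run"],
           ["draw", "render", "main"]]
print(bucket_reports(reports))
```

Weighting the top frames more heavily reflects the common intuition that frames near the crash point say more about the bug than shared entry points such as `main`; the exact weighting here is an assumption.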
Intensive dependencies of a Java project on third-party libraries can easily lead to the presence of multiple library or class versions on its classpath. When this happens, the JVM will load one version and shadow the others. Dependency conflict (DC) issues occur when the loaded version fails to cover a required feature (e.g., a method) referenced by the project, thus causing runtime exceptions. However, the warnings of duplicate classes or libraries detected by existing build tools such as Maven can be benign, since not all instances of duplication will induce...
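The difference between a benign duplicate and a real dependency conflict can be modeled with a toy classpath. This sketch assumes a flat classpath where the first jar containing a class wins and later copies are shadowed; the jar names, class names, and the helpers `loaded_versions` and `find_dependency_conflicts` are hypothetical, and real class loading (custom loaders, build-tool mediation) is more involved.

```python
def loaded_versions(classpath):
    """Map each class name to the jar whose copy is loaded.

    classpath: ordered list of (jar_name, {class_name: set_of_method_names}).
    Assumes the first jar on the classpath that contains a class wins.
    """
    loaded = {}
    for jar, classes in classpath:
        for cls in classes:
            loaded.setdefault(cls, jar)
    return loaded


def find_dependency_conflicts(classpath, references):
    """Return (class, method, loaded_jar) for referenced methods that the
    loaded copy of a duplicated class does not provide."""
    jars = dict(classpath)
    loaded = loaded_versions(classpath)
    # Count how many jars provide each class; duplicates are candidate conflicts.
    counts = {}
    for _, classes in classpath:
        for cls in classes:
            counts[cls] = counts.get(cls, 0) + 1
    issues = []
    for cls, method in references:
        if counts.get(cls, 0) > 1 and method not in jars[loaded[cls]][cls]:
            issues.append((cls, method, loaded[cls]))
    return issues


classpath = [
    ("lib-1.0.jar", {"com.example.Util": {"parse"}}),
    ("lib-2.0.jar", {"com.example.Util": {"parse", "parseStrict"}}),
]
refs = [("com.example.Util", "parse"), ("com.example.Util", "parseStrict")]
print(find_dependency_conflicts(classpath, refs))
```

Both references point at a duplicated class, but only `parseStrict` is missing from the loaded copy, so only that reference would fail at runtime; the `parse` duplicate is benign.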
Hybrid fuzzing, which combines the merits of both fuzzing and concolic execution, has become one of the most important trends in coverage-guided fuzzing techniques. Despite the tremendous research on hybrid fuzzers, we observe that existing techniques are still inefficient. One important reason is that these techniques, which we refer to as non-incremental fuzzers, cache and reuse few computation results and, thus, lose many optimization opportunities. To be incremental, we propose "polyhedral path abstraction", which preserves the exploration state in the concolic execution...
Spectrum-based fault localization (SBFL) techniques are widely studied and have been evaluated to be effective in locating faults. Recent studies also showed that developers from industry value automated SBFL techniques. However, their effectiveness is still limited for two main reasons. First, the test coverage information leveraged to construct the spectrum does not reflect the root cause directly. Second, SBFL suffers from the tie issue, so the buggy code entities cannot be well differentiated from non-buggy ones. To address...
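As a reminder of how a spectrum turns into suspiciousness scores, the sketch below uses Ochiai, one widely used SBFL formula; the abstract does not say which formula the work builds on, so this choice is an assumption. For an entity covered by `ef` failing and `ep` passing tests, Ochiai is `ef / sqrt(total_failed * (ef + ep))`. The small example also reproduces the tie issue: two entities with identical coverage get identical scores.

```python
import math


def ochiai(coverage, failed):
    """Ochiai suspiciousness for each code entity.

    coverage: {test_name: set_of_entities_executed}
    failed:   set of test names that failed
    """
    total_failed = len(failed)
    entities = set().union(*coverage.values())
    scores = {}
    for e in entities:
        # ef/ep: number of failing/passing tests that execute entity e.
        ef = sum(1 for t, cov in coverage.items() if e in cov and t in failed)
        ep = sum(1 for t, cov in coverage.items() if e in cov and t not in failed)
        denom = math.sqrt(total_failed * (ef + ep))
        scores[e] = ef / denom if denom else 0.0
    return scores


coverage = {"t1": {"a", "b", "c"}, "t2": {"a"}, "t3": {"a", "d"}}
scores = ochiai(coverage, failed={"t1"})
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```

Here `b` and `c` are executed only by the failing test, so both score 1.0 and the spectrum alone cannot tell which one is buggy.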
Detecting memory leaks at industrial scale is still not well addressed, in spite of the tremendous effort from both industry and academia in the past decades. Existing work suffers from an unresolved paradox: a highly precise analysis limits its scalability, and an imprecise one seriously hurts its precision or recall. In this work, we present SMOKE, a staged approach to resolve this paradox. In the first stage, instead of using a uniform precise analysis for all paths, we use a scalable but imprecise analysis to compute a succinct set of candidate memory leak paths. In the second stage, we leverage a more precise analysis to verify...
Automatic code summarization frees software developers from the heavy burden of manual commenting and benefits software development and maintenance. The Abstract Syntax Tree (AST), which depicts the source code's syntactic structure, has been incorporated to guide the generation of code summaries. However, existing AST-based methods suffer from the difficulty of training and generate inadequate code summaries. In this paper, we present the Block-wise Abstract Syntax Tree Splitting method (BASTS for short), which fully utilizes the rich tree-form syntax structure in ASTs for improving...
Java (de)serialization is prone to causing security-critical vulnerabilities, in which attackers can invoke existing methods (gadgets) on the application's classpath to construct a gadget chain that performs malicious behaviors. Several techniques have been proposed to statically identify suspicious gadget chains and dynamically generate injection objects for fuzzing. However, due to their incomplete support for dynamic program features (e.g., Java runtime polymorphism) and ineffective injection object generation for fuzzing, the existing techniques are still far from...
Bug-inducing commits provide important information to understand when and how bugs were introduced. Therefore, they have been extensively investigated by existing studies and frequently leveraged to facilitate bug fixing in industrial practice.
Software defect prediction, which aims to identify defective modules, can assist developers in finding bugs and prioritizing limited quality assurance resources. Various features to build defect prediction models have been proposed and evaluated. Among them, process metrics are one important category. Yet, existing process metrics are mainly encoded manually from change histories and ignore the sequential information arising from the changes during software evolution. Are the change sequences derived from such information useful to characterize buggy program modules?...
Intensive use of libraries in Java projects brings a potential risk of dependency conflicts, which occur when a project directly or indirectly depends on multiple versions of the same library or class. When this happens, the JVM loads one version and shadows the others. Runtime exceptions can occur when methods in the shadowed versions are referenced. Although dependency management tools such as Maven are able to give warnings of potential dependency conflicts when a project is built, developers often ask for crashing stack traces before examining these warnings. It motivates us to develop...
Misuses of library APIs are pervasive and often lead to software crashes and vulnerability issues. Various static analysis tools have been proposed to detect API misuses. They often involve mining frequent patterns from a large number of correct API usage examples, which can be hard to obtain in practice. They also suffer from low precision due to an over-simplified assumption that a deviation from frequent usage patterns indicates a misuse. We make two observations on the discovery of API misuse patterns. First, API misuses can be represented as mutants of the corresponding correct usages....