- Software Engineering Research
- Software Reliability and Analysis Research
- Software Testing and Debugging Techniques
- Topic Modeling
- Advanced Malware Detection Techniques
- Software System Performance and Reliability
- Natural Language Processing Techniques
- Multimodal Machine Learning Applications
- Semantic Web and Ontologies
- Advanced Graph Neural Networks
- Music and Audio Processing
- Software Engineering Techniques and Practices
- Security and Verification in Computing
- Data Quality and Management
- Information and Cyber Security
- Privacy-Preserving Technologies in Data
- Advanced Text Analysis Techniques
- Genomics and Phylogenetic Studies
- Human Motion and Animation
- Adversarial Robustness in Machine Learning
- Educational Technology and Assessment
- Speech and Audio Processing
- Higher Education and Teaching Methods
- EFL/ESL Teaching and Learning
- Advanced Software Engineering Methodologies
New Jersey Institute of Technology
2019-2024
Xi'an Technological University
2022-2023
Beijing Language and Culture University
2019-2021
Tsinghua University
2021
Google (United States)
2021
University of Chinese Academy of Sciences
2019
Center for Excellence in Brain Science and Intelligence Technology
2019
Central South University
2018
Yunnan Normal University
2011
Yunnan University
2011
We collected a large C/C++ code vulnerability dataset from open-source GitHub projects, namely Big-Vul. We crawled the public Common Vulnerabilities and Exposures (CVE) database and CVE-related source code repositories. Specifically, we collected the descriptive information of vulnerabilities from the CVE database, e.g., CVE IDs, severity scores, and summaries. Using the CVE information and the related published repository links, we downloaded all of the repositories and extracted the vulnerability-related code changes. In total, Big-Vul contains 3,754 code vulnerabilities spanning 91 different vulnerability types. All these code vulnerabilities are extracted from 348...
Automated Program Repair (APR) is very useful in helping developers during software development and maintenance. Despite recent advances in deep learning (DL), DL-based APR approaches still have limitations in learning bug-fixing code changes and the context of the source code surrounding those changes. These limitations lead to incorrect fixing locations or incorrect fixes. In this paper, we introduce DLFix, a two-tier DL model that treats APR as learning code transformations from prior bug fixes and their surrounding code contexts. The first layer is a tree-based RNN model that learns the contexts of bug fixes, and its result is used as an...
Bug detection has been shown to be an effective way to help developers detect bugs early, thus saving much effort and time in the software development process. Recently, deep learning-based bug detection approaches have outperformed traditional machine learning-based approaches, rule-based program analysis approaches, and mining-based approaches. However, they are still limited in detecting bugs that involve multiple methods, and they suffer from a high rate of false positives. In this paper, we propose a combination approach with the use of contexts and attention...
In this paper, we propose DeepRL4FL, a deep learning fault localization (FL) approach that locates buggy code at the statement and method levels by treating FL as an image pattern recognition problem. DeepRL4FL does so via novel code coverage representation learning (RL) and data dependencies RL for program statements. These two types of RL on the dynamic information in a code coverage matrix are also combined with code representation learning on the static information of the usual suspicious source code. This combination is inspired by crime scene investigation, in which investigators...
Existing deep learning (DL)-based automated program repair (APR) models are limited in fixing general software defects. We present DEAR, a DL-based approach that supports fixing general bugs that require dependent changes at once to one or multiple consecutive statements in one or multiple hunks of code. We first design a novel fault localization (FL) technique for multi-hunk, multi-statement fixes that combines traditional spectrum-based (SB) FL with deep learning and data-flow analysis. It takes the buggy statements returned by the SBFL model, detects the buggy hunks to be fixed...
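DEAR's first step consumes the output of a traditional spectrum-based FL model. As a minimal sketch of what such a model computes, the code below ranks statements with the classic Ochiai suspiciousness formula, ef / sqrt(total_failed × (ef + ep)). This is a generic SBFL illustration, not DEAR's own FL technique; the function names and the toy coverage data are hypothetical.

```python
import math
from typing import Dict, List, Set


def ochiai_scores(coverage: Dict[str, Set[str]], failing: Set[str]) -> Dict[str, float]:
    """Score each statement with the Ochiai formula (generic SBFL, hypothetical helper).

    coverage maps a test name to the set of statement IDs that test executes;
    failing contains the names of the failing tests.
    """
    total_failed = len(failing)
    statements = set().union(*coverage.values())
    scores: Dict[str, float] = {}
    for stmt in statements:
        # ef: failing tests executing stmt; ep: passing tests executing stmt
        ef = sum(1 for t, cov in coverage.items() if stmt in cov and t in failing)
        ep = sum(1 for t, cov in coverage.items() if stmt in cov and t not in failing)
        denom = math.sqrt(total_failed * (ef + ep))
        scores[stmt] = ef / denom if denom else 0.0
    return scores


def rank(scores: Dict[str, float]) -> List[str]:
    """Most suspicious first; ties broken by statement ID for determinism."""
    return sorted(scores, key=lambda s: (-scores[s], s))


# Hypothetical toy run: t1 fails, t2 and t3 pass.
coverage = {"t1": {"s1", "s2"}, "t2": {"s1", "s3"}, "t3": {"s1", "s2", "s3"}}
scores = ochiai_scores(coverage, failing={"t1"})
print(rank(scores))
```

Here `s2` ranks first because it is executed by the failing test and by only one passing test, while `s3` scores zero because no failing test reaches it.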
This paper proposes a method for representation learning of multimodal data using contrastive losses. A traditional approach is to contrast different modalities to learn the information shared among them. However, that approach could fail to learn the complementary synergies between modalities that might be useful for downstream tasks. Another approach is to concatenate all the modalities into a tuple and then contrast positive and negative tuple correspondences. However, that approach could consider only the stronger modalities while ignoring the weaker ones. To address these issues, we propose a novel contrastive learning objective, TupleInfoNCE. It...
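TupleInfoNCE is named after the standard InfoNCE contrastive objective. As a point of reference only, the sketch below implements plain InfoNCE over a batch of (anchor, positive) embedding pairs with cosine similarity and a temperature; it is not TupleInfoNCE, whose tuple construction the truncated abstract does not describe. The function names and the temperature value of 0.1 are assumptions.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(anchors: List[Sequence[float]], positives: List[Sequence[float]],
             temperature: float = 0.1) -> float:
    """Standard InfoNCE: positives[i] is the match for anchors[i];
    every other positives[j] in the batch acts as a negative."""
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [cosine(anchors[i], positives[j]) / temperature for j in range(n)]
        m = max(logits)  # log-sum-exp with max subtraction for numerical stability
        log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum - logits[i]  # -log softmax probability of the true pair
    return total / n


anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]   # each anchor matches its own positive
swapped = [[0.0, 1.0], [1.0, 0.0]]   # correspondences deliberately broken
print(info_nce(anchors, aligned), info_nce(anchors, swapped))
```

The loss is near zero when each anchor is most similar to its own positive and grows when correspondences are mismatched, which is the signal a contrastive objective trains on.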
Misleading method names in software projects can confuse developers, which may lead to software defects and hurt code understandability. In this paper, we present DeepName, a context-based, deep learning approach to detect method name inconsistencies and suggest a proper name for a method. The key departure point is the philosophy of "Show Me Your Friends, I'll Tell You Who You Are". Unlike state-of-the-art approaches, in addition to the method's body, we also consider the interactions of the method under study with other methods, including the caller...
This paper presents DreamLLM, a learning framework that first achieves versatile Multimodal Large Language Models (MLLMs) empowered with the frequently overlooked synergy between multimodal comprehension and creation. DreamLLM operates on two fundamental principles. The first focuses on the generative modeling of both language and image posteriors by direct sampling in the raw multimodal space. This approach circumvents the limitations and information loss inherent to external feature extractors like CLIP, and a more thorough multimodal understanding...
Advances in machine learning (ML), including deep learning (DL), have enabled several approaches to implicitly learn vulnerable code patterns and automatically detect software vulnerabilities. A recent study showed that, despite their successes, existing ML/DL-based vulnerability detection (VD) models are limited in their ability to distinguish between the two classes of vulnerable and benign code. We propose DeepVD, a graph-based neural network VD model that emphasizes class-separation features between vulnerable and benign code. DeepVD leverages three types of class-separation features at...
Automatic code completion helps improve developers' productivity in their programming tasks. A program contains instructions expressed via code statements, which are considered the basic units of program execution. In this paper, we introduce AutoSC, which combines program analysis and the principle of software naturalness to fill in a partially completed statement. AutoSC benefits from the strengths of both directions: the completed statement is both frequent and valid. AutoSC is first trained on a large code corpus to derive templates of candidate statements. Then, it uses...
Recent progress in Deep Learning (DL) has sparked interest in using DL to detect software vulnerabilities automatically, and it has shown promising results. However, one prominent practical issue for vulnerability detection is data imbalance. A prior study observed that the performance of state-of-the-art (SOTA) DL-based vulnerability detection (DLVD) approaches drops precipitously on real-world imbalanced data, with a 73% drop in F1-score on average across the studied approaches. Such a significant performance drop can...
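Part of why F1 is so sensitive to imbalance can be shown with arithmetic alone: if a detector keeps the same true-positive rate (TPR) and false-positive rate (FPR), shrinking the vulnerable fraction lets false positives swamp precision. The sketch below uses F1 = 2TP / (2TP + FP + FN); the function name and all numbers are hypothetical and are not taken from the cited study.

```python
def f1_at_rates(n_pos: int, n_neg: int, tpr: float, fpr: float) -> float:
    """F1 of a hypothetical detector with fixed TPR/FPR on a given class mix."""
    tp = tpr * n_pos        # vulnerable samples correctly flagged
    fn = n_pos - tp         # vulnerable samples missed
    fp = fpr * n_neg        # benign samples wrongly flagged
    return 2 * tp / (2 * tp + fp + fn)


# Same detector (TPR = 0.9, FPR = 0.1), two hypothetical class mixes of 2,000 samples.
balanced = f1_at_rates(n_pos=1000, n_neg=1000, tpr=0.9, fpr=0.1)
imbalanced = f1_at_rates(n_pos=100, n_neg=1900, tpr=0.9, fpr=0.1)
print(round(balanced, 3), round(imbalanced, 3))
```

On the balanced split F1 is 0.9, while on the 5%-vulnerable split it falls to about 0.47 (180 / 380) even though the detector itself is unchanged.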
During software evolution, developers make many changes and commit them to repositories. Unfortunately, many commits tangle changes with different purposes, both hampering program comprehension and reducing separation of concerns. Automated approaches with deterministic solutions have been proposed to untangle commits. However, specifying an effective clustering criterion on the changes in a commit for untangling is challenging for those approaches. In this work, we present UTango, a machine learning (ML)-based approach that learns...
Software Vulnerabilities (SVs) are security flaws that are exploitable in cyber-attacks. Delays in the detection and assessment of SVs might cause serious consequences due to unknown impacts on the attacked systems. State-of-the-art approaches have been proposed to work directly on committed code changes for early detection. However, none of them can provide both commit-level vulnerability detection and assessment at once. Moreover, they still suffer from low accuracy due to limited representations of code changes and their surrounding contexts.
This paper investigates the problem of ranking linked data from relational databases using a ranking framework. The core idea is to group relationships by their types, then rank the types, and finally rank the instances attached to each type. The ranking criteria for each step consider the mapping rules and the heterogeneous graph structure of the data web. Tests based on a social network dataset show that the framework is effective and that its results are easier for people to understand. The approach benefits from utilizing relationship types deduced from table schemas and from distinguishing the instances of each relationship type, which results in better ranking and visualization of linked data.
Traditional program slicing techniques are crucial for early bug detection and for manual/automated debugging of online code snippets. Nevertheless, their inability to handle incomplete code hinders their real-world applicability in such scenarios. To overcome these challenges, we present NS-Slicer, a novel learning-based approach that predicts static program slices for both complete and partial code. Our tool leverages a pre-trained language model to exploit its understanding of fine-grained variable-statement dependencies within...
Practical code reuse often leads to the incorporation of code fragments from developer forums into applications. However, these fragments, being incomplete, frequently lack details on exception handling. Integrating exception handling into a codebase is not a straightforward task: it requires developers to understand and remember which API methods may trigger exceptions and which exceptions should be handled. To address that, we introduce EHBlock, a learning-based exception handling recommender for Java code snippets. EHBlock analyzes a given code snippet and suggests...
API misuses refer to incorrect usages that violate the usage constraints of API elements, potentially leading to issues such as runtime errors, exceptions, program crashes, and security vulnerabilities. Existing mining-based approaches for API misuse detection face accuracy challenges, particularly in distinguishing infrequent but valid usages from invalid ones. This limitation stems from the need to set predefined thresholds for frequent usage patterns, which can result in the misclassification of alternative usages. This paper introduces...
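The threshold problem described above can be reproduced with a deliberately naive frequency-based detector that flags any usage pattern whose relative support falls below a fixed cutoff. The usage sequences, the 0.2 cutoff, and the function name are all hypothetical; the point is only that support alone cannot separate a rare but valid alternative from a genuine misuse.

```python
from collections import Counter
from typing import List, Set, Tuple

Usage = Tuple[str, ...]


def flag_infrequent(usages: List[Usage], min_support: float) -> Set[Usage]:
    """Naive mining-style detector: patterns below min_support are reported as misuses."""
    counts = Counter(usages)
    total = len(usages)
    return {u for u, c in counts.items() if c / total < min_support}


# Hypothetical call sequences mined from a corpus.
usages = (
    [("open", "read", "close")] * 8      # dominant, valid pattern
    + [("open_with", "read")]            # rare but valid: resource closed automatically
    + [("open", "read")]                 # genuine misuse: resource never closed
)
print(sorted(flag_infrequent(usages, min_support=0.2)))
```

Both rare patterns are flagged, so the valid alternative becomes a false positive; lowering the cutoff enough to accept it would also hide the real misuse.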
Program slicing, the process of extracting the program statements that influence values at a designated location (known as the slicing criterion), is helpful in both manual and automated debugging. However, such techniques prove ineffective in scenarios where executing specific inputs is prohibitively expensive, or even impossible, as with partial code. In this paper, we introduce ND-Slicer, a predictive slicing methodology that caters to executions based on a particular input, overcoming the need for actual execution. We enable...
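To make the definition of a slice concrete, the sketch below computes a classic static backward slice as reverse reachability over a dependence graph. It is a textbook illustration, not ND-Slicer, which predicts execution-specific slices without running the code. The dependence map is hand-written for a hypothetical seven-line program, and the function name is an assumption.

```python
from collections import deque
from typing import Dict, List, Set


def backward_slice(deps: Dict[int, List[int]], criterion: int) -> Set[int]:
    """Collect every statement the criterion transitively depends on.

    deps[s] lists the statements that s directly depends on
    (data or control dependence).
    """
    in_slice = {criterion}
    worklist = deque([criterion])
    while worklist:
        stmt = worklist.popleft()
        for dep in deps.get(stmt, []):
            if dep not in in_slice:
                in_slice.add(dep)
                worklist.append(dep)
    return in_slice


# Hypothetical program:
# 1: a = input()   2: b = 2   3: c = a + 1   4: if c > 0:
# 5:     d = c * b            6: e = b + 1   7: print(d)
deps = {3: [1], 4: [3], 5: [3, 2, 4], 6: [2], 7: [5]}
print(sorted(backward_slice(deps, criterion=7)))
```

Slicing on line 7 keeps lines 1 to 5 and drops line 6, because `e` never influences the printed value of `d`.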
To avoid exposing the original source code, variable names in code deployed in the wild are often replaced by short, meaningless names, making the code difficult to understand and analyze. We introduce DeMinify, a Deep-Learning (DL)-based approach that formulates this recovery problem as the prediction of missing features in a Graph Convolutional Network–Missing Features model. The graph represents both the relations among variables and their types, in which the names or types of some nodes are missing. Moreover, DeMinify leverages...
Stakeholders play critical roles in requirements elicitation, since they are the source of requirements, and the quality of the elicited requirements is significantly influenced by the degree of stakeholders' participation and collaboration in elicitation. However, elicitation is often obstructed by the diversity of stakeholders' backgrounds and interests, especially their different perspectives on the envisioned systems, insufficient communication and common understanding among them, and limitations in their abilities to express requirements.
Developer forums are among the most popular and useful Q&A websites on API usages. Analyzing them can be a critical step towards automated question answering approaches. In this poster, we empirically study three API forums (Twitter, eBay, and AdWords) to investigate the characteristics of the question-answering process. We observe that more than 60% of the posts in all three forums were answered with API method names or documentation. More than 85% of the questions were answered by the API development teams, and answers from those teams drew fewer follow-up questions. Our results provide empirical...