Yi Li

ORCID: 0009-0007-0143-0677
Research Areas
  • Software Engineering Research
  • Software Reliability and Analysis Research
  • Software Testing and Debugging Techniques
  • Topic Modeling
  • Advanced Malware Detection Techniques
  • Software System Performance and Reliability
  • Natural Language Processing Techniques
  • Multimodal Machine Learning Applications
  • Semantic Web and Ontologies
  • Advanced Graph Neural Networks
  • Music and Audio Processing
  • Software Engineering Techniques and Practices
  • Security and Verification in Computing
  • Data Quality and Management
  • Information and Cyber Security
  • Privacy-Preserving Technologies in Data
  • Advanced Text Analysis Techniques
  • Genomics and Phylogenetic Studies
  • Human Motion and Animation
  • Adversarial Robustness in Machine Learning
  • Educational Technology and Assessment
  • Speech and Audio Processing
  • Higher Education and Teaching Methods
  • EFL/ESL Teaching and Learning
  • Advanced Software Engineering Methodologies

New Jersey Institute of Technology
2019-2024

Xi'an Technological University
2022-2023

Beijing Language and Culture University
2019-2021

Tsinghua University
2021

Google (United States)
2021

University of Chinese Academy of Sciences
2019

Center for Excellence in Brain Science and Intelligence Technology
2019

Central South University
2018

Yunnan Normal University
2011

Yunnan University
2011

We collected a large C/C++ code vulnerability dataset from open-source GitHub projects, namely Big-Vul. We crawled the public Common Vulnerabilities and Exposures (CVE) database and CVE-related source code repositories. Specifically, we collected the descriptive information of the vulnerabilities from the CVE database, e.g., CVE IDs, CVE severity scores, and CVE summaries. With the CVE information and its related published GitHub code repository links, we downloaded all of the code repositories and extracted vulnerability-related code changes. In total, Big-Vul contains 3,754 code vulnerabilities spanning 91 different vulnerability types. All these code vulnerabilities are extracted from 348...

10.1145/3379597.3387501 article EN 2020-06-29

Automated Program Repair (APR) is very useful in helping developers in the process of software development and maintenance. Despite recent advances in deep learning (DL), DL-based APR approaches still have limitations in learning bug-fixing code changes and the context of the source code surrounding those changes. These limitations lead to incorrect fixing locations or incorrect fixes. In this paper, we introduce DLFix, a two-tier DL model that treats APR as learning code transformations from prior bug fixes and their surrounding code contexts. The first layer is a tree-based RNN that learns the contexts of bug fixes, and its result is used as an...

10.1145/3377811.3380345 article EN 2020-06-27

Bug detection has been shown to be an effective way to help developers detect bugs early, thus saving much effort and time in the software development process. Recently, deep learning-based bug detection approaches have gained successes over traditional machine learning-based approaches, rule-based program analysis approaches, and mining-based approaches. However, they are still limited in detecting bugs that involve multiple methods, and they suffer from a high rate of false positives. In this paper, we propose a combination approach with the use of contexts and attention...

10.1145/3360588 article EN Proceedings of the ACM on Programming Languages 2019-10-10

In this paper, we propose DeepRL4FL, a deep learning fault localization (FL) approach that locates the buggy code at the statement and method levels by treating FL as an image pattern recognition problem. DeepRL4FL does so via novel code coverage representation learning (RL) and data dependencies RL for program statements. Those two types of RL on the dynamic information in a code coverage matrix are also combined with code representation learning on the static information of the usual suspicious source code. This combination is inspired by crime scene investigation, in which investigators...

10.1109/icse43902.2021.00067 article EN 2021-05-01

The existing deep learning (DL)-based automated program repair (APR) models are limited in fixing general software defects. We present DEAR, a DL-based approach that supports fixing general bugs that require dependent changes at once to one or multiple consecutive statements in one or multiple hunks of code. We first design a novel fault localization (FL) technique for multi-hunk, multi-statement fixes that combines traditional spectrum-based (SB) FL with deep learning and data-flow analysis. It takes the buggy statements returned by the SBFL model, detects the buggy hunks to be fixed...

10.1145/3510003.3510177 article EN Proceedings of the 44th International Conference on Software Engineering 2022-05-21
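DEAR's fault localization starts from the buggy statements returned by a traditional spectrum-based FL model. As a rough, hypothetical illustration of that first step only, the sketch below ranks statements by the Ochiai formula, one widely used SBFL metric. The choice of Ochiai, the function names `ochiai_scores` and `rank_statements`, and the toy coverage data are assumptions, not DEAR's implementation.

```python
import math

def ochiai_scores(coverage, failed):
    """Ochiai suspiciousness for each statement (one common SBFL metric).

    coverage[t][s] is True if test t executes statement s;
    failed[t] is True if test t fails.
    """
    total_failed = sum(failed)
    scores = []
    for s in range(len(coverage[0])):
        # ef / ep: number of failing / passing tests that execute statement s
        ef = sum(1 for row, f in zip(coverage, failed) if row[s] and f)
        ep = sum(1 for row, f in zip(coverage, failed) if row[s] and not f)
        denom = math.sqrt(total_failed * (ef + ep))
        scores.append(ef / denom if denom else 0.0)
    return scores

def rank_statements(scores):
    """Statement indices, most suspicious first (ties keep index order)."""
    return sorted(range(len(scores)), key=lambda s: -scores[s])

# Hypothetical toy spectrum: 3 tests x 3 statements; test 0 passes, tests 1-2 fail.
coverage = [
    [True, True, False],
    [True, False, True],
    [True, True, True],
]
failed = [False, True, True]
scores = ochiai_scores(coverage, failed)
print(rank_statements(scores))  # [2, 0, 1]
```

Statement 2 is executed by every failing test and by no passing test, so it ranks first. Per the abstract, DEAR goes beyond such a ranking by adding deep learning and data-flow analysis to detect the hunks to be fixed.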

This paper proposes a method for representation learning of multimodal data using contrastive losses. A traditional approach is to contrast different modalities to learn the information shared among them. However, that approach could fail to learn the complementary synergies between modalities that might be useful for downstream tasks. Another approach is to concatenate all the modalities into a tuple and then contrast positive and negative tuple correspondences. However, that approach could consider only the stronger modalities while ignoring the weaker ones. To address these issues, we propose a novel contrastive learning objective, TupleInfoNCE. It...

10.1109/iccv48922.2021.00079 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021-10-01
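TupleInfoNCE is a contrastive objective in the InfoNCE family. For orientation only, the sketch below implements the standard two-view InfoNCE loss that the "traditional approach" above relies on, contrasting embeddings of one modality against another. It is not TupleInfoNCE; the function names, the cosine similarity, the temperature value, and the toy vectors are all assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def info_nce(anchors, candidates, temperature=0.1):
    """Standard InfoNCE loss averaged over the batch.

    anchors[i] and candidates[i] form the positive pair; every other
    candidates[j] serves as a negative for anchors[i].
    """
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [cosine(a, c) / temperature for c in candidates]
        m = max(logits)  # log-sum-exp trick for numerical stability
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denom - logits[i]  # equals -log softmax probability of the positive
    return total / len(anchors)

# Hypothetical embeddings of two samples in modality A and in modality B.
modality_a = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]  # matching samples point the same way
swapped = [[0.1, 0.9], [0.9, 0.1]]  # matching samples point different ways
print(info_nce(modality_a, aligned) < info_nce(modality_a, swapped))  # True
```

Minimizing this loss pulls each positive pair together relative to the in-batch negatives. The abstract's point is that contrasting modalities pairwise in this way can miss complementary information, which motivates contrasting whole tuples instead.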

Misleading method names in software projects can confuse developers, which may lead to software defects and affect code understandability. In this paper, we present DeepName, a context-based, deep learning approach to detect method name inconsistencies and suggest a proper name for a method. The key departure point is the philosophy of "Show Me Your Friends, I'll Tell You Who You Are". Unlike the state-of-the-art approaches, in addition to the method's body, we also consider the interactions of the current method under study with other ones, including the caller...

10.1109/icse43902.2021.00060 article EN 2021-05-01

This paper presents DreamLLM, a learning framework that first achieves versatile Multimodal Large Language Models (MLLMs) empowered with the frequently overlooked synergy between multimodal comprehension and creation. DreamLLM operates on two fundamental principles. The first focuses on the generative modeling of both language and image posteriors by direct sampling in the raw multimodal space. This approach circumvents the limitations and information loss inherent to external feature extractors like CLIP, and a more thorough multimodal understanding...

10.48550/arxiv.2309.11499 preprint EN cc-by-nc-nd arXiv (Cornell University) 2023-01-01

The advances of machine learning (ML), including deep learning (DL), have enabled several approaches to implicitly learn vulnerable code patterns to automatically detect software vulnerabilities. A recent study showed that, despite these successes, the existing ML/DL-based vulnerability detection (VD) models are limited in their ability to distinguish between the two classes of vulnerable and benign code. We propose DeepVD, a graph-based neural network VD model that emphasizes class-separation features between vulnerable and benign code. DeepVD leverages three types of class-separation features at...

10.1109/icse48619.2023.00189 article EN 2023-05-01

Automatic code completion helps improve developers' productivity in their programming tasks. A program contains instructions expressed via code statements, which are considered the basic units of program execution. In this paper, we introduce AutoSC, which combines program analysis and the principle of software naturalness to fill in a partially completed statement. AutoSC benefits from the strengths of both directions, so that the completed statement is both frequent and valid. AutoSC is first trained on a large code corpus to derive templates of candidate statements. Then, it uses...

10.1109/ase.2019.00072 article EN 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE) 2019-11-01

Recent progress in Deep Learning (DL) has sparked interest in using DL to detect software vulnerabilities automatically, and it has demonstrated promising results at detecting vulnerabilities. However, one prominent practical issue for vulnerability detection is data imbalance. A prior study observed that the performance of state-of-the-art (SOTA) DL-based vulnerability detection (DLVD) approaches drops precipitously on real-world imbalanced data, with a 73% drop in F1-score on average across the studied approaches. Such a significant performance drop can...

10.1109/icse48619.2023.00192 article EN 2023-05-01

During software evolution, developers make several changes and commit them into the repositories. Unfortunately, many of those commits tangle changes with different purposes, both hampering program comprehension and reducing separation of concerns. Automated approaches with deterministic solutions have been proposed to untangle commits. However, specifying effective clustering criteria on the changes in a commit for untangling is challenging for those approaches. In this work, we present UTango, a machine learning (ML)-based approach that learns...

10.1145/3540250.3549171 article EN Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering 2022-11-07

Software Vulnerabilities (SVs) are security flaws that are exploitable in cyber-attacks. Delay in the detection and assessment of SVs might cause serious consequences due to their unknown impacts on the attacked systems. State-of-the-art approaches have been proposed to work directly on committed code changes for early detection. However, none of them could provide both commit-level vulnerability detection and assessment at once. Moreover, they still suffer from low accuracy due to limited representations of code changes and their surrounding contexts.

10.1145/3611643.3616346 article EN 2023-11-30

This paper investigates the problem of ranking linked data from relational databases using a ranking framework. The core idea is to group relationships by their types, then rank the types, and finally rank the instances attached to each type. The ranking criteria for each step consider the mapping rules and the heterogeneous graph structure of the data web. Tests based on a social network dataset show that the ranking is effective and easier for people to understand. The approach benefits from utilizing relationships deduced from the table schemas and distinguishing relationship types, which results in better ranking and visualization of the data.

10.1016/s1007-0214(10)70111-5 article EN Tsinghua Science & Technology 2010-12-01
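The three-step ranking described above (group relationships by type, rank the types, then rank the instances attached to each type) can be sketched as follows. The scoring is a hypothetical stand-in: types are ranked by link count and instances by how many links of that type they take part in, whereas the paper's criteria use mapping rules and graph structure, which are not reproduced here. The function name and the example triples are illustrative.

```python
from collections import defaultdict

def rank_linked_data(triples):
    """Group (subject, relationship, object) triples by relationship type,
    rank the types, then rank the instances within each type.

    Assumed criteria (not the paper's): link count per type, and
    per-type link count per instance; ties are broken alphabetically.
    """
    by_type = defaultdict(list)  # step 1: group relationships by type
    for subj, rel, obj in triples:
        by_type[rel].append((subj, obj))

    # step 2: rank the relationship types
    ranked_types = sorted(by_type, key=lambda r: (-len(by_type[r]), r))

    result = []
    for rel in ranked_types:  # step 3: rank instances attached to each type
        counts = defaultdict(int)
        for subj, obj in by_type[rel]:
            counts[subj] += 1
            counts[obj] += 1
        result.append((rel, sorted(counts, key=lambda n: (-counts[n], n))))
    return result

# Hypothetical social-network triples.
triples = [
    ("alice", "friendOf", "bob"),
    ("alice", "friendOf", "carol"),
    ("bob", "friendOf", "carol"),
    ("carol", "friendOf", "dave"),
    ("alice", "worksAt", "acme"),
]
print(rank_linked_data(triples))
# [('friendOf', ['carol', 'alice', 'bob', 'dave']), ('worksAt', ['acme', 'alice'])]
```

Ranking types before instances keeps the output organized by relationship, which fits the abstract's claim that distinguishing relationship types makes the results easier to understand and visualize.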

Traditional program slicing techniques are crucial for early bug detection and manual/automated debugging of online code snippets. Nevertheless, their inability to handle incomplete code hinders their real-world applicability in such scenarios. To overcome these challenges, we present NS-Slicer, a novel learning-based approach that predicts static program slices for both complete and partial code. Our tool leverages a pre-trained language model to exploit its understanding of fine-grained variable-statement dependencies within...

10.1145/3649814 article EN Proceedings of the ACM on Programming Languages 2024-04-29
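For reference, a conventional static backward slice on complete code is the transitive closure of a statement's data and control dependencies. The sketch below computes one over a hand-written dependency map for a six-line toy program; the map, the program, and `backward_slice` are hypothetical. NS-Slicer instead predicts such slices with a pre-trained language model, which is what lets it handle partial code.

```python
from collections import deque

# Hypothetical toy program, one statement per line number:
# 1: x = int(input())
# 2: y = 0
# 3: if x > 0:
# 4:     y = x
# 5: z = 2
# 6: print(y)      <- slicing criterion: value of y at line 6
DEPS = {
    6: {2, 4},  # y at line 6 may come from line 2 or line 4 (data)
    4: {1, 3},  # line 4 reads x (data) and runs under the if (control)
    3: {1},     # the condition reads x (data)
}

def backward_slice(deps, criterion):
    """All statements the criterion transitively depends on, itself included."""
    seen = {criterion}
    work = deque([criterion])
    while work:
        stmt = work.popleft()
        for dep in deps.get(stmt, ()):
            if dep not in seen:
                seen.add(dep)
                work.append(dep)
    return sorted(seen)

print(backward_slice(DEPS, 6))  # [1, 2, 3, 4, 6]
```

Line 5 is excluded because nothing at the criterion depends on `z`.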

Practical code reuse often leads to the incorporation of code fragments from developer forums into applications. However, these fragments, being incomplete, frequently lack details on exception handling. Integrating exception handling into a codebase is not a straightforward task, requiring developers to understand and remember which API methods may trigger exceptions and which exceptions should be handled. To address that, we introduce EHBlock, a learning-based exception-handling recommender for Java code snippets. EHBlock analyzes a given code snippet and suggests...

10.1145/3639478.3643082 article EN cc-by 2024-04-14

API misuses refer to incorrect usages that violate the usage constraints of API elements, potentially leading to issues such as runtime errors, exceptions, program crashes, and security vulnerabilities. Existing mining-based approaches for API misuse detection face challenges in accuracy, particularly in distinguishing infrequent usages from invalid ones. This limitation stems from the necessity to set predefined thresholds for frequent usage patterns, resulting in potential misclassification of alternative usages. This paper introduces...

10.1145/3639478.3643080 article EN cc-by 2024-04-14

Program slicing, the process of extracting program statements that influence values at a designated location (known as the slicing criterion), is helpful in both manual and automated debugging. However, such slicing techniques prove ineffective in scenarios where executing specific inputs is prohibitively expensive, or even impossible, as with partial code. In this paper, we introduce ND-Slicer, a predictive slicing methodology that caters to specific executions based on a particular input, overcoming the need for actual execution. We enable...

10.1145/3643739 article EN Proceedings of the ACM on Software Engineering 2024-07-12

To avoid the exposure of original source code, variable names deployed in the wild are often replaced by short, meaningless names, thus making the code difficult to understand and analyze. We introduce DeMinify, a Deep-Learning (DL)-based approach that formulates this recovery problem as the prediction of missing features, using Graph Convolutional Network–Missing Features. The graph represents both the relations among variables and their types, in which the names or types of some nodes are missing. Moreover, DeMinify leverages...

10.1145/3611643.3616368 article EN 2023-11-30

Stakeholders play critical roles in requirements elicitation, since they are the source of requirements, and the quality of the elicited requirements is significantly influenced by the degree of stakeholders' participation and collaboration in elicitation. However, their participation and collaboration are often obstructed by the diversity of their backgrounds and interests, especially their different perspectives on the envisioned systems, insufficient communication and common understanding among them, and their differing abilities to express requirements.

10.1145/1640206.1640228 article EN 2009-10-17

Developer forums are one of the most popular and useful Q&A websites on API usages. Analyzing them can be a critical step towards automated question-answering approaches. In this poster, we empirically study three API forums, Twitter, eBay, and AdWords, to investigate the characteristics of their question-answering process. We observe that over 60% of the posts in all forums were answered with API method names or documentation. Over 85% of the questions were answered by the API development teams, and answers from those teams drew fewer follow-up questions. Our results provide empirical...

10.1145/3377812.3390897 article EN 2020-06-27