NFDI4DS | UHH-SEMS - Publication Details

Yi Li

ORCID: 0009-0007-0143-0677

About

Contact & Profiles

OpenAlex ID: A5107249192

Research Areas

Software Engineering Research
Software Reliability and Analysis Research
Software Testing and Debugging Techniques
Topic Modeling
Advanced Malware Detection Techniques
Software System Performance and Reliability
Natural Language Processing Techniques
Multimodal Machine Learning Applications
Semantic Web and Ontologies
Advanced Graph Neural Networks
Music and Audio Processing
Software Engineering Techniques and Practices
Security and Verification in Computing
Data Quality and Management
Information and Cyber Security
Privacy-Preserving Technologies in Data
Advanced Text Analysis Techniques
Genomics and Phylogenetic Studies
Human Motion and Animation
Adversarial Robustness in Machine Learning
Educational Technology and Assessment
Speech and Audio Processing
Higher Education and Teaching Methods
EFL/ESL Teaching and Learning
Advanced Software Engineering Methodologies

New Jersey Institute of Technology
2019-2024

Xi'an Technological University
2022-2023

Beijing Language and Culture University
2019-2021

Tsinghua University
2021

Google (United States)
2021

University of Chinese Academy of Sciences
2019

Center for Excellence in Brain Science and Intelligence Technology
2019

Central South University
2018

Yunnan Normal University
2011

Yunnan University
2011

A C/C++ Code Vulnerability Dataset with Code Changes and CVE Summaries

OPENALEX - Publications

Jiahao Fan, Yi Li, Shaohua Wang, Tien N. Nguyen

We collected a large C/C++ code vulnerability dataset from open-source GitHub projects, namely Big-Vul. We crawled the public Common Vulnerabilities and Exposures (CVE) database and CVE-related source code repositories. Specifically, we collected the descriptive information of the vulnerabilities from the CVE database, e.g., CVE IDs, CVE severity scores, and CVE summaries. With the CVE information and its related published repository links, we downloaded all of the code repositories and extracted the vulnerability-related code changes. In total, Big-Vul contains 3,754 code vulnerabilities spanning 91 different vulnerability types. All these are 348...

10.1145/3379597.3387501 article EN 2020-06-29

DLFix

Yi Li, Shaohua Wang, Tien N. Nguyen

Automated Program Repair (APR) is very useful in helping developers in the process of software development and maintenance. Despite recent advances in deep learning (DL), DL-based APR approaches still have limitations in learning bug-fixing code changes and the context of the surrounding source code of those changes. These limitations lead to incorrect fixing locations or fixes. In this paper, we introduce DLFix, a two-tier DL model that treats APR as code transformation learning from the prior bug fixes and the surrounding code contexts of the fixes. The first layer is a tree-based RNN model that learns the contexts of bug fixes, and its result is used as an...

10.1145/3377811.3380345 article EN 2020-06-27

Improving bug detection via context-based code representation learning and attention-based neural networks

Yi Li, Shaohua Wang, Tien N. Nguyen, Son Nguyen

Bug detection has been shown to be an effective way to help developers detect bugs early, thus saving much effort and time in the software development process. Recently, deep learning-based bug detection approaches have gained successes over the traditional machine learning-based approaches, the rule-based program analysis approaches, and mining-based approaches. However, they are still limited in detecting bugs that involve multiple methods, and they suffer from a high rate of false positives. In this paper, we propose a combination approach with the use of contexts and attention...

10.1145/3360588 article EN Proceedings of the ACM on Programming Languages 2019-10-10

Fault Localization with Code Coverage Representation Learning

Yi Li, Shaohua Wang, Tien N. Nguyen

In this paper, we propose DeepRL4FL, a deep learning fault localization (FL) approach that locates the buggy code at the statement and method levels by treating FL as an image pattern recognition problem. DeepRL4FL does so via novel code coverage representation learning (RL) and data dependencies RL for program statements. Those two types of RL on the dynamic information in a code coverage matrix are also combined with the code representation learning on the static information of the usual suspicious source code. This combination is inspired by crime scene investigation, in which investigators...

10.1109/icse43902.2021.00067 article EN 2021-05-01

DEAR

Yi Li, Shaohua Wang, Tien N. Nguyen

The existing deep learning (DL)-based automated program repair (APR) models are limited in fixing general software defects. We present DEAR, a DL-based approach that supports fixing for the general bugs that require dependent changes at once to one or multiple consecutive statements in one or multiple hunks of code. We first design a novel fault localization (FL) technique for multi-hunk, multi-statement fixes that combines traditional spectrum-based (SB) FL with deep learning and data-flow analysis. It takes the buggy statements returned by the SBFL model, detects the buggy hunks to be fixed...

10.1145/3510003.3510177 article EN Proceedings of the 44th International Conference on Software Engineering 2022-05-21

Contrastive Multimodal Fusion with TupleInfoNCE

Yunze Liu, Qingnan Fan, Shanghang Zhang, Hao Dong, Thomas Funkhouser, and 1 more

This paper proposes a method for representation learning of multimodal data using contrastive losses. A traditional approach is to contrast different modalities to learn the information shared among them. However, that approach could fail to learn the complementary synergies between modalities that might be useful for downstream tasks. Another approach is to concatenate all the modalities into a tuple and then contrast positive and negative tuple correspondences. However, that approach could consider only the stronger modalities while ignoring the weaker ones. To address these issues, we propose a novel contrastive learning objective, TupleInfoNCE. It...

10.1109/iccv48922.2021.00079 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021-10-01

A Context-Based Automated Approach for Method Name Consistency Checking and Suggestion

Yi Li, Shaohua Wang, Tien N. Nguyen

Misleading method names in software projects can confuse developers, which may lead to software defects and affect code understandability. In this paper, we present DeepName, a context-based, deep learning approach to detect method name inconsistencies and suggest a proper name for a method. The key departure point is the philosophy of "Show Me Your Friends, I'll Tell You Who You Are". Unlike the state-of-the-art approaches, in addition to the method's body, we also consider the interactions of the current method under study with the other ones, including the caller...

10.1109/icse43902.2021.00060 article EN 2021-05-01

DreamLLM: Synergistic Multimodal Comprehension and Creation

Runpei Dong, Chunrui Han, Yuang Peng, Zekun Qi, Zheng Ge, and 9 more

This paper presents DreamLLM, a learning framework that first achieves versatile Multimodal Large Language Models (MLLMs) empowered with the frequently overlooked synergy between multimodal comprehension and creation. DreamLLM operates on two fundamental principles. The first focuses on the generative modeling of both language and image posteriors by direct sampling in the raw multimodal space. This approach circumvents the limitations and information loss inherent to external feature extractors like CLIP, and a more thorough multimodal understanding...

10.48550/arxiv.2309.11499 preprint EN cc-by-nc-nd arXiv (Cornell University) 2023-01-01

DeepVD: Toward Class-Separation Features for Neural Network Vulnerability Detection

Wenbo Wang, Tien N. Nguyen, Shaohua Wang, Yi Li, Jiyuan Zhang, and 1 more

The advances of machine learning (ML), including deep learning (DL), have enabled several approaches to implicitly learn vulnerable code patterns to automatically detect software vulnerabilities. A recent study showed that, despite successes, the existing ML/DL-based vulnerability detection (VD) models are limited in the ability to distinguish between the two classes of vulnerable and benign code. We propose DeepVD, a graph-based neural network VD model that emphasizes class-separation features between vulnerable and benign code. DeepVD leverages three types of class-separation features at...

10.1109/icse48619.2023.00189 article EN 2023-05-01

Combining Program Analysis and Statistical Language Model for Code Statement Completion

Son Nguyen, Tien N. Nguyen, Yi Li, Shaohua Wang

Automatic code completion helps improve developers' productivity in their programming tasks. A program contains instructions expressed via code statements, which are considered the basic units of program execution. In this paper, we introduce AutoSC, which combines program analysis and the principle of software naturalness to fill in a partially completed statement. AutoSC benefits from the strengths of both directions, in which the completed code statement is both frequent and valid. AutoSC is first trained on a large code corpus to derive the templates of candidate statements. Then, it uses...

10.1109/ase.2019.00072 article EN 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE) 2019-11-01

Does data sampling improve deep learning-based vulnerability detection? Yeas! and Nays!

Xu Yang, Shaowei Wang, Yi Li, Shaohua Wang

Recent progress in Deep Learning (DL) has sparked interest in using DL to detect software vulnerabilities automatically, and it has demonstrated promising results at detecting vulnerabilities. However, one prominent and practical issue for vulnerability detection is data imbalance. A prior study observed that the performance of state-of-the-art (SOTA) DL-based vulnerability detection (DLVD) approaches drops precipitously on real-world imbalanced data, with a 73% drop in F1-score on average across the studied approaches. Such a significant performance drop can...

10.1109/icse48619.2023.00192 article EN 2023-05-01

UTANGO: untangling commits with context-aware, graph-based, code change clustering learning model

Yi Li, Shaohua Wang, Tien N. Nguyen

During software evolution, developers make several changes and commit them into the repositories. Unfortunately, many of them tangle different purposes, both hampering program comprehension and reducing separation of concerns. Automated approaches with deterministic solutions have been proposed to untangle commits. However, specifying effective clustering criteria on the changes in a commit for untangling is challenging for those approaches. In this work, we present UTango, a machine learning (ML)-based approach that learns...

10.1145/3540250.3549171 article EN Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering 2022-11-07

Commit-Level, Neural Vulnerability Detection and Assessment

Yi Li, Aashish Yadavally, Jiaxing Zhang, Shaohua Wang, Tien N. Nguyen

Software Vulnerabilities (SVs) are security flaws that are exploitable in cyber-attacks. Delay in the detection and assessment of SVs might cause serious consequences due to the unknown impacts on the attacked systems. The state-of-the-art approaches have been proposed to work directly on the committed code changes for early detection. However, none of them could provide both commit-level vulnerability detection and assessment at once. Moreover, they still suffer from low accuracy due to limited representations of the surrounding contexts.

10.1145/3611643.3616346 article EN 2023-11-30

A novel ranking framework for linked data from relational databases

Jing Zhang, Chune Ma, Chenting Zhao, Jun Zhang, Yi Li, and 1 more

This paper investigates the problem of ranking linked data from relational databases using a ranking framework. The core idea is to group relationships by their types, then rank the types, and finally rank the instances attached to each type. The ranking criteria for each step consider the mapping rules and the heterogeneous graph structure of the data web. Tests based on a social network dataset show that the ranking is effective and easier for people to understand. The approach benefits from utilizing the deduced table schemas and distinguishing relationship types, which results in better ranking and visualization of the linked data.

10.1016/s1007-0214(10)70111-5 article EN Tsinghua Science & Technology 2010-12-01

A Learning-Based Approach to Static Program Slicing

Aashish Yadavally, Yi Li, Shaohua Wang, Tien N. Nguyen

Traditional program slicing techniques are crucial for early bug detection and manual/automated debugging of online code snippets. Nevertheless, their inability to handle incomplete code hinders their real-world applicability in such scenarios. To overcome these challenges, we present NS-Slicer, a novel learning-based approach that predicts static program slices for both complete and partial code. Our tool leverages a pre-trained language model to exploit its understanding of fine-grained variable-statement dependencies within...

10.1145/3649814 article EN Proceedings of the ACM on Programming Languages 2024-04-29

Neural Exception Handling Recommender

Yi Li, Tien N. Nguyen, Yuchen Cai, Aashish Yadavally, Abhishek Mishra, and 1 more

Practical code reuse often leads to the incorporation of code fragments from developer forums into applications. However, these fragments, being incomplete, frequently lack details on exception handling. Integrating exception handling into a codebase is not a straightforward task, requiring developers to understand and remember which API methods may trigger exceptions and which exceptions should be handled. To address that, we introduce EHBlock, a learning-based exception handling recommender for Java code snippets. EHBlock analyzes a given code snippet and suggests...

10.1145/3639478.3643082 article EN cc-by 2024-04-14

Poirot: Deep Learning for API Misuse Detection

Yi Li, Tien N. Nguyen, Shaohua Wang, Aashish Yadavally

API misuses refer to incorrect usages that violate the usage constraints of API elements, potentially leading to issues such as runtime errors, exceptions, program crashes, and security vulnerabilities. Existing mining-based approaches for API misuse detection face challenges in accuracy, particularly in distinguishing infrequent usage from invalid usage. This limitation stems from the necessity to set predefined thresholds for frequent API usage patterns, resulting in potential misclassification of alternative usages. This paper introduces...

10.1145/3639478.3643080 article EN cc-by 2024-04-14

Predictive Program Slicing via Execution Knowledge-Guided Dynamic Dependence Learning

Aashish Yadavally, Yi Li, Tien N. Nguyen

Program slicing, the process of extracting program statements that influence values at a designated location (known as the slicing criterion), is helpful in both manual and automated debugging. However, such slicing techniques prove ineffective in scenarios where executing specific inputs is prohibitively expensive, or even impossible, as with partial code. In this paper, we introduce ND-Slicer, a predictive slicing methodology that caters to specific executions based on a particular input, overcoming the need for actual execution. We enable...

10.1145/3643739 article EN Proceedings of the ACM on Software Engineering 2024-07-12

DeMinify: Neural Variable Name Recovery and Type Inference

Yi Li, Aashish Yadavally, Jiaxing Zhang, Shaohua Wang, Tien N. Nguyen

To avoid the exposure of original source code, the variable names in code deployed in the wild are often replaced by short, meaningless names, thus making the code difficult to understand and analyze. We introduce DeMinify, a Deep-Learning (DL)-based approach that formulates such a recovery problem as the prediction of missing features in a Graph Convolutional Network–Missing Features. The graph represents both the relations among the variables and their types, in which the names or types of some nodes are missing. Moreover, DeMinify leverages...

10.1145/3611643.3616368 article EN 2023-11-30

A problem-driven scenario-based approach to collaborative requirement elicitation

Haiyan Zhao, Yi Li, Wei Zhang, Hong Mei

Stakeholders play critical roles in requirements elicitation, since they are the source of requirements, and the quality of the elicited requirements is significantly influenced by the degree of stakeholders' participation and collaboration in elicitation. However, collaboration in elicitation is often obstructed due to the diversity of stakeholders' backgrounds and interests, especially their different perspectives on the envisioned systems, insufficient communication and common understanding among them, and their differing abilities to express requirements.

10.1145/1640206.1640228 article EN 2009-10-17

An empirical study on the characteristics of question-answering process on developer forums

Yi Li, Shaohua Wang, Tien N. Nguyen

Developer forums are one of the most popular and useful Q&A websites on API usages. The analysis of API forums can be a critical step towards automated question and answer approaches. In this poster, we empirically study three API forums: Twitter, eBay, and AdWords, to investigate the characteristics of the question-answering process. We observe that +60% of the posts on all forums were answered with API method names or documentation. +85% of the questions were answered by API development teams, and the answers from API development teams drew fewer follow-up questions. Our results provide empirical...

10.1145/3377812.3390897 article EN 2020-06-27