- Software Engineering Research
- Topic Modeling
- Software Testing and Debugging Techniques
- Natural Language Processing Techniques
- Cloud Computing and Resource Management
- Software System Performance and Reliability
- Software Reliability and Analysis Research
- Business Process Modeling and Analysis
- Scientific Computing and Data Management
- Web Data Mining and Analysis
- Distributed and Parallel Computing Systems
- Service-Oriented Architecture and Web Services
- IoT and Edge/Fog Computing
- Multimodal Machine Learning Applications
- Fullerene Chemistry and Applications
- Graphene Research and Applications
- Open Source Software Innovations
- Caching and Content Delivery
- Inertial Sensor and Navigation
- Boron and Carbon Nanomaterials Research
- Cloud Computing and Remote Desktop Technologies
- Artificial Intelligence in Law
- GNSS Positioning and Interference
- Smart Grid Security and Resilience
- Video Analysis and Summarization
Nanjing University
2016-2025
Institute of Software
2015-2024
Affiliated Hospital of Nantong University
2022
Nantong University
2022
Beijing University of Posts and Telecommunications
2015-2020
Nanjing University of Aeronautics and Astronautics
2018-2019
Nanjing University of Science and Technology
2015
Hubei University
2012
Hubei University of Science and Technology
2002
Peking University
1992
Code summarization aims to generate brief natural language descriptions for source code. The state-of-the-art approaches follow a transformer-based encoder-decoder architecture. As code is highly structured and follows strict grammars, its Abstract Syntax Tree (AST) is widely used for encoding structural information. However, ASTs are much longer than the corresponding code. Existing approaches ignore this size constraint and simply feed the whole linearized AST into the encoders. We argue that such a simple process makes it...
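As a rough illustration of why a linearized AST outgrows its source, here is a minimal sketch using Python's standard `ast` module; the example function and the pre-order linearization are illustrative assumptions, not taken from the paper:

```python
import ast

code = "def add(a, b):\n    return a + b\n"

def linearize(node: ast.AST) -> list[str]:
    """Pre-order traversal that emits one token per AST node."""
    tokens = [type(node).__name__]
    for child in ast.iter_child_nodes(node):
        tokens.extend(linearize(child))
    return tokens

ast_tokens = linearize(ast.parse(code))
print(len(code.split()), "code tokens vs", len(ast_tokens), "AST node tokens")
```

Even for this two-line function the node sequence is already notably longer than the token sequence, and the gap widens with deeper nesting.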
In recent years, pre-trained language models have seen significant success in natural language processing and have been increasingly applied to code-related tasks. Code intelligence tasks have shown promising performance with the support of code models. Pre-processing simplification methods have been introduced to prune tokens from the model's input while maintaining task effectiveness. These improve efficiency by reducing computational costs. Post-prediction methods provide explanations for outcomes, enhancing reliability...
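One generic form of input simplification is dropping tokens that carry little signal for the task, such as comments and layout tokens. A minimal sketch with Python's `tokenize` module follows; the pruning criteria here are illustrative, and the paper's actual methods may prune differently:

```python
import io
import tokenize

def prune_tokens(source: str) -> list[str]:
    """Drop comments and structural whitespace tokens before
    feeding a code model, keeping only content-bearing tokens."""
    kept = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in (tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
                        tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER):
            continue
        kept.append(tok.string)
    return kept

print(prune_tokens("x = 1  # init counter\ny = x + 2\n"))
# -> ['x', '=', '1', 'y', '=', 'x', '+', '2']
```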
Pre-trained models (PTMs) have succeeded in various software engineering (SE) tasks following the “pre-train then fine-tune” paradigm. As fully fine-tuning all parameters of PTMs can be computationally expensive, a potential solution is parameter-efficient fine-tuning (PEFT), which freezes the pre-trained parameters while introducing extra trainable parameters. Although PEFT methods have been applied to SE tasks, researchers often focus on specific scenarios and lack a comprehensive comparison from different aspects such as field, size,...
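A minimal PyTorch sketch of the PEFT idea: freeze a stand-in backbone and train only a small bottleneck adapter. The adapter design, dimensions, and the toy backbone are illustrative assumptions, not the paper's setup:

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, non-linearity, up-project, residual."""
    def __init__(self, dim: int, bottleneck: int = 16):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))

# Stand-in for a pre-trained encoder (hypothetical, for illustration only).
backbone = nn.Sequential(nn.Linear(768, 768), nn.Linear(768, 768))

# PEFT step 1: freeze every pre-trained weight ...
for p in backbone.parameters():
    p.requires_grad = False

# ... step 2: attach a small trainable module and train only that.
model = nn.Sequential(backbone, Adapter(768))

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable params: {trainable} / {total}")
```

The printed ratio makes the appeal concrete: only the adapter's few thousand parameters receive gradients, while the frozen backbone dominates the total count.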
While the majority of existing pre-trained models for code learn source-level features such as tokens and abstract syntax trees, some other works focus on learning compiler intermediate representations (IRs). Existing IR-based models typically utilize IR instructions, control and data flow graphs (CDFGs), call graphs, etc. However, these methods confuse variable nodes with instruction nodes in a CDFG, fail to distinguish different types of flows, and the neural networks they use fail to capture long-distance dependencies...
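A toy sketch of the distinction this critique points at, using `networkx` to keep variable and instruction nodes separate and to type each flow edge; the miniature IR and edge labels are made up for illustration:

```python
import networkx as nx

# Toy IR for: a = 1; b = a + 2
g = nx.MultiDiGraph()
g.add_node("i1", kind="instruction", text="a = 1")
g.add_node("i2", kind="instruction", text="b = a + 2")
g.add_node("a", kind="variable")
g.add_node("b", kind="variable")

g.add_edge("i1", "i2", flow="control")  # control flow between instructions
g.add_edge("i1", "a", flow="def")       # i1 defines a
g.add_edge("a", "i2", flow="use")       # i2 uses a
g.add_edge("i2", "b", flow="def")

for u, v, attrs in g.edges(data=True):
    print(u, "->", v, attrs["flow"])
```

Keeping node kinds and flow types as explicit attributes is what a plain CDFG encoding loses when all nodes and edges are treated uniformly.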
To date, over 40 Automated Program Repair (APR) tools have been designed with varying bug-fixing strategies, which have been demonstrated to have complementary performance in terms of being effective for different bug classes. Intuitively, it should be feasible to improve overall APR effectiveness by assembling existing tools. Unfortunately, simply invoking all available tools for a given bug can result in unacceptable costs in execution as well as patch validation (via expensive testing). Therefore, while assembling APR tools is appealing, it requires an...
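One way to make the cost problem concrete is a budget-constrained selection policy. The sketch below greedily schedules tools by fix rate per unit cost; the tool names, statistics, and the greedy policy itself are hypothetical illustrations, not the paper's approach:

```python
# Hypothetical per-tool statistics: historical fix rate and average cost (minutes).
tools = {
    "ToolA": {"fix_rate": 0.30, "cost": 20},
    "ToolB": {"fix_rate": 0.25, "cost": 5},
    "ToolC": {"fix_rate": 0.10, "cost": 2},
}

def schedule(tools: dict, budget: float) -> list[str]:
    """Greedy: try tools in decreasing fix-rate-per-cost order until budget runs out."""
    order = sorted(tools, key=lambda t: tools[t]["fix_rate"] / tools[t]["cost"],
                   reverse=True)
    plan, spent = [], 0.0
    for t in order:
        if spent + tools[t]["cost"] <= budget:
            plan.append(t)
            spent += tools[t]["cost"]
    return plan

print(schedule(tools, budget=10))  # -> ['ToolB', 'ToolC']
```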
Recent years have seen the successful application of large pre-trained models to code representation learning, resulting in substantial improvements on many code-related downstream tasks. But there are issues surrounding their application to SE tasks. First, the majority focus on pre-training only the encoder of the Transformer. For generation tasks that are addressed using models with an encoder-decoder architecture, however, there is no reason why the decoder should be left out during pre-training. Second, existing models, including state-of-the-art...
Recent years have seen the remarkable capabilities of large language models (LLMs) for code generation. Different from existing work that evaluates the correctness of the code generated by LLMs, we propose to further evaluate its efficiency. More efficient code can lead to higher performance and better execution efficiency of programs and software completed with LLM-assisted programming. First, we evaluate LLMs on two benchmarks, HumanEval and MBPP. Then, we choose a set of programming problems from the online judge platform LeetCode to conduct a more difficult evaluation....
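A minimal harness for the kind of efficiency comparison described, timing two hypothetical LLM-generated solutions to the same problem with Python's `timeit`; both snippets are invented for illustration:

```python
import timeit

# Two hypothetical LLM-generated solutions to the same problem:
# summing 1..n naively vs. with the closed-form formula.
naive = "def s(n):\n    return sum(range(1, n + 1))"
closed_form = "def s(n):\n    return n * (n + 1) // 2"

def best_runtime(src: str, n: int = 10**5, repeat: int = 5) -> float:
    """Execute the generated source, then time the resulting function."""
    scope = {}
    exec(src, scope)  # load the generated function into a fresh namespace
    return min(timeit.repeat(lambda: scope["s"](n), number=100, repeat=repeat))

print("naive:      ", best_runtime(naive))
print("closed form:", best_runtime(closed_form))
```

Taking the minimum over repeats is the usual way to reduce timing noise; ranking solutions by such measurements is one simple instance of the efficiency evaluation the abstract proposes.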
Recent years have seen a rise in neural program repair (NPR) systems in the software engineering community, which adopt advanced deep learning techniques to automatically fix bugs. Having a comprehensive understanding of existing systems can facilitate new improvements in this area and provide practical guidance for users. However, we observe two potential weaknesses in the current evaluation of NPR systems: ① published systems are trained with varying data, and ② they are roughly evaluated through the number of totally fixed bugs. Questions such as...
Clinical management of subsolid nodules (SSNs) is defined by the suspicion of tumor invasiveness. We sought to develop an artificial intelligence (AI) algorithm for invasiveness assessment of lung adenocarcinoma manifesting as radiological SSNs, and investigated the performance of this algorithm in the classification of SSNs related to invasiveness. A retrospective chest computed tomography (CT) dataset (n = 1,589) was constructed to train (85%) and internally test (15%) the proposed AI diagnostic tool, SSNet. Diagnostic performance was evaluated on the hold-out set...
Code summarization is to provide a high-level comment for a code snippet that typically describes the function and intent of the given code. Recent years have seen the successful application of data-driven code summarization. To improve model performance, numerous approaches use abstract syntax trees (ASTs) to represent the structural information of code, which is considered by most researchers to be the main factor distinguishing code from natural language. Then, such methods are trained on large-scale labeled datasets...
Recommending APIs is a practical and essential feature of IDEs. Improving the accuracy of API recommendations is an effective way to improve coding efficiency. With the success of deep learning in software engineering, state-of-the-art (SOTA) performance in API recommendation has also been achieved by deep-learning-based approaches. However, existing SOTA approaches either consider only the sequences of code snippets or rely on complex operations for extracting hand-crafted features, all of which carry potential risks of under-encoding the input...
The rapid development of web services provides many opportunities for companies to migrate their business processes to the Internet for wider accessibility and higher collaboration efficiency. However, the open, dynamic, and ever-changing environment also brings challenges in protecting these processes. There are certain process monitoring methods, with recently proposed ones based on state changes of artifacts or places; however, they do not mention defending interactions from outer tampering, where events could be detected by...