- Software Engineering Research
- Software Testing and Debugging Techniques
- Advanced Malware Detection Techniques
- Web Data Mining and Analysis
- Topic Modeling
- Software Reliability and Analysis Research
- Natural Language Processing Techniques
- Software Engineering Techniques and Practices
- Mobile and Web Applications
- Advanced Electrical Measurement Techniques
- Power Transformer Diagnostics and Insulation
- Neural Networks and Applications
- Particle accelerators and beam dynamics
- Web Application Security Vulnerabilities
- Technology Assessment and Management
- Adversarial Robustness in Machine Learning
- Simulation Techniques and Applications
- Microwave Engineering and Waveguides
- Software System Performance and Reliability
- Systems Engineering Methodologies and Applications
- Text Readability and Simplification
- Induction Heating and Inverter Technology
William & Mary
2020-2024
Williams (United States)
2021-2024
Universidad Nacional de Colombia
2023
George Mason University
2023
University of Basilicata
2021
Università della Svizzera italiana
2021
University of Sannio
2021
University of West Florida
2016
Deep learning (DL) techniques are gaining more and more attention in the software engineering community. They have been used to support several code-related tasks, such as automatic bug fixing and code comments generation. Recent studies in the Natural Language Processing (NLP) field have shown that the Text-To-Text Transfer Transformer (T5) architecture can achieve state-of-the-art performance for a variety of NLP tasks. The basic idea behind T5 is to first pre-train a model on a large and generic dataset using...
An increasingly popular set of techniques adopted by software engineering (SE) researchers to automate development tasks are those rooted in the concept of Deep Learning (DL). The popularity of such techniques largely stems from their automated feature engineering capabilities, which aid in modeling software artifacts. However, due to the rapid pace at which DL techniques have been adopted, it is difficult to distill the current successes, failures, and opportunities of the current research landscape. In an effort to bring clarity to this cross-cutting area of work, from its modern...
Code completion aims at speeding up code writing by predicting the next code token(s) the developer is likely to write. Works in this field focused on improving the accuracy of the generated predictions, with substantial leaps forward made possible by deep learning (DL) models. However, code completion techniques are mostly evaluated in the scenario of predicting the next token to type, with few exceptions pushing the boundaries to the prediction of an entire code statement. Thus, little is known about the performance of state-of-the-art code completion approaches in more challenging scenarios in which, for...
Deep learning (DL) techniques have been used to support several code-related tasks such as code summarization and bug-fixing. In particular, pre-trained transformer models are on the rise, also thanks to the excellent results they achieved in Natural Language Processing (NLP) tasks. The basic idea behind these models is to first pre-train them on a generic dataset using a self-supervised task (e.g., filling masked words in sentences). Then, these models are fine-tuned to support specific tasks of interest (e.g., language translation). A single model can be...
Code completion is one of the main features of modern Integrated Development Environments (IDEs). Its objective is to speed up code writing by predicting the next code token(s) the developer is likely to write. Research in this area has substantially bolstered the predictive performance of these techniques. However, the support to developers is still limited to the prediction of the next few tokens to type. In this work, we take a step further in this direction by presenting a large-scale empirical study aimed at exploring the capabilities of state-of-the-art deep learning (DL)...
Screen recordings of mobile applications are easy to obtain and capture a wealth of information pertinent to software developers (e.g., bugs or feature requests), making them a popular mechanism for crowdsourced app feedback. Thus, these videos are becoming a common artifact that developers must manage. In light of unique mobile development constraints, including swift release cycles and rapidly evolving platforms, automated techniques for analyzing all types of rich software artifacts provide benefit to mobile developers. Unfortunately, automatically...
We introduce StableLM 2 1.6B, the first in a new generation of our language model series. In this technical report, we present in detail the data and training procedure leading to the base and instruction-tuned versions of StableLM 2 1.6B. The weights for both models are available via Hugging Face for anyone to download and use. The report contains thorough evaluations of these models, including zero- and few-shot benchmarks, multilingual benchmarks, and the MT benchmark focusing on multi-turn dialogues. At the time of publishing, StableLM 2 1.6B was the state-of-the-art open model under...
When a bug manifests in a user-facing application, it is likely to be exposed through the graphical user interface (GUI). Given the importance of visual information to the process of identifying and understanding such bugs, users are increasingly making use of screenshots and screen-recordings as a means to report issues to developers. However, when such information is reported en masse, such as during crowd-sourced testing, managing these artifacts can be a time-consuming process. As the reporting of screen-recordings in particular becomes more popular, developers face challenges...
The programming landscape is nowadays being reshaped by the advent of Large Language Models (LLMs) able to automate code-related tasks related to code implementation (e.g., code completion) and comprehension (e.g., code summarization). Such a paradigm shift comes with a number of implications related to how software will be written, maintained, and evolved. Also, these LLMs are extremely expensive to train, posing questions on their sustainability over time. Given their training cost, their ability to generalize, namely to work on task instances...
The correct use of cryptography is central to ensuring data security in modern software systems. Hence, several academic and commercial static analysis tools have been developed for detecting and mitigating crypto-API misuse. While developers are optimistically adopting these crypto-API misuse detectors (or crypto-detectors) in their software development cycles, this momentum must be accompanied by a rigorous understanding of their effectiveness at finding crypto-API misuse in practice. This paper presents the MASC framework, which enables...
Neural Language Models of Code, or Neural Code Models (NCMs), are rapidly progressing from research prototypes to commercial developer tools. As such, understanding the capabilities and limitations of such models is becoming critical. However, the abilities of these models are typically measured using automated metrics that often only reveal a portion of their real-world performance. While, in general, the performance of NCMs appears promising, currently much is unknown about how such models arrive at decisions. To this end, this paper introduces...
Video-based bug reports are increasingly being used to document bugs for programs centered around a graphical user interface (GUI). However, developing automated techniques to manage video-based reports is challenging as it requires identifying and understanding often nuanced visual patterns that capture key information about a reported bug. In this paper, we aim to overcome these challenges by advancing the bug report management task of duplicate detection for video-based reports. To this end, we introduce a new approach, called Janus,...
There has been much work done in the area of automated program repair, specifically through using machine learning methods to correct buggy code. Whereas some degree of success has been attained by those efforts, there is still considerable room for growth with regard to the accuracy of results produced by such tools. In that vein, we implement Hephaestus, a novel method to improve the accuracy of automated bug repair through learning to apply edit operations. Hephaestus leverages neural machine translation and attempts to produce the edit operations needed to change a given buggy code segment to a fixed...
DevOps is an emerging collection of software management practices intended to shorten time to market for new software features and to reduce the risk of costly deployment errors. In this paper we examine the security implications of two of the key DevOps practices, automation of the deployment pipeline using a deployment toolchain and infrastructure-as-code to specify the environment of the deployed software. We focus on identifying what changes when an organization moves from manual deployments to automated deployment processes. We reviewed the literature and conducted three case studies simple...
Screen recordings are becoming increasingly important as rich software artifacts that inform mobile application development processes. However, the amount of manual effort required to extract information from these graphical artifacts can hinder resource-constrained mobile developers. This paper presents Video2Scenario (V2S), an automated tool that processes video recordings of Android app usages, utilizes neural object detection and image classification techniques to classify the depicted user actions, and translates these user actions into a...
While software engineers are optimistically adopting crypto-API misuse detectors (or crypto-detectors) in their software development cycles, this momentum must be accompanied by a rigorous understanding of crypto-detectors' effectiveness at finding crypto-API misuses in practice. This demo paper presents the technical details and usage scenarios of our tool, namely Mutation Analysis for evaluating Static Crypto-API misuse detectors (MASC). We developed 12 generalizable, usage based mutation operators and three mutation scopes, namely Main Scope, Similarity...
An increasingly popular set of techniques adopted by software engineering (SE) researchers to automate development tasks are those rooted in the concept of Deep Learning (DL). The popularity of such techniques largely stems from their automated feature engineering capabilities, which aid in modeling software artifacts. However, due to the rapid pace at which DL techniques have been adopted, it is difficult to distill the current successes, failures, and opportunities of the current research landscape. In an effort to bring clarity to this crosscutting area of work, from its modern inception...
Deep learning (DL) techniques have been used to support several code-related tasks such as code summarization and bug-fixing. In particular, pre-trained transformer models are on the rise, also thanks to the excellent results they achieved in Natural Language Processing (NLP) tasks. The basic idea behind these models is to first pre-train them on a generic dataset using a self-supervised task (e.g., filling masked words in sentences). Then, these models are fine-tuned to support specific tasks of interest (e.g., language translation). A single model can be...
We introduce Stable Code, the first in our new-generation of code language models series, which serves as a general-purpose base code language model targeting code completion, reasoning, math, and other software engineering-based tasks. Additionally, we introduce an instruction variant named Stable Code Instruct that allows conversing with the model in a natural chat interface for performing question-answering and instruction-based tasks. In this technical report, we detail the data and training procedure leading to both models. Their weights are available via...
Impact analysis (IA) is a critical software maintenance task that identifies the effects of a given set of code changes on a larger software project with the intention of avoiding potential adverse effects. IA is a cognitively challenging task that involves reasoning about the abstract relationships between various code constructs. Given its difficulty, researchers have worked to automate IA with approaches that primarily use coupling metrics as a measure of the "connectedness" of different parts of a software project. Many of these coupling metrics rely on static, dynamic, or evolutionary...
Video-based bug reports are increasingly being used to document bugs for programs centered around a graphical user interface (GUI). However, developing automated techniques to manage video-based reports is challenging as it requires identifying and understanding often nuanced visual patterns that capture key information about a reported bug. In this paper, we aim to overcome these challenges by advancing the bug report management task of duplicate detection for video-based reports. To this end, we introduce a new approach, called JANUS,...
Recently published work on rephrasing natural text data for pre-training LLMs has shown promising results when combining the original dataset with the synthetically rephrased data. We build upon previous work by replicating existing results on C4 and extending them with our optimized rephrasing pipeline to the English, German, Italian, and Spanish Oscar subsets of CulturaX. Our pipeline leads to increased performance on standard evaluation benchmarks in both the mono- and multilingual setup. In addition, we provide a detailed study of our pipeline, investigating...