NFDI4DS | UHH-SEMS - Publication Details

Studying the Usage of Text-To-Text Transfer Transformer to Support Code-Related Tasks

OPENALEX - Publications

Antonio Mastropaolo, Simone Scalabrino, Nathan Cooper, David N. Palacio, Denys Poshyvanyk, and 2 more

Deep learning (DL) techniques are gaining more and more attention in the software engineering community. They have been used to support several code-related tasks, such as automatic bug fixing and code comments generation. Recent studies in the Natural Language Processing (NLP) field have shown that the Text-To-Text Transfer Transformer (T5) architecture can achieve state-of-the-art performance for a variety of NLP tasks. The basic idea behind T5 is to first pre-train a model on a large and generic dataset using...

10.1109/icse43902.2021.00041 article EN 2021-05-01

A Systematic Literature Review on the Use of Deep Learning in Software Engineering Research

Cody Watson, Nathan Cooper, David Nader Palacio, Kevin Moran, Denys Poshyvanyk

An increasingly popular set of techniques adopted by software engineering (SE) researchers to automate development tasks are those rooted in the concept of Deep Learning (DL). The popularity of such techniques largely stems from their automated feature engineering capabilities, which aid in modeling software artifacts. However, due to the rapid pace at which DL techniques have been adopted, it is difficult to distill the current successes, failures, and opportunities of the current research landscape. In an effort to bring clarity to this cross-cutting area of work, from its modern...

10.1145/3485275 article EN ACM Transactions on Software Engineering and Methodology 2022-03-04

An Empirical Study on the Usage of Transformer Models for Code Completion

Matteo Ciniselli, Nathan Cooper, Luca Pascarella, Antonio Mastropaolo, Emad Aghajani, and 3 more

Code completion aims at speeding up code writing by predicting the next code token(s) the developer is likely to write. Works in this field focused on improving the accuracy of the generated predictions, with substantial leaps forward made possible by deep learning (DL) models. However, code completion techniques are mostly evaluated in the scenario of predicting the next token to type, with few exceptions pushing the boundaries to the prediction of an entire code statement. Thus, little is known about the performance of state-of-the-art code completion approaches in more challenging scenarios in which, for...

10.1109/tse.2021.3128234 article EN publisher-specific-oa IEEE Transactions on Software Engineering 2021-01-01

Using Transfer Learning for Code-Related Tasks

Antonio Mastropaolo, Nathan Cooper, David N. Palacio, Simone Scalabrino, Denys Poshyvanyk, and 2 more

Deep learning (DL) techniques have been used to support several code-related tasks such as code summarization and bug-fixing. In particular, pre-trained transformer models are on the rise, also thanks to the excellent results they achieved in Natural Language Processing (NLP) tasks. The basic idea behind these models is to first pre-train them on a generic dataset using a self-supervised task (e.g., filling masked words in sentences). Then, these models are fine-tuned to support specific tasks of interest (e.g., language translation). A single model can be...

10.1109/tse.2022.3183297 article EN IEEE Transactions on Software Engineering 2022-06-15

An Empirical Study on the Usage of BERT Models for Code Completion

Matteo Ciniselli, Nathan Cooper, Luca Pascarella, Denys Poshyvanyk, Massimiliano Di Penta, and 1 more

Code completion is one of the main features of modern Integrated Development Environments (IDEs). Its objective is to speed up code writing by predicting the next code token(s) the developer is likely to write. Research in this area has substantially bolstered the predictive performance of these techniques. However, the support to developers is still limited to the prediction of the next few tokens to type. In this work, we take a step further in this direction by presenting a large-scale empirical study aimed at exploring the capabilities of state-of-the-art deep learning (DL)...

10.1109/msr52588.2021.00024 article EN 2021-05-01

Translating video recordings of mobile app usages into replayable scenarios

Carlos Bernal-Cárdenas, Nathan Cooper, Kevin Moran, Oscar Chaparro, Andrian Marcus, and 1 more

Screen recordings of mobile applications are easy to obtain and capture a wealth of information pertinent to software developers (e.g., bugs or feature requests), making them a popular mechanism for crowdsourced app feedback. Thus, these videos are becoming a common artifact that developers must manage. In light of unique mobile development constraints, including swift release cycles and rapidly evolving platforms, automated techniques for analyzing all types of rich software artifacts provide benefit to mobile developers. Unfortunately, automatically...

10.1145/3377811.3380328 preprint EN 2020-06-27

Stable LM 2 1.6B Technical Report

Marco Bellagente, Jonathan Tow, Dakota Mahan, Duy Phung, Maksym Zhuravinskyi, and 14 more

We introduce StableLM 2 1.6B, the first in a new generation of our language model series. In this technical report, we present in detail the data and training procedure leading to the base and instruction-tuned versions of StableLM 2 1.6B. The weights for both models are available via Hugging Face for anyone to download and use. The report contains thorough evaluations of these models, including zero- and few-shot benchmarks, multilingual benchmarks, and the MT benchmark focusing on multi-turn dialogues. At the time of publishing this report, StableLM 2 1.6B was the state-of-the-art open model under...

10.48550/arxiv.2402.17834 preprint EN arXiv (Cornell University) 2024-02-27

It Takes Two to Tango: Combining Visual and Textual Information for Detecting Duplicate Video-Based Bug Reports

Nathan Cooper, Carlos Bernal-Cárdenas, Oscar Chaparro, Kevin Moran, Denys Poshyvanyk

When a bug manifests in a user-facing application, it is likely to be exposed through the graphical user interface (GUI). Given the importance of visual information to the process of identifying and understanding such bugs, users are increasingly making use of screenshots and screen-recordings as a means to report issues to developers. However, when such information is reported en masse, such as during crowd-sourced testing, managing these artifacts can be a time-consuming process. As the reporting of screen-recordings in particular becomes more popular, developers are likely to face challenges...

10.1109/icse43902.2021.00091 article EN 2021-05-01

On the Generalizability of Transformer Models to Code Completions of Different Lengths

Nathan Cooper, Rosalia Tufano, Gabriele Bavota, Denys Poshyvanyk

The programming landscape is nowadays being reshaped by the advent of Large Language Models (LLMs) able to automate code-related tasks related to code implementation (e.g., code completion) and comprehension (e.g., code summarization). Such a paradigm shift comes with a number of implications related to how software will be written, maintained, and evolved. Also, these LLMs are extremely expensive to train, posing questions on their sustainability over time. Given their training cost, their ability to generalize, namely their ability to work on task instances...

10.48550/arxiv.2501.05051 preprint EN arXiv (Cornell University) 2025-01-09

Why Crypto-detectors Fail: A Systematic Evaluation of Cryptographic Misuse Detection Techniques

Amit Seal Ami, Nathan Cooper, Kaushal Kafle, Kevin Moran, Denys Poshyvanyk, and 1 more

The correct use of cryptography is central to ensuring data security in modern software systems. Hence, several academic and commercial static analysis tools have been developed for detecting and mitigating crypto-API misuse. While developers are optimistically adopting these crypto-API misuse detectors (or crypto-detectors) in their software development cycles, this momentum must be accompanied by a rigorous understanding of their effectiveness at finding crypto-API misuse in practice. This paper presents the MASC framework, which enables...

10.1109/sp46214.2022.9833582 article EN 2022 IEEE Symposium on Security and Privacy (SP) 2022-05-01

Toward a Theory of Causation for Interpreting Neural Code Models

David N. Palacio, Alejandro Velasco, Nathan Cooper, Álvaro Rodríguez, Kevin Moran, and 1 more

Neural Language Models of Code, or Neural Code Models (NCMs), are rapidly progressing from research prototypes to commercial developer tools. As such, understanding the capabilities and limitations of such models is becoming critical. However, the abilities of these models are typically measured using automated metrics that often only reveal a portion of their real-world performance. While, in general, the performance of NCMs appears promising, currently much is unknown about how such models arrive at decisions. To this end, this paper introduces...

10.1109/tse.2024.3379943 article EN IEEE Transactions on Software Engineering 2024-03-21

Semantic GUI Scene Learning and Video Alignment for Detecting Duplicate Video-based Bug Reports

Yanfu Yan, Nathan Cooper, Oscar Chaparro, Kevin Moran, Denys Poshyvanyk

Video-based bug reports are increasingly being used to document bugs for programs centered around a graphical user interface (GUI). However, developing automated techniques to manage video-based reports is challenging as it requires identifying and understanding often nuanced visual patterns that capture key information about a reported bug. In this paper, we aim to overcome these challenges by advancing the bug report management task of duplicate detection for video-based reports. To this end, we introduce a new approach, called Janus,...

10.1145/3597503.3639163 article EN cc-by 2024-04-12

Toward a Theory of Causation for Interpreting Neural Code Models

David N. Palacio, Nathan Cooper, Álvaro Rodríguez, Kevin Moran, Denys Poshyvanyk

Neural Language Models of Code, or Neural Code Models (NCMs), are rapidly progressing from research prototypes to commercial developer tools. As such, understanding the capabilities and limitations of such models is becoming critical. However, the abilities of these models are typically measured using automated metrics that often only reveal a portion of their real-world performance. While, in general, the performance of NCMs appears promising, currently much is unknown about how such models arrive at decisions. To this end, this paper introduces...

10.48550/arxiv.2302.03788 preprint EN cc-by-sa arXiv (Cornell University) 2023-01-01

Translating Video Recordings of Complex Mobile App UI Gestures into Replayable Scenarios

Carlos Bernal-Cárdenas, Nathan Cooper, Madeleine Havranek, Kevin Moran, Oscar Chaparro, and 2 more

Screen recordings of mobile applications are easy to obtain and capture a wealth of information pertinent to software developers (e.g., bugs or feature requests), making them a popular mechanism for crowdsourced app feedback. Thus, these videos are becoming a common artifact that developers must manage. In light of unique mobile development constraints, including swift release cycles and rapidly evolving platforms, automated techniques for analyzing all types of rich software artifacts provide benefit to mobile developers. Unfortunately, automatically...

10.1109/tse.2022.3192279 article EN IEEE Transactions on Software Engineering 2022-07-25

Can We Automatically Fix Bugs by Learning Edit Operations?

Aidan Connor, Aaron M. Harris, Nathan Cooper, Denys Poshyvanyk

There has been much work done in the area of automated program repair, specifically through using machine learning methods to correct buggy code. Whereas some degree of success has been attained by those efforts, there is still considerable room for growth with regard to the accuracy of results produced by such tools. In that vein, we implement Hephaestus, a novel method to improve the accuracy of automated bug repair through learning to apply edit operations. Hephaestus leverages neural machine translation and attempts to produce the edit operations needed to correct a given buggy code segment to a fixed...

10.1109/saner53432.2022.00096 article EN 2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER) 2022-03-01

Security for Devops Deployment Processes: Defenses, Risks, Research Directions

Norman Wilde, Brian P. Eddy, Khyati Patel, Nathan Cooper, Valeria Gamboa, and 2 more

DevOps is an emerging collection of software management practices intended to shorten time to market for new features and reduce the risk of costly deployment errors. In this paper we examine the security implications of two key practices, automation of the deployment pipeline using a toolchain and infrastructure-as-code to specify the environment of the deployed software. We focus on identifying what changes when an organization moves from manual deployments to automated processes. We reviewed the literature and conducted three case studies simple...

10.5121/ijsea.2016.7601 article EN International Journal of Software Engineering & Applications 2016-11-30

V2S: A Tool for Translating Video Recordings of Mobile App Usages into Replayable Scenarios

Madeleine Havranek, Carlos Bernal-Cárdenas, Nathan Cooper, Oscar Chaparro, Denys Poshyvanyk, and 1 more

Screen recordings are becoming increasingly important as rich software artifacts that inform mobile application development processes. However, the amount of manual effort required to extract information from these graphical artifacts can hinder resource-constrained mobile developers. This paper presents Video2Scenario (V2S), an automated tool that processes video recordings of Android app usages, utilizes neural object detection and image classification techniques to classify the depicted user actions, and translates these user actions into a...

10.1109/icse-companion52605.2021.00037 article EN 2021-05-01

MASC: A Tool for Mutation-Based Evaluation of Static Crypto-API Misuse Detectors

Amit Seal Ami, Syed Yusuf Ahmed, Radowan Mahmud Redoy, Nathan Cooper, Kaushal Kafle, and 3 more

While software engineers are optimistically adopting crypto-API misuse detectors (or crypto-detectors) in their software development cycles, this momentum must be accompanied by a rigorous understanding of crypto-detectors' effectiveness at finding crypto-API misuses in practice. This demo paper presents the technical details and usage scenarios of our tool, namely Mutation Analysis for evaluating Static Crypto-API misuse detectors (MASC). We developed 12 generalizable, usage-based mutation operators and three mutation scopes, namely Main Scope, Similarity...

10.1145/3611643.3613099 preprint EN cc-by-nc-sa 2023-11-30

A Systematic Literature Review on the Use of Deep Learning in Software Engineering Research

Cody Watson, Nathan Cooper, David N. Palacio, Kevin Moran, Denys Poshyvanyk

An increasingly popular set of techniques adopted by software engineering (SE) researchers to automate development tasks are those rooted in the concept of Deep Learning (DL). The popularity of such techniques largely stems from their automated feature engineering capabilities, which aid in modeling software artifacts. However, due to the rapid pace at which DL techniques have been adopted, it is difficult to distill the current successes, failures, and opportunities of the current research landscape. In an effort to bring clarity to this crosscutting area of work, from its modern inception...

10.48550/arxiv.2009.06520 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Using Transfer Learning for Code-Related Tasks

Antonio Mastropaolo, Nathan Cooper, David N. Palacio, Simone Scalabrino, Denys Poshyvanyk, and 2 more

Deep learning (DL) techniques have been used to support several code-related tasks such as code summarization and bug-fixing. In particular, pre-trained transformer models are on the rise, also thanks to the excellent results they achieved in Natural Language Processing (NLP) tasks. The basic idea behind these models is to first pre-train them on a generic dataset using a self-supervised task (e.g., filling masked words in sentences). Then, these models are fine-tuned to support specific tasks of interest (e.g., language translation). A single model can be...

10.48550/arxiv.2206.08574 preprint EN other-oa arXiv (Cornell University) 2022-01-01

Stable Code Technical Report

Nikhil Pinnaparaju, Reshinth Adithyan, Duy Phung, Jonathan Tow, James Baicoianu, and 6 more

We introduce Stable Code, the first in our new-generation of code language models series, which serves as a general-purpose base code language model targeting code completion, reasoning, math, and other software engineering-based tasks. Additionally, we introduce an instruction variant named Stable Code Instruct that allows conversing with the model in a natural chat interface for performing question-answering and instruction-based tasks. In this technical report, we detail the data and training procedure leading to both models. Their weights are available via...

10.48550/arxiv.2404.01226 preprint EN arXiv (Cornell University) 2024-04-01

Enhancing Code Understanding for Impact Analysis by Combining Transformers and Program Dependence Graphs

Yanfu Yan, Nathan Cooper, Kevin Moran, Gabriele Bavota, Denys Poshyvanyk, and 1 more

Impact analysis (IA) is a critical software maintenance task that identifies the effects of a given set of code changes on a larger software project with the intention of avoiding potential adverse effects. IA is a cognitively challenging task that involves reasoning about the abstract relationships between various code constructs. Given its difficulty, researchers have worked to automate IA with approaches that primarily use coupling metrics as a measure of the "connectedness" of different parts of a software project. Many of these coupling metrics rely on static, dynamic, or evolutionary...

10.1145/3643770 article EN Proceedings of the ACM on software engineering. 2024-07-12

Semantic GUI Scene Learning and Video Alignment for Detecting Duplicate Video-based Bug Reports

Yanfu Yan, Nathan Cooper, Oscar Chaparro, Kevin Moran, Denys Poshyvanyk

Video-based bug reports are increasingly being used to document bugs for programs centered around a graphical user interface (GUI). However, developing automated techniques to manage video-based reports is challenging as it requires identifying and understanding often nuanced visual patterns that capture key information about a reported bug. In this paper, we aim to overcome these challenges by advancing the bug report management task of duplicate detection for video-based reports. To this end, we introduce a new approach, called JANUS,...

10.48550/arxiv.2407.08610 preprint EN arXiv (Cornell University) 2024-07-11

Rephrasing natural text data with different languages and quality levels for Large Language Model pre-training

Michael Pieler, Marco Bellagente, Hannah Teufel, Duy Phung, Nathan Cooper, and 7 more

Recently published work on rephrasing natural text data for pre-training LLMs has shown promising results when combining the original dataset with the synthetically rephrased data. We build upon previous work by replicating existing results on C4 and extending them with our optimized rephrasing pipeline to the English, German, Italian, and Spanish Oscar subsets of CulturaX. Our pipeline leads to increased performance on standard evaluation benchmarks in both the mono- and multilingual setup. In addition, we provide a detailed study of our pipeline, investigating...

10.48550/arxiv.2410.20796 preprint EN arXiv (Cornell University) 2024-10-28

On the Generalizability of Transformer Models to Code Completions of Different Lengths

Nathan Cooper, Rosalia Tufano, Gabriele Bavota, Denys Poshyvanyk

10.1109/icsme58944.2024.00042 article EN 2024-10-06