NFDI4DS | UHH-SEMS - Publication Details

Detecting missing information in bug descriptions

OPENALEX - Publications

Oscar Chaparro, Jing Lü, Fiorella Zampetti, Laura Moreno, Massimiliano Di Penta, and 3 more

Bug reports document unexpected software behaviors experienced by users. To be effective, they should allow bug triagers to easily understand and reproduce the potential reported bugs, by clearly describing the Observed Behavior (OB), the Steps to Reproduce (S2R), and the Expected Behavior (EB). Unfortunately, while considered extremely useful, reporters often miss such pieces of information in bug reports and, to date, there is no effective way to automatically check and enforce their presence. We manually analyzed nearly 3k bug reports to understand to what extent OB,...

10.1145/3106237.3106285 article EN 2017-08-02

On-demand Developer Documentation

Martin P. Robillard, Andrian Marcus, Christoph Treude, Gabriele Bavota, Oscar Chaparro, and 9 more

We advocate for a paradigm shift in supporting the information needs of developers, centered around the concept of automated on-demand developer documentation. Currently, developer information needs are fulfilled by asking experts or consulting documentation. Unfortunately, traditional documentation practices are inefficient because of, among others, the manual nature of its creation and the gap between the creators and consumers. We discuss the major challenges we face in realizing such a paradigm shift, highlight existing research that can be leveraged to this end, and promote...

10.1109/icsme.2017.17 article EN 2017-09-01

BOMs Away! Inside the Minds of Stakeholders: A Comprehensive Study of Bills of Materials for Software Systems

Trevor Stalnaker, Nathan Wintersgill, Oscar Chaparro, Massimiliano Di Penta, Daniel M. Germán, and 1 more

Software Bills of Materials (SBOMs) have emerged as tools to facilitate the management of software dependencies, vulnerabilities, licenses, and the supply chain. While significant effort has been devoted to increasing SBOM awareness and developing SBOM formats and tools, recent studies have shown that SBOMs are still an early technology not yet adequately adopted in practice. Expanding on previous research, this paper reports a comprehensive study that investigates the current challenges stakeholders encounter when creating...

10.1145/3597503.3623347 preprint EN cc-by 2024-02-06

Translating video recordings of mobile app usages into replayable scenarios

Carlos Bernal-Cárdenas, Nathan Cooper, Kevin Moran, Oscar Chaparro, Andrian Marcus, and 1 more

Screen recordings of mobile applications are easy to obtain and capture a wealth of information pertinent to software developers (e.g., bugs or feature requests), making them a popular mechanism for crowdsourced app feedback. Thus, these videos are becoming a common artifact that developers must manage. In light of unique mobile development constraints, including swift release cycles and rapidly evolving platforms, automated techniques for analyzing all types of rich software artifacts provide benefit to mobile developers. Unfortunately, automatically...

10.1145/3377811.3380328 preprint EN 2020-06-27

On the Impact of Refactoring Operations on Code Quality Metrics

Oscar Chaparro, Gabriele Bavota, Andrian Marcus, Massimiliano Di Penta

Refactorings are behavior-preserving source code transformations. While tool support exists for (semi) automatically identifying refactoring solutions, applying or not a recommended refactoring is usually up to the software developers, who have to assess the impact that the transformation will have on their system. Evaluating the pros (e.g., the bad smell removal) and cons (e.g., side effects of the change) of a refactoring is far from trivial. We present RIPE (Refactoring Impact Prediction), a technique that estimates the impact of refactoring operations on source code quality metrics. RIPE supports 12 refactoring operations and 11...

10.1109/icsme.2014.73 article EN 2014-09-01

Assessing the quality of the steps to reproduce in bug reports

Oscar Chaparro, Carlos Bernal-Cárdenas, Jing Lu, Kevin Moran, Andrian Marcus, and 3 more

A major problem with user-written bug reports, indicated by developers and documented by researchers, is the (lack of high) quality of the reported steps to reproduce the bugs. Low-quality steps to reproduce lead to excessive manual effort spent on bug triage and resolution. This paper proposes Euler, an approach that automatically identifies and assesses the quality of the steps to reproduce in a bug report, providing feedback to the reporters, which they can use to improve the bug report. The quality of the feedback provided by Euler was assessed by external evaluators and the results indicate that Euler correctly identified 98% of the existing steps to reproduce and 58%...

10.1145/3338906.3338947 article EN 2019-08-09

Using bug descriptions to reformulate queries during text-retrieval-based bug localization

Oscar Chaparro, Juan Manuel Florez, Andrian Marcus

10.1007/s10664-018-9672-z article EN Empirical Software Engineering 2019-01-11

It Takes Two to Tango: Combining Visual and Textual Information for Detecting Duplicate Video-Based Bug Reports

Nathan Cooper, Carlos Bernal-Cárdenas, Oscar Chaparro, Kevin Moran, Denys Poshyvanyk

When a bug manifests in a user-facing application, it is likely to be exposed through the graphical user interface (GUI). Given the importance of visual information to the process of identifying and understanding such bugs, users are increasingly making use of screenshots and screen-recordings as a means to report issues to developers. However, when such information is reported en masse, such as during crowd-sourced testing, managing these artifacts can be a time-consuming process. As the reporting of screen-recordings in particular becomes more popular, developers face challenges...

10.1109/icse43902.2021.00091 article EN 2021-05-01

Using Observed Behavior to Reformulate Queries during Text Retrieval-based Bug Localization

Oscar Chaparro, Juan Manuel Florez, Andrian Marcus

Text Retrieval (TR)-based approaches for bug localization rely on formulating an initial query based on a bug report. Often, the query does not return the buggy software artifacts at or near the top of the list (i.e., it is a low-quality query). In such cases, the query needs reformulation. Existing research on supporting developers in the reformulation of queries focuses mostly on leveraging relevance feedback from the user or expanding the original query with additional information (e.g., adding synonyms). In many cases, the problem with such low-quality queries is the presence of irrelevant terms...

10.1109/icsme.2017.100 article EN 2017-09-01

Reformulating Queries for Duplicate Bug Report Detection

Oscar Chaparro, Juan Manuel Florez, Unnati Singh, Andrian Marcus

When bugs are reported, one important task is to check if they are new or if they were reported before. Many approaches have been proposed to partially automate duplicate bug report detection, and most of them rely on text retrieval techniques, using the bug reports as queries. Some approaches include additional bug information and use complex retrieval- or learning-based methods. In the end, even the most sophisticated approaches fail to retrieve duplicate bug reports in many cases, leaving the bug triagers to their own devices. We argue that these duplicate bug retrieval tools should be used interactively,...

10.1109/saner.2019.8667985 article EN 2019-02-01

BEE: a tool for structuring and analyzing bug reports

Yang Song, Oscar Chaparro

This paper introduces BEE, a tool that automatically analyzes user-written bug reports and provides feedback to reporters and developers about the system's observed behavior (OB), expected behavior (EB), and the steps to reproduce the bug (S2R). BEE employs machine learning to (i) detect if an issue describes a bug, an enhancement, or a question; (ii) identify the structure of bug descriptions by automatically labeling the sentences that correspond to the OB, EB, or S2R; and (iii) detect when bug reports fail to provide these elements. BEE is integrated with GitHub and offers a public web API that researchers can...

10.1145/3368089.3417928 article EN 2020-11-08

NLBSE'22 tool competition

Rafael Kallis, Oscar Chaparro, Andrea Di Sorbo, Sebastiano Panichella

We report on the organization and results of the first edition of the Tool Competition from the International Workshop on Natural Language-based Software Engineering (NLBSE'22). This year, five teams submitted multiple classification models to automatically classify issue reports as bugs, enhancements, or questions. Most of them are based on BERT (Bidirectional Encoder Representations from Transformers) and were fine-tuned and evaluated on a benchmark dataset of 800k issue reports. The goal of the competition was to improve the classification performance of a baseline...

10.1145/3528588.3528664 article EN 2022-05-21

Resource-Efficient & Effective Code Summarization

Saima Afrin, Joseph Call, Khai-Nguyen Nguyen, Oscar Chaparro, Antonio Mastropaolo

Code Language Models (CLMs) have demonstrated high effectiveness in automating software engineering tasks such as bug fixing, code generation, and code documentation. This progress has been driven by the scaling of large models, ranging from millions to trillions of parameters (e.g., GPT-4). However, as models grow in scale, sustainability concerns emerge, as they are extremely resource-intensive, highlighting the need for efficient, environmentally conscious solutions. GreenAI techniques, such as QLoRA (Quantized...

10.48550/arxiv.2502.03617 preprint EN arXiv (Cornell University) 2025-02-05

Combining Language and App UI Analysis for the Automated Assessment of Bug Reproduction Steps

Junayed Mahmud, Antu Saha, Oscar Chaparro, Kevin Moran, Andrian Marcus

Bug reports are essential for developers to confirm software problems, investigate their causes, and validate fixes. Unfortunately, reports often miss important information or are written unclearly, which can cause delays, increased issue resolution effort, or even the inability to solve issues. One of the most common components of reports that are problematic is the steps to reproduce the bug(s) (S2Rs), which are essential to replicate the described program failures and reason about fixes. Given the proclivity for deficiencies in reported S2Rs, prior work has proposed techniques...

10.48550/arxiv.2502.04251 preprint EN arXiv (Cornell University) 2025-02-06

The ML Supply Chain in the Era of Software 2.0: Lessons Learned from Hugging Face

Trevor Stalnaker, Nathan Wintersgill, Oscar Chaparro, Laura A. Heymann, Massimiliano Di Penta, and 2 more

The last decade has seen widespread adoption of Machine Learning (ML) components in software systems. This has occurred in nearly every domain, from natural language processing to computer vision. These ML components range from relatively simple neural networks to complex and resource-intensive large language models. However, despite this widespread adoption, little is known about the supply chain relationships that produce these models, which can have implications for compliance and security. In this work, we conduct an extensive analysis...

10.48550/arxiv.2502.04484 preprint EN arXiv (Cornell University) 2025-02-06

Toward interactive bug reporting for (Android app) end-users

Yang Song, Junayed Mahmud, Ying Zhou, Oscar Chaparro, Kevin Moran, and 2 more

Many software bugs are reported manually, particularly bugs that manifest themselves visually in the user interface. End-users typically report these bugs via app reviewing websites, issue trackers, or in-app built-in bug reporting tools, if available. While these systems have various features that facilitate bug reporting (e.g., textual templates or forms), they often provide limited guidance, concrete feedback, or quality verification to end-users, who are often inexperienced at reporting bugs and submit low-quality bug reports that lead to excessive developer...

10.1145/3540250.3549131 article EN Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering 2022-11-07

On Using GUI Interaction Data to Improve Text Retrieval-based Bug Localization

Junayed Mahmud, Nadeeshan De Silva, Safwat Ali Khan, Seyed Hooman Mostafavi, S M Hasan Mansur, and 3 more

One of the most important tasks related to managing bug reports is localizing the fault so that a fix can be applied. As such, prior work has aimed to automate this task of bug localization by formulating it as an information retrieval problem, where potentially buggy files are retrieved and ranked according to their textual similarity with a given bug report. However, there is often a notable semantic gap between the information contained in bug reports and the identifiers or natural language within source code files. For user-facing software, there is currently...

10.1145/3597503.3608139 article EN 2024-02-06

Semantic GUI Scene Learning and Video Alignment for Detecting Duplicate Video-based Bug Reports

Yanfu Yan, Nathan Cooper, Oscar Chaparro, Kevin Moran, Denys Poshyvanyk

Video-based bug reports are increasingly being used to document bugs for programs centered around a graphical user interface (GUI). However, developing automated techniques to manage video-based reports is challenging as it requires identifying and understanding often nuanced visual patterns that capture key information about a reported bug. In this paper, we aim to overcome these challenges by advancing the bug report management task of duplicate detection for video-based reports. To this end, we introduce a new approach, called Janus,...

10.1145/3597503.3639163 article EN cc-by 2024-04-12

Bridging the Quantum Divide: Aligning Academic and Industry Goals in Software Engineering

Jake Zappin, Trevor Stalnaker, Oscar Chaparro, Denys Poshyvanyk

This position paper examines the substantial divide between academia and industry within quantum software engineering. For example, while academic research related to debugging and testing predominantly focuses on a limited subset of primarily quantum-specific issues, industry practitioners face a broader range of practical concerns, including software integration, compatibility, and real-world implementation hurdles. This disconnect mainly arises due to academia's limited access to industry practices and the often confidential, competitive nature...

10.48550/arxiv.2502.07014 preprint EN arXiv (Cornell University) 2025-02-10

On the reduction of verbose queries in text retrieval based software maintenance

Oscar Chaparro, Andrian Marcus

We argue that verbose queries used for software retrieval contain many terms that follow specific discourse rules, yet hinder retrieval. We report the results of an empirical study on the effect of removing such terms from verbose queries in the context of Text Retrieval-based concept location. In the study, we remove such terms from 424 queries, generated from bug reports from nine open source systems. Removing such terms leads to substantial improvement in retrieval: 73% of the queries are improved, leading to 21.8% and 13.4% gain in MRR and MAP, respectively. Such improvement is larger than more sophisticated...

10.1145/2889160.2892647 article EN 2016-05-14

Towards the Automatic Extraction of Structural Business Rules from Legacy Databases

Oscar Chaparro, Jairo Aponte, Fernando Ortega, Andrian Marcus

One of the most important problems in the evolution of legacy systems is the loss of knowledge about them. In this paper, we present an approach for extracting structural business rules from legacy databases. We used the technique to recover business rules from SIFI (SIstema Fiduciario Integrado), an existing legacy system, implemented mostly in PL/SQL and Oracle Forms. Four employees of the company that know the system and its domain evaluated the extracted rules in order to assess the precision of the extraction technique. The results show that 29% of the recovered rules are correct business rules, 36% correspond...

10.1109/wcre.2012.57 article EN 2012-10-01

Improving Bug Reporting, Duplicate Detection, and Localization

Oscar Chaparro

Software developers rely on essential textual information from bug reports (such as Observed Behavior, Expected Behavior, and Steps to Reproduce) to triage and fix software bugs. Unfortunately, while relevant and useful, this information is often missing, incomplete, superficial, ambiguous, or complex to follow. Low-quality content in bug reports causes delay and extra effort on bug triage and fixing. Current technology and research are insufficient to support users and developers on providing high-quality content in bug reports. Our research is intended to fill in this gap, as it aims at improving: (1) the quality of...

10.1109/icse-c.2017.27 article EN 2017-05-01

Translating Video Recordings of Complex Mobile App UI Gestures into Replayable Scenarios

Carlos Bernal-Cárdenas, Nathan Cooper, Madeleine Havranek, Kevin Moran, Oscar Chaparro, and 2 more

Screen recordings of mobile applications are easy to obtain and capture a wealth of information pertinent to software developers (e.g., bugs or feature requests), making them a popular mechanism for crowdsourced app feedback. Thus, these videos are becoming a common artifact that developers must manage. In light of unique mobile development constraints, including swift release cycles and rapidly evolving platforms, automated techniques for analyzing all types of rich software artifacts provide benefit to mobile developers. Unfortunately, automatically...

10.1109/tse.2022.3192279 article EN IEEE Transactions on Software Engineering 2022-07-25

The NLBSE'23 Tool Competition

Rafael Kallis, Maliheh Izadi, Luca Pascarella, Oscar Chaparro, Pooja Rani

We report on the organization and results of the second edition of the tool competition from the International Workshop on Natural Language-based Software Engineering (NLBSE'23). As in the prior edition, we organized the competition on automated issue report classification, with a larger dataset. This year, we featured an extra competition on automated code comment classification. In this tool competition edition, five teams submitted multiple classification models to automatically classify issue reports and code comments. The models were fine-tuned and evaluated on a benchmark dataset of 1.4 million issue reports or 6.7...

10.1109/nlbse59153.2023.00007 article EN 2023-05-01

On the Vocabulary Agreement in Software Issue Descriptions

Oscar Chaparro, Juan Manuel Florez, Andrian Marcus

Many software comprehension tasks depend on how stakeholders textually describe their problems. These textual descriptions are leveraged by Text Retrieval (TR)-based solutions to more than 20 software engineering tasks, such as duplicate issue detection. The common assumption of such methods is that text describing the same issue in multiple places will have a common vocabulary. This paper presents an empirical study aimed at verifying this assumption and discusses the impact of the common vocabulary on duplicate issue detection. The study investigated 13K+ pairs of duplicate bug reports and Stack...

10.1109/icsme.2016.44 article EN 2016-10-01