Oscar Chaparro

ORCID: 0000-0003-2838-685X
Research Areas
  • Software Engineering Research
  • Advanced Malware Detection Techniques
  • Software Testing and Debugging Techniques
  • Web Data Mining and Analysis
  • Astrophysics and Cosmic Phenomena
  • Software Engineering Techniques and Practices
  • Topic Modeling
  • Software Reliability and Analysis Research
  • Open Source Software Innovations
  • Dark Matter and Cosmic Phenomena
  • Neutrino Physics Research
  • Law, AI, and Intellectual Property
  • Copyright and Intellectual Property
  • Computational Physics and Python Applications
  • Software System Performance and Reliability
  • Natural Language Processing Techniques
  • Educational Innovations and Technology
  • Knowledge Societies in the 21st Century
  • Mathematics, Computing, and Information Processing
  • Business Law and Ethics
  • Particle Detector Development and Performance
  • Advanced Database Systems and Queries
  • Information and Cyber Security
  • Mobile and Web Applications
  • Web Application Security Vulnerabilities

William & Mary
2019-2024

Williams (United States)
2021-2024

Instituto Politécnico Nacional
2023

Delft University of Technology
2023

ETH Zurich
2023

University of Zurich
2023

The University of Texas at Dallas
2014-2019

Bug reports document unexpected software behaviors experienced by users. To be effective, they should allow bug triagers to easily understand and reproduce the potential reported bugs by clearly describing the Observed Behavior (OB), the Steps to Reproduce (S2R), and the Expected Behavior (EB). Unfortunately, while considered extremely useful, reporters often miss such pieces of information in bug reports and, to date, there is no effective way to automatically check and enforce their presence. We manually analyzed nearly 3k bug reports to understand to what extent OB,...

10.1145/3106237.3106285 article EN 2017-08-02

We advocate for a paradigm shift in supporting the information needs of developers, centered around the concept of automated on-demand developer documentation. Currently, developer information needs are fulfilled by asking experts or consulting documentation. Unfortunately, traditional documentation practices are inefficient because of, among others, the manual nature of its creation and the gap between the creators and consumers. We discuss the major challenges we face in realizing such a paradigm shift, highlight existing research that can be leveraged to this end, and promote...

10.1109/icsme.2017.17 article EN 2017-09-01

Software Bills of Materials (SBOMs) have emerged as tools to facilitate the management of software dependencies, vulnerabilities, licenses, and the supply chain. While significant effort has been devoted to increasing SBOM awareness and developing SBOM formats and tools, recent studies have shown that SBOMs are still an early technology not yet adequately adopted in practice. Expanding on previous research, this paper reports a comprehensive study that investigates the current challenges stakeholders encounter when creating...

10.1145/3597503.3623347 preprint EN cc-by 2024-02-06

Screen recordings of mobile applications are easy to obtain and capture a wealth of information pertinent to software developers (e.g., bugs or feature requests), making them a popular mechanism for crowdsourced app feedback. Thus, these videos are becoming a common artifact that developers must manage. In light of unique mobile development constraints, including swift release cycles and rapidly evolving platforms, automated techniques for analyzing all types of rich software artifacts provide benefit to mobile developers. Unfortunately, automatically...

10.1145/3377811.3380328 preprint EN 2020-06-27

Refactorings are behavior-preserving source code transformations. While tool support exists for (semi) automatically identifying refactoring solutions, applying or not a recommended refactoring is usually up to the software developers, who have to assess the impact that the transformation will have on their system. Evaluating the pros (e.g., the bad smell removal) and cons (e.g., side effects of the change) of a refactoring is far from trivial. We present RIPE (Refactoring Impact Prediction), a technique that estimates the impact of refactoring operations on source code quality metrics. RIPE supports 12 refactoring operations and 11...

10.1109/icsme.2014.73 article EN 2014-09-01

A major problem with user-written bug reports, indicated by developers and documented by researchers, is the (lack of high) quality of the reported steps to reproduce the bugs. Low-quality steps to reproduce lead to excessive manual effort spent on bug triage and resolution. This paper proposes Euler, an approach that automatically identifies and assesses the quality of the steps to reproduce in a bug report, providing feedback to the reporters, which they can use to improve the bug report. The feedback provided by Euler was assessed by external evaluators and the results indicate that Euler correctly identified 98% of the existing steps to reproduce and 58%...

10.1145/3338906.3338947 article EN 2019-08-09

When a bug manifests in a user-facing application, it is likely to be exposed through the graphical user interface (GUI). Given the importance of visual information to the process of identifying and understanding such bugs, users are increasingly making use of screenshots and screen-recordings as a means to report issues to developers. However, when such information is reported en masse, such as during crowd-sourced testing, managing these artifacts can be a time-consuming process. As the reporting of screen-recordings in particular becomes more popular, developers face challenges...

10.1109/icse43902.2021.00091 article EN 2021-05-01

Text Retrieval (TR)-based approaches for bug localization rely on formulating an initial query based on a bug report. Often, the query does not return the buggy software artifacts at or near the top of the list (i.e., it is a low-quality query). In such cases, the query needs reformulation. Existing research on supporting developers in the reformulation of queries focuses mostly on leveraging relevance feedback from the user or on expanding the original query with additional information (e.g., adding synonyms). In many cases, the problem with such low-quality queries is the presence of irrelevant terms...

10.1109/icsme.2017.100 article EN 2017-09-01

When bugs are reported, one important task is to check if they are new or if they were reported before. Many approaches have been proposed to partially automate duplicate bug report detection, and most of them rely on text retrieval techniques, using the bug reports as queries. Some approaches include additional bug information and use complex retrieval- or learning-based methods. In the end, even the most sophisticated approaches fail to retrieve duplicate bug reports in many cases, leaving the bug triagers to their own devices. We argue that these tools should be used interactively,...

10.1109/saner.2019.8667985 article EN 2019-02-01

This paper introduces BEE, a tool that automatically analyzes user-written bug reports and provides feedback to reporters and developers about the system's observed behavior (OB), expected behavior (EB), and the steps to reproduce the bug (S2R). BEE employs machine learning to (i) detect if an issue describes a bug, an enhancement, or a question; (ii) identify the structure of bug descriptions by automatically labeling the sentences that correspond to the OB, EB, or S2R; and (iii) detect when bug reports fail to provide these elements. BEE is integrated with GitHub and offers a public web API that researchers can...

10.1145/3368089.3417928 article EN 2020-11-08

We report on the organization and results of the first edition of the Tool Competition from the International Workshop on Natural Language-based Software Engineering (NLBSE'22). This year, five teams submitted multiple classification models to automatically classify issue reports as bugs, enhancements, or questions. Most of them are based on BERT (Bidirectional Encoder Representations from Transformers) and were fine-tuned and evaluated on a benchmark dataset of 800k issue reports. The goal of the competition was to improve the classification performance of a baseline...

10.1145/3528588.3528664 article EN 2022-05-21

Code Language Models (CLMs) have demonstrated high effectiveness in automating software engineering tasks such as bug fixing, code generation, and code documentation. This progress has been driven by the scaling of large models, ranging from millions to trillions of parameters (e.g., GPT-4). However, as models grow in scale, sustainability concerns emerge, as they are extremely resource-intensive, highlighting the need for efficient, environmentally conscious solutions. GreenAI techniques, such as QLoRA (Quantized...

10.48550/arxiv.2502.03617 preprint EN arXiv (Cornell University) 2025-02-05

Bug reports are essential for developers to confirm software problems, investigate their causes, and validate fixes. Unfortunately, reports often miss important information or are written unclearly, which can cause delays, increased issue resolution effort, or even the inability to solve issues. One of the most common components of reports that are problematic is the steps to reproduce the bug(s) (S2Rs), which are essential to replicate the described program failures and reason about fixes. Given the proclivity for deficiencies in reported S2Rs, prior work has proposed techniques...

10.48550/arxiv.2502.04251 preprint EN arXiv (Cornell University) 2025-02-06

The last decade has seen widespread adoption of Machine Learning (ML) components in software systems. This has occurred in nearly every domain, from natural language processing to computer vision. These ML components range from relatively simple neural networks to complex and resource-intensive large language models. However, despite this widespread adoption, little is known about the supply chain relationships that produce these models, which can have implications for compliance and security. In this work, we conduct an extensive analysis...

10.48550/arxiv.2502.04484 preprint EN arXiv (Cornell University) 2025-02-06

Many software bugs are reported manually, particularly bugs that manifest themselves visually in the user interface. End-users typically report these bugs via app reviewing websites, issue trackers, or in-app built-in bug reporting tools, if available. While these systems have various features that facilitate bug reporting (e.g., textual templates or forms), they often provide limited guidance, concrete feedback, or quality verification to end-users, who are often inexperienced at reporting bugs and submit low-quality bug reports that lead to excessive developer...

10.1145/3540250.3549131 article EN Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering 2022-11-07

One of the most important tasks related to managing bug reports is localizing the fault so that a fix can be applied. As such, prior work has aimed to automate this task of bug localization by formulating it as an information retrieval problem, where potentially buggy files are retrieved and ranked according to their textual similarity with a given bug report. However, there is often a notable semantic gap between the information contained in bug reports and the identifiers or natural language contained within source code files. For user-facing software, currently...

10.1145/3597503.3608139 article EN 2024-02-06

Video-based bug reports are increasingly being used to document bugs for programs centered around a graphical user interface (GUI). However, developing automated techniques to manage video-based reports is challenging as it requires identifying and understanding often nuanced visual patterns that capture key information about a reported bug. In this paper, we aim to overcome these challenges by advancing the bug report management task of duplicate detection for video-based reports. To this end, we introduce a new approach, called Janus,...

10.1145/3597503.3639163 article EN cc-by 2024-04-12

This position paper examines the substantial divide between academia and industry within quantum software engineering. For example, while academic research related to debugging and testing predominantly focuses on a limited subset of primarily quantum-specific issues, industry practitioners face a broader range of practical concerns, including software integration, compatibility, and real-world implementation hurdles. This disconnect mainly arises due to academia's limited access to industry practices and the often confidential, competitive nature...

10.48550/arxiv.2502.07014 preprint EN arXiv (Cornell University) 2025-02-10

We argue that verbose queries used for software retrieval contain many terms that follow specific discourse rules, yet hinder retrieval. We report the results of an empirical study on the effect of removing such terms from verbose queries in the context of Text Retrieval-based concept location. In the study, we remove such terms from 424 queries, generated from bug reports of nine open source systems. Removing such terms leads to substantial improvement in retrieval: 73% of the queries are improved, leading to 21.8% and 13.4% gain in MRR and MAP, respectively. Such improvement is larger than more sophisticated...

10.1145/2889160.2892647 article EN 2016-05-14

One of the most important problems in the evolution of legacy systems is the loss of knowledge about them. In this paper, we present an approach for extracting structural business rules from databases. We used the technique to recover the business rules of SIFI (SIstema Fiduciario Integrado), an existing system, implemented mostly in PL/SQL and Oracle Forms. Four employees of the company that know the system and its domain evaluated the extracted rules in order to assess the precision of the extraction technique. The results show that 29% of the recovered rules are correct business rules, 36% correspond...

10.1109/wcre.2012.57 article EN 2012-10-01

Software developers rely on essential textual information from bug reports (such as Observed Behavior, Expected Behavior, and Steps to Reproduce) to triage and fix software bugs. Unfortunately, while relevant and useful, this information is often missing, incomplete, superficial, ambiguous, or complex to follow. Low-quality content in bug reports causes delay and extra effort on bug triage and fixing. Current technology and research are insufficient to support users in providing high-quality bug reports. Our research is intended to fill this gap, as it aims at improving: (1) the quality of...

10.1109/icse-c.2017.27 article EN 2017-05-01

Screen recordings of mobile applications are easy to obtain and capture a wealth of information pertinent to software developers (e.g., bugs or feature requests), making them a popular mechanism for crowdsourced app feedback. Thus, these videos are becoming a common artifact that developers must manage. In light of unique mobile development constraints, including swift release cycles and rapidly evolving platforms, automated techniques for analyzing all types of rich software artifacts provide benefit to mobile developers. Unfortunately, automatically...

10.1109/tse.2022.3192279 article EN IEEE Transactions on Software Engineering 2022-07-25

We report on the organization and results of the second edition of the tool competition from the International Workshop on Natural Language-based Software Engineering (NLBSE'23). As in the prior edition, we organized the competition on automated issue report classification, with a larger dataset. This year, we featured an extra competition on automated code comment classification. In this tool competition edition, five teams submitted multiple classification models to automatically classify issue reports and code comments. The models were fine-tuned and evaluated on a benchmark dataset of 1.4 million issue reports or 6.7...

10.1109/nlbse59153.2023.00007 article EN 2023-05-01

Many software comprehension tasks depend on how stakeholders textually describe their problems. These textual descriptions are leveraged by Text Retrieval (TR)-based solutions to more than 20 software engineering tasks, such as duplicate issue detection. The common assumption of such methods is that text describing the same issue in multiple places will have a common vocabulary. This paper presents an empirical study aimed at verifying this assumption and discusses the impact of the common vocabulary on duplicate issue detection. The study investigated 13K+ pairs of duplicate bug reports and Stack...

10.1109/icsme.2016.44 article EN 2016-10-01