NFDI4DS | UHH-SEMS - Publication Details

Sarah Nadi

ORCID: 0000-0002-0091-6030

OpenAlex ID: A5054083284

Research Areas

Software Engineering Research
Software System Performance and Reliability
Advanced Software Engineering Methodologies
Advanced Malware Detection Techniques
Software Testing and Debugging Techniques
Software Engineering Techniques and Practices
Web Application Security Vulnerabilities
Software Reliability and Analysis Research
Service-Oriented Architecture and Web Services
Open Source Software Innovations
Topic Modeling
Computational Physics and Python Applications
Information and Cyber Security
Scientific Computing and Data Management
Distributed systems and fault tolerance
Parallel Computing and Optimization Techniques
Web Data Mining and Analysis
Digital and Cyber Forensics
Data Quality and Management
Mathematics, Computing, and Information Processing
Advanced Data Storage Technologies
Natural Language Processing Techniques
Security and Verification in Computing
Business Process Modeling and Analysis
Academic Publishing and Open Access

Affiliations

University of Alberta
2017-2024

New York University Abu Dhabi
2024

Faculty of Media
2019

University of Zurich
2019

Makerere University
2018

Technical University of Darmstadt
2015-2017

Paderborn University
2017

Software (Germany)
2016

Carnegie Mellon University
2015

University of Waterloo
2009-2014

Jumping through hoops

OPENALEX - Publications

Sarah Nadi Stefan Krüger Mira Mezini Eric Bodden

To protect sensitive data processed by current applications, developers, whether security experts or not, have to rely on cryptography. While cryptography algorithms become increasingly advanced, many breaches occur because developers do not correctly use the corresponding APIs. To guide future research into practical solutions to this problem, we perform an empirical investigation into the obstacles developers face while using the Java cryptography APIs, the tasks they use the APIs for, and the kind of (tool) support they desire. We triangulate from...

10.1145/2884781.2884790 article EN Proceedings of the 38th International Conference on Software Engineering 2016-05-13

An Empirical Evaluation of GitHub Copilot's Code Suggestions

Nhan Nguyen Sarah Nadi

GitHub and OpenAI recently launched Copilot, an "AI pair programmer" that utilizes the power of Natural Language Processing, Static Analysis, Code Synthesis, and Artificial Intelligence. Given a natural language description of the target functionality, Copilot can generate corresponding code in several programming languages. In this paper, we perform an empirical study to evaluate the correctness and understandability of Copilot's suggested code. We use 33 LeetCode questions to create queries for Copilot in four different programming languages. We evaluate the correctness of the corresponding 132...

10.1145/3524842.3528470 article EN 2022-05-23

An Empirical Evaluation of Using Large Language Models for Automated Unit Test Generation

Max Schäfer Sarah Nadi Aryaz Eghbali Frank Tip

Unit tests play a key role in ensuring the correctness of software. However, manually creating unit tests is a laborious task, motivating the need for automation. Large Language Models (LLMs) have recently been applied to various aspects of software development, including their suggested use for automated generation of unit tests, but requiring additional training or few-shot learning on examples of existing tests. This paper presents a large-scale empirical evaluation of the effectiveness of LLMs for automated unit test generation without manual effort....

10.1109/tse.2023.3334955 article EN IEEE Transactions on Software Engineering 2023-11-28

Mining configuration constraints: static analyses and empirical results

Sarah Nadi Thorsten Berger Christian Kästner Krzysztof Czarnecki

Highly-configurable systems allow users to tailor the software to their specific needs. Not all combinations of configuration options are valid though, and constraints arise for technical or non-technical reasons. Explicitly describing these constraints in a variability model allows reasoning about the supported configurations. To automate creating variability models, we need to identify the origin of such constraints. We propose an approach which uses build-time errors and a novel feature-effect heuristic to automatically extract constraints from C...

10.1145/2568225.2568283 article EN Proceedings of the 36th International Conference on Software Engineering 2014-05-20

A Systematic Evaluation of Static API-Misuse Detectors

Sven Amann Hoan Anh Nguyen Sarah Nadi Tien N. Nguyen Mira Mezini

Application Programming Interfaces (APIs) often have usage constraints, such as restrictions on call order or call conditions. API misuses, i.e., violations of these constraints, may lead to software crashes, bugs, and vulnerabilities. Though researchers developed many API-misuse detectors over the last two decades, recent studies show that API misuses are still prevalent. Therefore, we need to understand the capabilities and limitations of existing detectors in order to advance the state of the art. In this paper, we present the first-ever qualitative...

10.1109/tse.2018.2827384 article EN IEEE Transactions on Software Engineering 2018-04-16

CogniCrypt: Supporting developers in using cryptography

Stefan Krüger Sarah Nadi Michael Reif Karim Ali Mira Mezini and 6 more

Previous research suggests that developers often struggle with using low-level cryptographic APIs and, as a result, produce insecure code. When asked, developers desire, among other things, more tool support to help them use such APIs. In this paper, we present CogniCrypt, a tool that supports developers with the use of cryptographic APIs. CogniCrypt assists the developer in two ways. First, for a number of common cryptographic tasks, CogniCrypt generates code that implements the respective task in a secure manner. Currently, CogniCrypt supports tasks such as data encryption, communication over secure channels, and long-term...

10.1109/ase.2017.8115707 article EN 2017-10-01

Where Do Configuration Constraints Stem From? An Extraction Approach and an Empirical Study

Sarah Nadi Thorsten Berger Christian Kästner Krzysztof Czarnecki

Highly configurable systems allow users to tailor software to their specific needs. Valid combinations of configuration options are often restricted by intricate constraints. Describing options and constraints in a variability model allows reasoning about the supported configurations. To automate creating and verifying such models, we need to identify the origin of such constraints. We propose a static analysis approach, based on two rules, to extract configuration constraints from code. We apply it on four highly configurable systems to evaluate the accuracy of our approach and to determine which constraints are recoverable...

10.1109/tse.2015.2415793 article EN IEEE Transactions on Software Engineering 2015-03-23

An Empirical Evaluation of Using Large Language Models for Automated Unit Test Generation

Max Schäfer Sarah Nadi Aryaz Eghbali Frank Tip

Unit tests play a key role in ensuring the correctness of software. However, manually creating unit tests is a laborious task, motivating the need for automation. Large Language Models (LLMs) have recently been applied to this problem, utilizing additional training or few-shot learning on examples of existing tests. This paper presents a large-scale empirical evaluation of the effectiveness of LLMs for automated unit test generation without manual effort, providing the LLM with the signature and implementation of the function under test,...

10.48550/arxiv.2302.06527 preprint EN other-oa arXiv (Cornell University) 2023-01-01

MUBench

Sven Amann Sarah Nadi Hoan Anh Nguyen Tien N. Nguyen Mira Mezini

Over the last few years, researchers proposed a multitude of automated bug-detection approaches that mine a class of bugs that we call API misuses. Evaluations on a variety of software products show both the omnipresence of such misuses and the ability of the approaches to detect them.

10.1145/2901739.2903506 article EN 2016-05-14

A Study of Visual Studio Usage in Practice

Sven Amann Sebastian Proksch Sarah Nadi Mira Mezini

Integrated Development Environments (IDEs) provide a convenient standalone solution that supports developers during various phases of software development. In order to provide better support for developers within such IDEs, we need to understand how much time developers spend using the various parts of a given IDE and how often they use the available assistance tools. To infer useful conclusions, such information should be gathered for different types of IDEs and languages. In this paper, we instrument the previously unexplored Visual Studio IDE to track interactions at an...

10.1109/saner.2016.39 article EN 2016-03-01

The Love/Hate Relationship with the C Preprocessor: An Interview Study

Flávio Medeiros Christian Kästner Márcio Ribeiro Sarah Nadi Rohit Gheyi

The C preprocessor has received strong criticism in academia, among others regarding separation of concerns, error proneness, and code obfuscation, but is widely used in practice. Many (mostly academic) alternatives to the preprocessor exist, but have not been adopted in practice. Since developers continue to use the preprocessor despite all the research, we ask how practitioners perceive the C preprocessor. We performed interviews with 40 developers, used grounded theory to analyze the data, and cross-validated the results with data from a survey among 202 developers, repository mining,...

10.4230/lipics.ecoop.2015.495 article EN European Conference on Object-Oriented Programming 2015-07-01

The MSR Cookbook: Mining a decade of research

Hadi Hemmati Sarah Nadi Olga Baysal Oleksii Kononenko Wei Wang and 2 more

The Mining Software Repositories (MSR) research community has grown significantly since the first MSR workshop was held in 2004. As the community continues to broaden its scope and deepen its expertise, it is worthwhile to reflect on the best practices that our community has developed over the past decade of research. We identify these best practices by surveying past MSR conferences and workshops. To this end, we review all 117 full papers published in the MSR proceedings between 2004 and 2012. We extract 268 comments from these papers and categorize them using a grounded theory...

10.1109/msr.2013.6624048 article EN 2013-05-01

Investigating Next Steps in Static API-Misuse Detection

Sven Amann Hoan Anh Nguyen Sarah Nadi Tien N. Nguyen Mira Mezini

Application Programming Interfaces (APIs) often impose constraints such as call order or preconditions. API misuses, i.e., usages violating these constraints, may cause software crashes, data loss, and vulnerabilities. Researchers developed several approaches to detect API misuses, typically still resulting in low recall and precision. In this work, we investigate ways to improve API-misuse detection. We design MUDetect, an API-misuse detector that builds on the strengths of existing detectors and tries to mitigate their...

10.1109/msr.2019.00053 article EN 2019-05-01

Mining Kbuild to Detect Variability Anomalies in Linux

Sarah Nadi Ric Holt

The Linux kernel is extensively specialized or configured so that it can be used for many purposes. This variability is implemented by means of three distinct artifacts: source code files, Kconfig (configuration) files, and Make files. Any inconsistencies between these artifacts lead to undesirable anomalies, which lead to increased maintenance efforts and decreased reliability. This paper extends previously published work, which had found variability anomalies (dead and undead blocks) by concentrating largely on source code and Kconfig files. We detect further anomalies in the Linux kernel when we also consider Make files. At the level...

10.1109/csmr.2012.21 article EN 2012-03-01

Clone-Based Variability Management in the Android Ecosystem

John Businge Moses Openja Sarah Nadi Engineer Bainomugisha Thorsten Berger

Mobile app developers often need to create variants to account for different customer segments, payment models, or functionalities. A common strategy is to clone (or fork) an existing app and then adapt it to new requirements. This form of reuse has been enhanced with the advent of social-coding platforms such as GitHub, cultivating a more systematic reuse. Different facilities, such as forks, pull requests, and cross-project traceability, support clone-based development. Unfortunately, even though many apps are known...

10.1109/icsme.2018.00072 article EN 2018-09-01

Predicting Merge Conflicts in Collaborative Software Development

Moein Owhadi-Kareshk Sarah Nadi Julia Rubin

Background. During collaborative software development, developers often use branches to add features or fix bugs. When merging changes from two branches, conflicts may occur if the changes are inconsistent. Developers need to resolve these conflicts before completing the merge, which is an error-prone and time-consuming process. Early detection of merge conflicts, which warns developers about resolving conflicts before they become large and complicated, is among the ways of dealing with this problem. Existing techniques do this by continuously pulling and merging all combinations...

10.1109/esem.2019.8870173 article EN 2019-09-01

Reuse and maintenance practices among divergent forks in three software ecosystems

John Businge Moses Openja Sarah Nadi Thorsten Berger

With the rise of social coding platforms that rely on distributed version control systems, software reuse is also on the rise. Many developers leverage this by creating variants through forking, to account for different customer needs, markets, or environments. Forked variants then form a so-called family; they share a common code base and are maintained in parallel by the same developers. As such, families can easily arise within software ecosystems, which are large collections of interdependent software components maintained by communities...

10.1007/s10664-021-10078-2 article EN cc-by Empirical Software Engineering 2022-03-01

Evaluating the Effectiveness of LLMs in Fixing Maintainability Issues in Real-World Projects

Henrique Manuel Barreto Nunes Eduardo Figueiredo Larissa Rocha Sarah Nadi Fischer Ferreira and 1 more

Large Language Models (LLMs) have gained attention for addressing coding problems, but their effectiveness in fixing code maintainability issues remains unclear. This study evaluates LLMs' capability to resolve 127 maintainability issues from 10 GitHub repositories. We use zero-shot prompting for Copilot Chat and Llama 3.1, and few-shot prompting with Llama only. The LLM-generated solutions are assessed for compilation errors, test failures, and new maintainability problems. Llama with few-shot prompting successfully fixed 44.9% of the methods, while Copilot Chat and Llama with zero-shot prompting fixed 32.29% and 30%, respectively. However, most...

10.48550/arxiv.2502.02368 preprint EN arXiv (Cornell University) 2025-02-04

The Linux kernel: a case study of build system variability

Sarah Nadi Ric Holt

Although build systems control what code gets compiled into the final built product, they are often overlooked when studying software variability. The Linux kernel is one of the biggest open source systems supporting variability and contains over 10,000 configurable features described in its Kconfig files. To understand the role of the build system in variability implementation, we use Linux as a case study. We study its build system, Kbuild, and extract the variability constraints in its Makefiles. We first provide a quantitative analysis of Kbuild. We then study how the build system constraints affect variability anomalies...

10.1002/smr.1595 article EN Journal of Software: Evolution and Process 2013-04-18

On the Positive Effect of Reactive Programming on Software Comprehension: An Empirical Study

Guido Salvaneschi Sebastian Proksch Sven Amann Sarah Nadi Mira Mezini

Starting from the first investigations with strictly functional languages, reactive programming has been proposed as the programming paradigm for reactive applications. Over the years, researchers have enriched reactive languages with more powerful abstractions, embedded these abstractions into mainstream languages (including object-oriented languages), and applied reactive programming to several domains, such as GUIs, animations, Web applications, robotics, and sensor networks. However, an important assumption behind this line of research is that, beside...

10.1109/tse.2017.2655524 article EN IEEE Transactions on Software Engineering 2017-01-19

Are Refactorings to Blame? An Empirical Study of Refactorings in Merge Conflicts

Mehran Mahmoudi Sarah Nadi Nikolaos Tsantalis

With the rise of distributed software development, branching has become a popular approach that facilitates collaboration between developers. One of the biggest challenges that developers face when using multiple development branches is dealing with merge conflicts. Conflicts occur when inconsistent changes happen to the code. Resolving these conflicts can be a cumbersome task as it requires prior knowledge about the changes in each of the development branches. A type of change that could potentially lead to complex conflicts is code refactoring. Previous studies have...

10.1109/saner.2019.8668012 article EN 2019-02-01

A Large-scale Data Set and an Empirical Study of Docker Images Hosted on Docker Hub

Changyuan Lin Sarah Nadi Hamzeh Khazaei

Docker is currently one of the most popular containerization solutions. Previous work investigated various characteristics of the Docker ecosystem, but has mainly focused on Dockerfiles from GitHub, limiting the type of questions that can be asked, and did not investigate evolution aspects. In this paper, we create a recent and more comprehensive data set by collecting data from Docker Hub, GitHub, and Bitbucket. Our data set contains information about 3,364,529 Docker images and 378,615 git repositories behind them. Using the data set, we conduct a large-scale empirical study...

10.1109/icsme46990.2020.00043 article EN 2020-09-01

Towards secure integration of cryptographic software

Steven Arzt Sarah Nadi Karim Ali Eric Bodden Sebastian Erdweg and 1 more

While cryptography is now readily available to everyone and can, provably, protect private information from attackers, we still frequently hear about major data leakages, many of which are due to improper use of cryptographic mechanisms. The problem is that many application developers are not cryptography experts. Even though high-quality cryptographic APIs are widely available, programmers often select the wrong algorithms or misuse APIs due to a lack of understanding. Such issues arise with both simple operations such as encryption as well as with complex secure...

10.1145/2814228.2814229 article EN 2015-10-21

Evaluating the evaluations of code recommender systems: a reality check

Sebastian Proksch Sven Amann Sarah Nadi Mira Mezini

While researchers develop many new exciting code recommender systems, such as method-call completion, code-snippet completion, or code search, an accurate evaluation of such systems is always a challenge. We analyzed the current literature and found that most evaluations rely on artificial queries extracted from released code, which begs the question: Do such evaluations reflect real-life usages? To answer this question, we capture 6,189 fine-grained development histories from real IDE interactions. We use them as ground truth and extract 7,157 real queries for...

10.1145/2970276.2970330 article EN 2016-08-25

Enriched event streams

Sebastian Proksch Sven Amann Sarah Nadi

Developers have been the subject of many empirical studies over the years. To assist developers in their everyday work, an understanding of their activities is necessary, especially of how they develop source code. Unfortunately, conducting such studies is very expensive, and researchers often resort to studying development artifacts after the fact. To pave the road for future studies on developer activities, we built FeedBaG, a general-purpose interaction tracker for Visual Studio that monitors development activities. The observations are stored...

10.1145/3196398.3196400 article EN 2018-05-28
