Sarah Nadi

ORCID: 0000-0002-0091-6030
Research Areas
  • Software Engineering Research
  • Software System Performance and Reliability
  • Advanced Software Engineering Methodologies
  • Advanced Malware Detection Techniques
  • Software Testing and Debugging Techniques
  • Software Engineering Techniques and Practices
  • Web Application Security Vulnerabilities
  • Software Reliability and Analysis Research
  • Service-Oriented Architecture and Web Services
  • Open Source Software Innovations
  • Topic Modeling
  • Computational Physics and Python Applications
  • Information and Cyber Security
  • Scientific Computing and Data Management
  • Distributed systems and fault tolerance
  • Parallel Computing and Optimization Techniques
  • Web Data Mining and Analysis
  • Digital and Cyber Forensics
  • Data Quality and Management
  • Mathematics, Computing, and Information Processing
  • Advanced Data Storage Technologies
  • Natural Language Processing Techniques
  • Security and Verification in Computing
  • Business Process Modeling and Analysis
  • Academic Publishing and Open Access

University of Alberta
2017-2024

New York University Abu Dhabi
2024

Faculty of Media
2019

University of Zurich
2019

Makerere University
2018

Technical University of Darmstadt
2015-2017

Paderborn University
2017

Software (Germany)
2016

Carnegie Mellon University
2015

University of Waterloo
2009-2014

To protect sensitive data processed by current applications, developers, whether security experts or not, have to rely on cryptography. While cryptography algorithms become increasingly advanced, many breaches occur because developers do not correctly use the corresponding APIs. To guide future research into practical solutions to this problem, we perform an empirical investigation into the obstacles developers face while using the Java cryptography APIs, the tasks they use the APIs for, and the kind of (tool) support they desire. We triangulate data from...

10.1145/2884781.2884790 article EN Proceedings of the 38th International Conference on Software Engineering 2016-05-13

GitHub and OpenAI recently launched Copilot, an "AI pair programmer" that utilizes the power of Natural Language Processing, Static Analysis, Code Synthesis, and Artificial Intelligence. Given a natural language description of the target functionality, Copilot can generate corresponding code in several programming languages. In this paper, we perform an empirical study to evaluate the correctness and understandability of Copilot's suggested code. We use 33 LeetCode questions to create queries for four different programming languages ... 132...

10.1145/3524842.3528470 article EN 2022-05-23

Unit tests play a key role in ensuring the correctness of software. However, manually creating unit tests is a laborious task, motivating the need for automation. Large Language Models (LLMs) have recently been applied to various aspects of software development, including their suggested use for automated generation of unit tests, but while requiring additional training or few-shot learning on examples of existing tests. This paper presents a large-scale empirical evaluation of the effectiveness of LLMs for automated unit test generation without additional training or manual effort....

10.1109/tse.2023.3334955 article EN IEEE Transactions on Software Engineering 2023-11-28

Highly-configurable systems allow users to tailor the software to their specific needs. Not all combinations of configuration options are valid though, and constraints arise for technical or non-technical reasons. Explicitly describing these constraints in a variability model allows reasoning about the supported configurations. To automate creating variability models, we need to identify the origin of such constraints. We propose an approach which uses build-time errors and a novel feature-effect heuristic to automatically extract configuration constraints from C...

10.1145/2568225.2568283 article EN Proceedings of the 36th International Conference on Software Engineering 2014-05-20
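
As a rough illustration of what "reasoning about the supported configurations" in the abstract above can mean, the following minimal sketch enumerates the valid configurations of a toy variability model. The option names (`NET`, `WIFI`, `DEBUG`) and the single constraint are invented for this example and are not taken from the paper; the paper's extraction approach itself is not shown.

```python
from itertools import product

# Hypothetical configuration options, invented for illustration only.
OPTIONS = ["NET", "WIFI", "DEBUG"]

# Hypothetical constraints: each returns True when a configuration satisfies it.
CONSTRAINTS = [
    lambda c: (not c["WIFI"]) or c["NET"],  # WIFI requires NET
]


def is_valid(config):
    """Check one configuration (dict of option -> bool) against all constraints."""
    return all(check(config) for check in CONSTRAINTS)


def valid_configurations():
    """Enumerate every on/off combination and keep the valid ones."""
    configs = []
    for values in product([False, True], repeat=len(OPTIONS)):
        config = dict(zip(OPTIONS, values))
        if is_valid(config):
            configs.append(config)
    return configs
```

Brute-force enumeration only works for a handful of options; it is used here purely to keep the sketch self-contained.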

Application Programming Interfaces (APIs) often have usage constraints, such as restrictions on call order or call conditions. API misuses, i.e., violations of these constraints, may lead to software crashes, bugs, and vulnerabilities. Though researchers developed many API-misuse detectors over the last two decades, recent studies show that API misuses are still prevalent. Therefore, we need to understand the capabilities and limitations of existing detectors in order to advance the state of the art. In this paper, we present the first-ever qualitative...

10.1109/tse.2018.2827384 article EN IEEE Transactions on Software Engineering 2018-04-16
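
The call-order constraints mentioned in the abstract above can be illustrated with a toy check. This is only a sketch under stated assumptions: the rule ("every `next` call must be guarded by a preceding `has_next` call") and the function name `find_misuses` are hypothetical, and the check does not represent how any of the detectors studied in the paper work.

```python
def find_misuses(calls):
    """Return the indices of 'next' calls that lack a preceding 'has_next' guard.

    `calls` is a list of method names in call order. The rule checked here is
    a hypothetical usage constraint chosen for illustration.
    """
    misuses = []
    guarded = False
    for index, call in enumerate(calls):
        if call == "has_next":
            guarded = True
        elif call == "next":
            if not guarded:
                misuses.append(index)
            # Each guard covers exactly one 'next' call.
            guarded = False
    return misuses
```

For example, `find_misuses(["has_next", "next", "next"])` flags the second `next` call, because the single guard was already consumed by the first one.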

Previous research suggests that developers often struggle using low-level cryptographic APIs and, as a result, produce insecure code. When asked, developers desire, among other things, more tool support to help them use such APIs. In this paper, we present CogniCrypt, a tool that supports developers with the use of cryptographic APIs. CogniCrypt assists the developer in two ways. First, for a number of common cryptographic tasks, CogniCrypt generates code that implements the respective task in a secure manner. Currently, CogniCrypt supports tasks such as data encryption, communication over secure channels, and long-term...

10.1109/ase.2017.8115707 article EN 2017-10-01

Highly configurable systems allow users to tailor software to specific needs. Valid combinations of configuration options are often restricted by intricate constraints. Describing options and constraints in a variability model allows reasoning about the supported configurations. To automate creating and verifying such models, we need to identify the origin of such constraints. We propose a static analysis approach, based on two rules, to extract configuration constraints from code. We apply it to four highly configurable systems to evaluate the accuracy of our approach and to determine which constraints are recoverable...

10.1109/tse.2015.2415793 article EN IEEE Transactions on Software Engineering 2015-03-23

Unit tests play a key role in ensuring the correctness of software. However, manually creating unit tests is a laborious task, motivating the need for automation. Large Language Models (LLMs) have recently been applied to this problem, utilizing additional training or few-shot learning on examples of existing tests. This paper presents a large-scale empirical evaluation of the effectiveness of LLMs for automated unit test generation without additional training or manual effort, providing the LLM with the signature and implementation of the function under test,...

10.48550/arxiv.2302.06527 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Over the last few years, researchers proposed a multitude of automated bug-detection approaches that mine a class of bugs that we call API misuses. Evaluations on a variety of software products show both the omnipresence of such misuses and the ability of the approaches to detect them.

10.1145/2901739.2903506 article EN 2016-05-14

Integrated Development Environments (IDEs) provide a convenient standalone solution that supports developers during various phases of software development. In order to provide better support for developers within such IDEs, we need to understand how much time developers spend using various parts of a given IDE and how often they use the available assistance tools. To infer useful conclusions, such information should be gathered for different types of IDEs and languages. In this paper, we instrument the previously unexplored Visual Studio to track interactions at an...

10.1109/saner.2016.39 article EN 2016-03-01

The C preprocessor has received strong criticism in academia, among others regarding separation of concerns, error proneness, and code obfuscation, but is widely used in practice. Many (mostly academic) alternatives to the preprocessor exist, but have not been adopted in practice. Since developers continue to use the preprocessor despite all criticism and research, we ask how practitioners perceive the C preprocessor. We performed interviews with 40 developers, used grounded theory to analyze the data, and cross-validated the results with data from a survey of 202 developers, repository mining,...

10.4230/lipics.ecoop.2015.495 article EN European Conference on Object-Oriented Programming 2015-07-01

The Mining Software Repositories (MSR) research community has grown significantly since the first MSR workshop was held in 2004. As the community continues to broaden its scope and deepens its expertise, it is worthwhile to reflect on the best practices that our community has developed over the past decade of research. We identify these best practices by surveying past MSR conferences and workshops. To that end, we review all 117 full papers published in the MSR proceedings between 2004 and 2012. We extract 268 comments from these papers, and categorize them using a grounded theory...

10.1109/msr.2013.6624048 article EN 2013-05-01

Application Programming Interfaces (APIs) often impose constraints such as call order or preconditions. API misuses, i.e., usages violating these constraints, may cause software crashes, data-loss, and vulnerabilities. Researchers developed several approaches to detect API misuses, typically still resulting in low recall and precision. In this work, we investigate ways to improve API-misuse detection. We design MUDetect, an API-misuse detector that builds on the strengths of existing detectors and tries to mitigate their...

10.1109/msr.2019.00053 article EN 2019-05-01

The Linux kernel is extensively specialized or configured so that it can be used for many purposes. This variability is implemented by means of three distinct artifacts: source code files, Kconfig (configuration) files, and Make files. Any inconsistencies between these three may lead to undesirable anomalies which lead to increased maintenance efforts and decreased reliability. This paper extends published work which had found anomalies (dead and undead code blocks) by concentrating largely on source code and Kconfig files. We detect further anomalies in the source code when we also consider Make files. At the Make level...

10.1109/csmr.2012.21 article EN 2012-03-01

Mobile app developers often need to create variants to account for different customer segments, payment models or functionalities. A common strategy is to clone (or fork) an existing app and then adapt it to new requirements. This form of reuse has been enhanced with the advent of social-coding platforms such as GitHub, cultivating a more systematic reuse. Different facilities, such as forks, pull requests, and cross-project traceability support clone-based development. Unfortunately, even though, many apps are known...

10.1109/icsme.2018.00072 article EN 2018-09-01

Background. During collaborative software development, developers often use branches to add features or fix bugs. When merging changes from two branches, conflicts may occur if the changes are inconsistent. Developers need to resolve these conflicts before completing the merge, which is an error-prone and time-consuming process. Early detection of merge conflicts, which warns developers about resolving conflicts before they become large and complicated, is among the ways of dealing with this problem. Existing techniques do this by continuously pulling all combinations...

10.1109/esem.2019.8870173 article EN 2019-09-01

With the rise of social coding platforms that rely on distributed version control systems, software reuse is also on the rise. Many software developers leverage this reuse by creating variants through forking, to account for different customer needs, markets, or environments. Forked variants then form a so-called software family; they share a common code base and are maintained in parallel by the same developers. As such, software families can easily arise within software ecosystems, which are large collections of interdependent software components maintained by communities...

10.1007/s10664-021-10078-2 article EN cc-by Empirical Software Engineering 2022-03-01

Large Language Models (LLMs) have gained attention for addressing coding problems, but their effectiveness in fixing code maintainability remains unclear. This study evaluates LLMs' capability to resolve 127 maintainability issues from 10 GitHub repositories. We use zero-shot prompting for Copilot Chat and Llama 3.1, and few-shot prompting with Llama only. The LLM-generated solutions are assessed for compilation errors, test failures, and new maintainability problems. Llama with few-shot prompting successfully fixed 44.9% of the methods, while Copilot Chat and Llama zero-shot fixed 32.29% and 30%, respectively. However, most...

10.48550/arxiv.2502.02368 preprint EN arXiv (Cornell University) 2025-02-04

Although build systems control what code gets compiled into the final built product, they are often overlooked when studying software variability. The Linux kernel is one of the biggest open source software systems supporting variability and contains over 10,000 configurable features described in its Kconfig files. To understand the role of the build system in variability implementation, we use Linux as a case study. We study its build system, Kbuild, and extract the variability constraints in its Makefiles. We first provide a quantitative analysis of the constraints in Kbuild. We then study how these constraints affect anomalies...

10.1002/smr.1595 article EN Journal of Software: Evolution and Process 2013-04-18

Starting from the first investigations with strictly functional languages, reactive programming has been proposed as the programming paradigm for reactive applications. Over the years, researchers have enriched reactive languages with more powerful abstractions, embedded these abstractions into mainstream languages (including object-oriented languages), and applied reactive programming to several domains, such as GUIs, animations, Web applications, robotics, and sensor networks. However, an important assumption behind this line of research is that, beside...

10.1109/tse.2017.2655524 article EN IEEE Transactions on Software Engineering 2017-01-19

With the rise of distributed software development, branching has become a popular approach that facilitates collaboration between software developers. One of the biggest challenges that developers face when using multiple development branches is dealing with merge conflicts. Conflicts occur when inconsistent changes happen to the code. Resolving these conflicts can be a cumbersome task as it requires prior knowledge about the changes in each of the development branches. A type of change that could potentially lead to complex conflicts is code refactoring. Previous studies have...

10.1109/saner.2019.8668012 article EN 2019-02-01

Docker is currently one of the most popular containerization solutions. Previous work investigated various characteristics of the Docker ecosystem, but has mainly focused on Dockerfiles from GitHub, limiting the type of questions that can be asked, and did not investigate evolution aspects. In this paper, we create a recent and more comprehensive data set by collecting data from Docker Hub, GitHub, and Bitbucket. Our data set contains information about 3,364,529 Docker images and 378,615 git repositories behind them. Using this data set, we conduct a large-scale empirical study...

10.1109/icsme46990.2020.00043 article EN 2020-09-01

While cryptography is now readily available to everyone and can, provably, protect private information from attackers, we still frequently hear about major data leakages, many of which are due to improper use of cryptographic mechanisms. The problem is that application developers are not cryptographic experts. Even though high-quality cryptographic APIs are widely available, programmers often select the wrong algorithms or misuse APIs due to a lack of understanding. Such issues arise with both simple operations such as encryption as well as complex secure...

10.1145/2814228.2814229 article EN 2015-10-21

While researchers develop many new exciting code recommender systems, such as method-call completion, code-snippet completion, or code search, an accurate evaluation of such systems is always a challenge. We analyzed the current literature and found that most of the current evaluations rely on artificial queries extracted from released code, which begs the question: Do such evaluations reflect real-life usages? To answer this question, we capture 6,189 fine-grained development histories from real IDE interactions. We use them as a ground truth and extract 7,157 queries for...

10.1145/2970276.2970330 article EN 2016-08-25

Developers have been the subject of many empirical studies over the years. To assist developers in their everyday work, an understanding of their activities is necessary, especially how they develop source code. Unfortunately, conducting such studies is very expensive and researchers often resort to studying artifacts after the fact. To pave the road for future empirical studies on developer activities, we built FeedBaG, a general-purpose interaction tracker for Visual Studio that monitors development activities. The observations are stored...

10.1145/3196398.3196400 article EN 2018-05-28