- Software Engineering Research
- Software Testing and Debugging Techniques
- Software Reliability and Analysis Research
- Software Engineering Techniques and Practices
- Topic Modeling
- Software System Performance and Reliability
- Machine Learning and Data Classification
- Advanced Malware Detection Techniques
- Machine Learning and Algorithms
- Natural Language Processing Techniques
- Parallel Computing and Optimization Techniques
Free University of Bozen-Bolzano
2021-2024
OpenAI's Codex, a GPT-3 like model trained on a large code corpus, has made headlines in and outside of academia. Given a short user-provided description, it is capable of synthesizing code snippets that are syntactically and semantically valid in most cases. In this work, we want to investigate whether Codex is able to localize and fix bugs, two important tasks in automated program repair. Our initial evaluation uses the multi-language QuixBugs benchmark (40 bugs in both Python and Java). We find that, despite not being trained for APR,...
This paper provides a starting point for Software Engineering (SE) researchers and practitioners faced with the problem of training machine learning models on small datasets. Due to the high costs associated with labeling data, in Software Engineering, there exist many small (< 5,000 samples) and medium-sized (< 100,000 samples) datasets. While deep learning has set the state of the art in many machine learning tasks, it is only recently that it has proven effective on small-sized datasets, primarily thanks to pre-training, a semi-supervised learning technique that leverages abundant unlabelled data alongside...
OpenAI's Codex, a GPT-3 like model trained on a large code corpus, has made headlines in and outside of academia. Given a short user-provided description, it is capable of synthesizing code snippets that are syntactically and semantically valid in most cases. In this work, we want to investigate whether Codex is able to localize and fix bugs, a task of central interest in the field of automated program repair. Our initial evaluation uses the multi-language QuixBugs benchmark (40 bugs in both Python and Java). We find that, despite not being...
Deep learning source code models have been applied very successfully to the problem of automated program repair. One of the standing issues is the small input window of current models, which often cannot fully fit the context code required for a bug fix (e.g., method or class declarations of a project). Instead, input is often restricted to the local context, that is, the lines below and above the bug location. In this work we study the importance of this local context on repair success: how much local context is needed?; is context before or after the bug location more important?; how is local context tied to the bug type? To answer these questions we train...
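The notion of local context in the abstract above (the lines above and below the bug location) can be illustrated with a minimal sketch. The function name `local_context`, its parameters, and the sample source lines are assumptions made for illustration; they are not taken from the paper.

```python
def local_context(lines, bug_line, before=3, after=3):
    """Return the buggy line plus up to `before` lines above it and
    `after` lines below it. `bug_line` is a 0-based index into `lines`."""
    start = max(0, bug_line - before)            # clamp at the top of the file
    end = min(len(lines), bug_line + after + 1)  # clamp at the end of the file
    return lines[start:end]


# Hypothetical source file with a bug on line index 1.
source = [
    "def mid(a, b):",
    "    total = a - b  # hypothetical buggy line",
    "    return total / 2",
]

# Asymmetric window: one line before the bug, none after.
window = local_context(source, bug_line=1, before=1, after=0)
```

Varying `before` and `after` independently is one way to probe whether the context before or after the bug location matters more.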
The Codex model has demonstrated extraordinary competence in synthesizing code from natural language problem descriptions. However, in order to reveal unknown failure modes and hidden biases, such large-scale models must be systematically subjected to multiple and diverse evaluation studies. In this work, we evaluate the code synthesis capabilities of the Codex model based on a set of 115 Python problem statements from a popular competitive programming portal: HackerRank. Our evaluation shows that Codex is indeed proficient in Python, solving 96% of the problems...
Recently, we can notice a transition to data-driven techniques in Automated Program Repair (APR), in particular towards deep neural networks. This entails training on hundreds of thousands or even millions of non-executable code fragments. We would like to bring more attention to an aspect of code often neglected in Neural Program Repair (NPR), namely its execution. Code execution has several significant advantages. It allows for test-based evaluation of candidate fixes and can provide valuable information to aid repair. In this work...
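Test-based evaluation of candidate fixes, as mentioned in the abstract above, can be sketched roughly as follows. The helper `passes_tests`, the `add` function, and the test cases are hypothetical and only illustrate the idea; the paper's actual setup is not described in the text shown here.

```python
def passes_tests(candidate_source, func_name, tests):
    """Execute a candidate fix and check it against (args, expected) pairs.
    Caution: exec runs arbitrary code, so real pipelines need sandboxing."""
    namespace = {}
    try:
        exec(candidate_source, namespace)
        func = namespace[func_name]
        return all(func(*args) == expected for args, expected in tests)
    except Exception:
        # Candidates that fail to compile or crash count as failing.
        return False


# Hypothetical test suite and candidate patches for a function `add`.
tests = [((2, 3), 5), ((0, 0), 0)]
candidates = [
    "def add(a, b):\n    return a - b\n",  # still buggy
    "def add(a, b):\n    return a + b\n",  # plausible fix
]

# Keep only the candidates that pass every test.
plausible = [c for c in candidates if passes_tests(c, "add", tests)]
```

A candidate that passes all tests is only plausible, not necessarily correct, since the test suite may be incomplete.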
Many software engineering studies or tasks rely on categorizing software engineering artifacts. In practice, this is done either by defining simple but often imprecise heuristics, or by manual labelling of the artifacts. Unfortunately, errors in these categorizations impact the tasks that rely on them. To improve the precision of these categorizations, we propose to gather heuristics in a collaborative heuristic repository, to which researchers can contribute a large amount of diverse heuristics for a variety of tasks on a variety of SE artifacts. These heuristics are then leveraged by state-of-the-art weak supervision techniques...
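The idea of combining several imprecise heuristics into one label can be sketched as below. This uses a plain majority vote as a simple stand-in; it is not the state-of-the-art weak supervision technique the abstract refers to. The heuristics, label names, and commit messages are hypothetical.

```python
from collections import Counter

ABSTAIN = None  # a heuristic returns this when it has no opinion


def mentions_fix(message):
    return "bugfix" if "fix" in message.lower() else ABSTAIN


def mentions_issue_id(message):
    return "bugfix" if "#" in message else ABSTAIN


def mentions_docs(message):
    text = message.lower()
    return "other" if "readme" in text or "docs" in text else ABSTAIN


# Hypothetical heuristics that contributors might add to a shared repository.
HEURISTICS = [mentions_fix, mentions_issue_id, mentions_docs]


def majority_label(message, heuristics=HEURISTICS):
    """Combine heuristic votes by majority; abstain if no heuristic fires.
    Ties are not handled specially in this sketch."""
    votes = [h(message) for h in heuristics]
    votes = [v for v in votes if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    return Counter(votes).most_common(1)[0][0]


label = majority_label("Fix null pointer in parser, closes #42")
```

Weak supervision frameworks typically go further by estimating how accurate and how correlated the heuristics are, rather than weighting every vote equally.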
This paper provides a starting point for Software Engineering (SE) researchers and practitioners faced with the problem of training machine learning models on small datasets. Due to the high costs associated with labeling data, in Software Engineering, there exist many small (< 1 000 samples) and medium-sized (< 100 000 samples) datasets. While deep learning has set the state of the art in many machine learning tasks, it is only recently that it has proven effective on small-sized datasets, primarily thanks to pre-training, a semi-supervised learning technique that leverages abundant unlabelled data alongside scarce...