Luca Buratti

ORCID: 0009-0007-1468-9995
Research Areas
  • Software Engineering Research
  • Advanced Malware Detection Techniques
  • Software Reliability and Analysis Research
  • Software System Performance and Reliability
  • Software Testing and Debugging Techniques
  • Topic Modeling
  • Scientific Computing and Data Management
  • Advanced Text Analysis Techniques
  • Computational Physics and Python Applications
  • Service-Oriented Architecture and Web Services
  • Advanced Software Engineering Methodologies
  • Advanced Neural Network Applications
  • Forest Insect Ecology and Management
  • Insect-Plant Interactions and Control
  • Evolutionary Algorithms and Applications
  • Machine Learning and Algorithms
  • Machine Learning and ELM
  • Research on scale insects
  • Business Process Modeling and Analysis
  • Stochastic Gradient Optimization Techniques
  • Web Application Security Vulnerabilities
  • Domain Adaptation and Few-Shot Learning
  • Machine Learning and Data Classification
  • Software Engineering Techniques and Practices

IBM Research - Zurich
2024

IBM Research - Thomas J. Watson Research Center
2024

IBM (United States)
2021

Static analysis tools are widely used for vulnerability detection as they can analyze programs with complex behavior and millions of lines of code. Despite their popularity, static analysis tools are known to generate an excess of false positives. The recent ability of Machine Learning models to understand programming languages opens new possibilities when applied to static analysis. However, existing datasets to train models for vulnerability identification suffer from multiple limitations, such as limited bug context, limited size, and synthetic and unrealistic source code. We propose D2A, a...

10.1109/icse-seip52600.2021.00020 article EN 2021-05-01

Over the last several decades, software has been woven into the fabric of every aspect of our society. As software development surges and the code infrastructure of enterprise applications ages, it is now more critical than ever to increase software development productivity and modernize legacy applications. Advances in deep learning and machine learning algorithms have enabled numerous breakthroughs, motivating researchers to leverage AI techniques to improve software development efficiency. Thus, the fast-emerging research area of AI for Code has garnered new interest and gathered momentum. In...

10.48550/arxiv.2105.12655 preprint EN other-oa arXiv (Cornell University) 2021-01-01

The Software Naturalness hypothesis argues that programming languages can be understood through the same techniques used in natural language processing. We explore this hypothesis through the use of a pre-trained transformer-based language model to perform code analysis tasks. Present approaches to code analysis depend heavily on features derived from the Abstract Syntax Tree (AST), while our transformer-based language models work on raw source code. This work is the first to investigate whether such language models can discover AST features automatically. To achieve this, we introduce a sequence labeling task...

10.48550/arxiv.2006.12641 preprint EN other-oa arXiv (Cornell University) 2020-01-01
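The abstract above mentions a sequence labeling task in which AST-derived information must be predicted from raw source code. As a hedged, toy-scale illustration of what such an instance could look like (the label scheme here is invented for illustration, not the paper's actual task definition), one can pair each identifier token in a snippet with a tag derived from its AST context:

```python
import ast

# Toy sequence-labeling instance: pair each identifier in the source with
# an AST-derived tag (here, whether the name is being read or written),
# which a model working on raw text would have to predict.
source = "x = 1\ny = x + 2\n"
labels = []
for node in ast.walk(ast.parse(source)):
    if isinstance(node, ast.Name):
        # ctx is ast.Store for assignment targets, ast.Load for reads
        labels.append((node.id, type(node.ctx).__name__))
print(labels)
```

A model that recovers such tags from the token sequence alone has, in effect, rediscovered part of the AST without being given it.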

Yangruibo Ding, Luca Buratti, Saurabh Pujar, Alessandro Morari, Baishakhi Ray, Saikat Chakraborty. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2022.

10.18653/v1/2022.acl-long.436 article EN cc-by Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2022-01-01

Static analysis tools are widely used for vulnerability detection as they can analyze programs with complex behavior and millions of lines of code. Despite their popularity, static analysis tools are known to generate an excess of false positives. The recent ability of Machine Learning models to learn from programming language data opens new possibilities for reducing false positives when applied to static analysis. However, existing datasets to train models for vulnerability identification suffer from multiple limitations such as limited bug context, size, synthetic...

10.1007/s10664-023-10405-9 article EN cc-by Empirical Software Engineering 2024-02-22
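The D2A entries above describe a dataset built by differential analysis: running a static analyzer before and after bug-fixing commits and using the difference to auto-label findings. A minimal sketch of that set-difference idea, with mocked analyzer output (the file names and bug types below are invented for illustration), might look like:

```python
def label_findings(before, after):
    """before/after: sets of (file, bug_type) findings reported by a
    static analyzer on the pre-fix and post-fix versions of the code."""
    likely_true = before - after    # disappeared after the fix: likely real bugs
    likely_false = before & after   # survived the fix: likely false positives
    return likely_true, likely_false

# Mocked analyzer reports around a hypothetical bug-fixing commit.
before = {("util.c", "BUFFER_OVERRUN"), ("io.c", "NULL_DEREFERENCE")}
after = {("io.c", "NULL_DEREFERENCE")}
tp, fp = label_findings(before, after)
print(sorted(tp), sorted(fp))
```

The real pipeline is of course far more involved (commit-message filtering, trace matching, and so on); this only shows the core labeling intuition.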

The recent improvement in code generation capabilities due to the use of large language models has mainly benefited general-purpose programming languages. Domain-specific languages, such as the ones used for IT Automation, have received far less attention, despite involving many active developers and being an essential component of modern cloud platforms. This work focuses on Ansible YAML, a widely used markup language for IT Automation. We present Ansible Wisdom, a natural-language to Ansible-YAML code generation tool, aimed at improving automation...

10.1109/dac56929.2023.10247987 article EN 2023-07-09

Code Large Language Models (Code LLMs) have emerged as powerful tools, revolutionizing the software development landscape by automating the coding process and reducing the time and effort required to build applications. This paper focuses on training Code LLMs to specialize in the field of quantum computing. We begin by discussing the unique needs of quantum computing programming, which differ significantly from classical programming approaches or languages. A Code LLM specializing in quantum computing requires a foundational understanding of quantum information...

10.48550/arxiv.2405.19495 preprint EN arXiv (Cornell University) 2024-05-29

Deep Learning (DL) models to analyze source code have shown immense promise during the past few years. More recently, self-supervised pre-training has gained traction for learning generic code representations valuable for many downstream SE tasks, such as clone and bug detection.

10.1145/3597926.3598035 article EN 2023-07-12

Adam is a popular stochastic optimizer that uses adaptive estimates of lower-order moments to update weights and requires little hyper-parameter tuning. Some recent studies have called the generalization and out-of-sample behavior of such adaptive gradient methods into question, and argued that they are of only marginal value. Notably, for many well-known image classification tasks such as CIFAR-10 and ImageNet-1K, the current models with the best validation performance are still trained with SGD and a manual schedule of learning rate reduction. We analyze 7...

10.1109/mlhpc.2018.8638641 article EN 2018-11-01
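For readers unfamiliar with the optimizer discussed in the entry above: Adam's "adaptive estimates of lower-order moments" are exponential moving averages of the gradient and its elementwise square, with bias correction. A minimal sketch of one update step on a toy quadratic (hyper-parameter defaults follow the commonly cited Adam settings):

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update using bias-corrected first/second moment estimates."""
    m = beta1 * m + (1 - beta1) * grad        # first moment (mean of gradients)
    v = beta2 * v + (1 - beta2) * grad ** 2   # second moment (uncentered variance)
    m_hat = m / (1 - beta1 ** t)              # bias correction for zero init
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Minimize f(w) = w^2 starting from w = 1.0.
w, m, v = np.array([1.0]), np.zeros(1), np.zeros(1)
for t in range(1, 2001):
    w, m, v = adam_step(w, 2 * w, m, v, t, lr=0.01)
print(abs(float(w[0])))  # settles near the minimum at 0
```

The per-parameter scaling by `sqrt(v_hat)` is exactly the adaptivity whose generalization behavior the paper compares against plain SGD with a tuned learning-rate schedule.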

Quantum programs are typically developed using quantum Software Development Kits (SDKs). The rapid advancement of quantum computing necessitates new tools to streamline this development process, and one such tool could be Generative Artificial Intelligence (GenAI). In this study, we introduce and use the Qiskit HumanEval dataset, a hand-curated collection of tasks designed to benchmark the ability of Large Language Models (LLMs) to produce quantum code using the Qiskit SDK. This dataset consists of more than 100 tasks, each accompanied by a prompt,...

10.48550/arxiv.2406.14712 preprint EN arXiv (Cornell University) 2024-06-20

The availability of Large Language Models (LLMs) which can generate code has made it possible to create tools that improve developer productivity. Integrated development environments, or IDEs, which developers use to write software, are often used as an interface to interact with LLMs. Although many such tools have been released, almost all of them focus on general-purpose programming languages. Domain-specific languages, such as those crucial for IT automation, have not received much attention. Ansible is one such YAML-based...

10.48550/arxiv.2402.17442 preprint EN arXiv (Cornell University) 2024-02-27

The complexity and scale of modern software programs often lead to overlooked programming errors and security vulnerabilities. Developers rely on automatic tools, like static analysis tools, to look for bugs. Static analysis tools are widely used because they can understand nontrivial program behaviors, scale to millions of lines of code, and detect subtle bugs. However, they are known to generate an excess of false alarms which hinder their utilization, as it is counterproductive for developers to go through a long list of reported issues only to find a few true...

10.1145/3524842.3528516 article EN 2022-05-23

Static analysis tools are widely used for vulnerability detection as they can analyze programs with complex behavior and millions of lines of code. Despite their popularity, static analysis tools are known to generate an excess of false positives. The recent ability of Machine Learning models to understand programming languages opens new possibilities when applied to static analysis. However, existing datasets to train models for vulnerability identification suffer from multiple limitations, such as limited bug context, limited size, and synthetic and unrealistic source code. We propose D2A, a...

10.48550/arxiv.2102.07995 preprint EN other-oa arXiv (Cornell University) 2021-01-01

The recent improvement in code generation capabilities due to the use of large language models has mainly benefited general-purpose programming languages. Domain-specific languages, such as the ones used for IT Automation, have received far less attention, despite involving many active developers and being an essential component of modern cloud platforms. This work focuses on Ansible-YAML, a widely used markup language for IT Automation. We present Ansible Wisdom, a natural-language to Ansible-YAML code generation tool, aimed at improving...

10.48550/arxiv.2305.02783 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Deep Learning (DL) models to analyze source code have shown immense promise during the past few years. More recently, self-supervised pre-training has gained traction for learning generic code representations valuable for many downstream SE tasks, such as clone and bug detection. While previous work successfully learned from different code abstractions (e.g., token, AST, graph), we argue that it is also essential to factor in how developers code day-to-day for general-purpose representation learning. On one hand,...

10.48550/arxiv.2306.03234 preprint EN cc-by arXiv (Cornell University) 2023-01-01

Code Large Language Models (Code LLMs) are being increasingly employed in real-life applications, so evaluating them is critical. While conventional accuracy evaluates the performance of Code LLMs on a set of individual tasks, their self-consistency across different tasks is overlooked. Intuitively, a trustworthy model should be self-consistent when generating natural language specifications for its own code and generating code for its own specifications. Failure to preserve self-consistency reveals a lack of understanding of the shared semantics underlying...

10.48550/arxiv.2310.14053 preprint EN cc-by-nc-nd arXiv (Cornell University) 2023-01-01
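The round-trip intuition in the entry above (code → specification → code, then check behavior is preserved) can be sketched as follows. The two "model" functions below are hypothetical stubs standing in for LLM calls; only the consistency check itself is real:

```python
def spec_from_code(code):
    """Hypothetical stand-in for an LLM that summarizes code into a spec."""
    return "return the input plus one"

def code_from_spec(spec):
    """Hypothetical stand-in for an LLM that generates code from a spec."""
    return "def f(x):\n    return x + 1\n"

def self_consistent(original_code, inputs):
    """Round trip code -> spec -> code and compare behavior on sample inputs."""
    regenerated = code_from_spec(spec_from_code(original_code))
    env_a, env_b = {}, {}
    exec(original_code, env_a)   # fine for this toy sketch; unsafe on real model output
    exec(regenerated, env_b)
    return all(env_a["f"](x) == env_b["f"](x) for x in inputs)

original = "def f(x):\n    return x + 1\n"
result = self_consistent(original, range(5))
print(result)
```

A model that is accurate on each task in isolation can still fail this check, which is the gap the paper argues conventional benchmarks miss.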

Large language models (LLMs) have become remarkably good at improving developer productivity for high-resource programming languages. These models use two kinds of data: large amounts of unlabeled code samples for pre-training and relatively smaller amounts of labeled code samples for fine-tuning or in-context learning. Unfortunately, many programming languages are low-resource, lacking labeled samples for most tasks and often even unlabeled samples. Therefore, users of low-resource languages (e.g., legacy or new languages) miss out on the benefits of LLMs. Cross-lingual transfer uses data from a...

10.48550/arxiv.2310.16937 preprint EN cc-by arXiv (Cornell University) 2023-01-01

Understanding the functional (dis)-similarity of source code is significant for code modeling tasks such as software vulnerability and code clone detection. We present DISCO (DIS-similarity of COde), a novel self-supervised model focusing on identifying (dis)similar functionalities of source code. Different from existing works, our approach does not require a huge amount of randomly collected datasets. Rather, we design structure-guided code transformation algorithms to generate synthetic code clones and inject real-world security...

10.48550/arxiv.2110.03868 preprint EN other-oa arXiv (Cornell University) 2021-01-01
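One simple instance of the "structure-guided code transformation" idea mentioned in the DISCO abstract is consistent identifier renaming, which changes the surface text while preserving semantics. A hedged Python sketch (this is an illustrative transformation, not DISCO's actual algorithm; requires Python 3.9+ for `ast.unparse`):

```python
import ast

class Renamer(ast.NodeTransformer):
    """Consistently rename identifiers according to a mapping."""
    def __init__(self, mapping):
        self.mapping = mapping

    def visit_Name(self, node):          # variable reads and writes
        node.id = self.mapping.get(node.id, node.id)
        return node

    def visit_arg(self, node):           # function parameters
        node.arg = self.mapping.get(node.arg, node.arg)
        return node

def make_clone(source, mapping):
    """Produce a semantics-preserving synthetic clone of `source`."""
    tree = Renamer(mapping).visit(ast.parse(source))
    return ast.unparse(tree)

src = "def area(w, h):\n    return w * h\n"
clone = make_clone(src, {"w": "width", "h": "height"})
print(clone)
```

Pairs like `(src, clone)` give a contrastive learner positive examples that differ textually but not functionally; the paper additionally injects bug patterns to create hard negatives.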