PPM: Automated Generation of Diverse Programming Problems for Benchmarking Code Generation Models

DOI: 10.48550/arxiv.2401.15545 Publication Date: 2024-01-27
ABSTRACT
In recent times, a plethora of Large Code Generation Models (LCGMs) have been proposed, showcasing significant potential in assisting developers with complex programming tasks. Benchmarking LCGMs necessitates the creation of a set of diverse programming problems, where each problem comprises a prompt (including the task description), a canonical solution, and test inputs. Existing methods for constructing such problems fall into two main categories: manual methods and perturbation-based methods. However, manual methods demand high effort and lack scalability, and they also risk data integrity because LCGMs may have been trained on contaminated data collections; perturbation-based approaches mainly generate semantically homogeneous problems that share the same canonical solutions and introduce typos that are easily auto-corrected by the IDE, making them ineffective and unrealistic. In this work, we propose the idea of programming problem merging (PPM), provide an implementation of this idea, apply our tool to widely-used datasets, and compare it against nine baseline methods using eight code generation models. The results demonstrate the effectiveness of our tool in generating more challenging, diverse, and natural programming problems compared with the baselines.
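As a minimal illustration of the problem structure the abstract describes (a prompt, a canonical solution, and test inputs), the Python sketch below models one benchmark problem and checks a candidate solution against the reference on the stored inputs. This is not the authors' implementation; the class name, fields, and the check helper are hypothetical, chosen only to mirror the three components named above.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple


@dataclass
class BenchmarkProblem:
    """One code-generation benchmark problem: prompt, canonical solution, test inputs."""
    prompt: str                       # task description shown to the model
    canonical_solution: str           # reference implementation, stored as source code
    test_inputs: List[Tuple] = field(default_factory=list)  # inputs used to judge outputs

    def check(self, candidate: Callable[..., Any], reference: Callable[..., Any]) -> bool:
        """Hypothetical helper: a candidate passes if it matches the reference on every test input."""
        return all(candidate(*args) == reference(*args) for args in self.test_inputs)


# Toy usage: a trivial problem and a model-generated candidate.
problem = BenchmarkProblem(
    prompt="Return the sum of two integers.",
    canonical_solution="def add(a, b):\n    return a + b",
    test_inputs=[(1, 2), (-3, 3), (0, 0)],
)

reference = lambda a, b: a + b   # behavior of the canonical solution
candidate = lambda a, b: a + b   # solution produced by a code generation model
print(problem.check(candidate, reference))  # True
```

Methods that perturb only the prompt while leaving the canonical solution and test inputs unchanged yield semantically homogeneous problems; PPM instead aims to produce problems whose solutions differ as well.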