PPM: Automated Generation of Diverse Programming Problems for Benchmarking Code Generation Models

DOI: 10.48550/arxiv.2401.15545 Publication Date: 2024-01-27
ABSTRACT
In recent times, a plethora of Large Code Generation Models (LCGMs) have been proposed, showcasing significant potential in assisting developers with complex programming tasks. Benchmarking LCGMs necessitates the creation of a set of diverse programming problems, where each problem comprises a prompt (including the task description), a canonical solution, and test inputs. Existing methods for constructing such problems fall into two main categories: manual methods and perturbation-based methods. However, manual methods demand high effort and lack scalability, and they also risk data integrity because LCGMs may have been trained on contaminated data collections; perturbation-based approaches mainly generate semantically homogeneous problems that share the same canonical solutions and introduce typos that are easily auto-corrected by the IDE, making them ineffective and unrealistic. In this work, we propose the idea of programming problem merging (PPM), provide an implementation of this idea, apply our tool to widely-used datasets, and compare it against nine baseline methods using eight code generation models. The results demonstrate the effectiveness of our tool in generating more challenging, diverse, and natural programming problems compared with the baselines.
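As a minimal illustration of the problem structure the abstract describes (a prompt, a canonical solution, and test inputs), the Python sketch below models one benchmark problem and checks a candidate solution against the reference on the stored inputs. This is not the authors' implementation; the class name, fields, and the check helper are hypothetical, chosen only to mirror the three components named above.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple


@dataclass
class BenchmarkProblem:
    """One code-generation benchmark problem: prompt, canonical solution, test inputs."""
    prompt: str                       # task description shown to the model
    canonical_solution: str           # reference implementation, stored as source code
    test_inputs: List[Tuple] = field(default_factory=list)  # inputs used to judge outputs

    def check(self, candidate: Callable[..., Any], reference: Callable[..., Any]) -> bool:
        """Hypothetical helper: a candidate passes if it matches the reference on every test input."""
        return all(candidate(*args) == reference(*args) for args in self.test_inputs)


# Toy usage: a trivial problem and a model-generated candidate.
problem = BenchmarkProblem(
    prompt="Return the sum of two integers.",
    canonical_solution="def add(a, b):\n    return a + b",
    test_inputs=[(1, 2), (-3, 3), (0, 0)],
)

reference = lambda a, b: a + b   # behavior of the canonical solution
candidate = lambda a, b: a + b   # solution produced by a code generation model
print(problem.check(candidate, reference))  # True
```

Methods that perturb only the prompt while leaving the canonical solution and test inputs unchanged yield semantically homogeneous problems; PPM instead aims to produce problems whose solutions differ as well.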