CodeNet: A Large-Scale AI for Code Dataset for Learning a Diversity of Coding Tasks
DOI:
10.48550/arxiv.2105.12655
Publication Date:
2021-01-01
AUTHORS (17)
ABSTRACT
Over the last several decades, software has been woven into the fabric of every aspect of our society. As software development surges and the code infrastructure of enterprise applications ages, it is now more critical than ever to increase software development productivity and modernize legacy applications. Advances in deep learning and machine learning algorithms have enabled numerous breakthroughs, motivating researchers to leverage AI techniques to improve software development efficiency. Thus, the fast-emerging research area of "AI for Code" has garnered new interest and gathered momentum. In this paper, we present a large-scale dataset, CodeNet, consisting of over 14 million code samples and about 500 million lines of code in 55 different programming languages, which is aimed at teaching AI to code. In addition to its large scale, CodeNet has a rich set of high-quality annotations to benchmark and help accelerate research in AI techniques for a variety of critical coding tasks, including code similarity and classification, code translation between a large variety of programming languages, and code performance (runtime and memory) improvement techniques. Additionally, CodeNet provides sample input and output test sets for 98.5% of the code samples, which can be used as an oracle for determining code correctness and potentially guide reinforcement learning for code quality improvements. As a usability feature, we provide several pre-processing tools to transform source code into representations that can be readily used as inputs into machine learning models. Results of code classification and code similarity experiments using the CodeNet dataset are provided as a reference. We hope that the scale, diversity, and rich, high-quality annotations of CodeNet will offer unprecedented research opportunities at the intersection of AI and Software Engineering.
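The abstract notes that the sample input and output test sets can serve as an oracle for determining code correctness. The Python sketch below is a minimal illustration of that idea, not the dataset's own tooling: it runs a candidate program on a sample input and compares its standard output to the expected output. The function name, command, and file paths are hypothetical placeholders, not the dataset's documented layout.

import subprocess

def check_with_io_oracle(cmd, input_path, expected_path, timeout=10):
    # Read the sample input and the expected output provided for a problem.
    with open(input_path, "r") as f:
        sample_input = f.read()
    with open(expected_path, "r") as f:
        expected_output = f.read()
    # Run the candidate program, feeding the sample input on stdin.
    result = subprocess.run(
        cmd, input=sample_input, capture_output=True, text=True, timeout=timeout
    )
    # Treat the input/output pair as an oracle: accept only if the program
    # exits cleanly and its stdout matches the expected output (ignoring
    # trailing whitespace, a common judging convention).
    return result.returncode == 0 and result.stdout.rstrip() == expected_output.rstrip()

# Hypothetical usage (command and paths are placeholders):
# accepted = check_with_io_oracle(["python3", "solution.py"], "input.txt", "output.txt")

A pass/fail signal of this kind is also the sort of feedback the abstract suggests could guide reinforcement learning for code quality improvements.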