Universal Representation for Code

Keywords: Python; Code (set theory); Universal code; Discriminative model; Control flow graph; Representation
DOI: 10.48550/arxiv.2103.03116 Publication Date: 2021-01-01
ABSTRACT
Learning from source code usually requires a large amount of labeled data. Despite the possible scarcity of labeled data, the trained model is highly task-specific and lacks transferability to different tasks. In this work, we present effective pre-training strategies on top of a novel graph-based code representation, to produce universal representations for code. Specifically, our graph-based representation captures important semantics between code elements (e.g., control flow and data flow). We pre-train graph neural networks on the representation to extract universal code properties. The pre-trained model then enables the possibility of fine-tuning to support various downstream applications. We evaluate our model on two real-world datasets, spanning over 30M Java methods and 770K Python methods. Through visualization, we reveal discriminative properties in our universal code representation. By comparing multiple benchmarks, we demonstrate that the proposed framework achieves state-of-the-art results on method name prediction and code graph link prediction.
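As a rough illustration of what a graph-based code representation can look like, the sketch below builds a toy graph from a Python snippet using only the standard `ast` module. It is not the representation proposed in the paper: the function name `build_code_graph`, the edge labels `ast_child` and `data_flow`, and the "most recent earlier write" heuristic are all assumptions made for this example, and control-flow edges are omitted entirely.

```python
import ast


def build_code_graph(source: str):
    """Return (nodes, edges) for a toy graph over the AST of `source`.

    Hypothetical helper for illustration only; not the paper's method.
    nodes: list of AST type names, indexed by node id.
    edges: list of (src, dst, edge_type) tuples.
    """
    tree = ast.parse(source)

    # Skip Load/Store context markers: they carry no structure of their own.
    ast_nodes = [n for n in ast.walk(tree) if not isinstance(n, ast.expr_context)]

    index = {id(n): i for i, n in enumerate(ast_nodes)}
    nodes = [type(n).__name__ for n in ast_nodes]
    edges = []

    # Syntactic edges: parent -> child in the AST.
    for parent in ast_nodes:
        for child in ast.iter_child_nodes(parent):
            if id(child) in index:
                edges.append((index[id(parent)], index[id(child)], "ast_child"))

    # Naive data-flow edges: link each variable read to the most recent
    # earlier write of the same name. Within a line, reads are processed
    # before writes, which is only right for simple straight-line assignments.
    names = [n for n in ast_nodes if isinstance(n, ast.Name)]
    names.sort(key=lambda n: (n.lineno, isinstance(n.ctx, ast.Store), n.col_offset))
    last_write = {}
    for name in names:
        if isinstance(name.ctx, ast.Store):
            last_write[name.id] = index[id(name)]
        elif name.id in last_write:
            edges.append((last_write[name.id], index[id(name)], "data_flow"))

    return nodes, edges


nodes, edges = build_code_graph("a = 1\nb = a + 2\nc = a * b\n")
for src, dst, kind in edges:
    if kind == "data_flow":
        print(f"{nodes[src]}#{src} -> {nodes[dst]}#{dst} ({kind})")
```

A node list with typed edges of this kind is the sort of input a graph neural network library typically expects, though a realistic pipeline would also need control-flow edges, handling of branches and loops, and learned node features.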