Abstract Syntax Tree for Programming Language Understanding and Representation: How Far Are We?
DOI: 10.48550/arxiv.2312.00413
Publication Date: 2023-01-01
AUTHORS (11)
ABSTRACT
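Programming language understanding and representation (a.k.a. code representation learning) has always been a hot and challenging task in software engineering. It aims to apply deep learning techniques to produce numerical representations of source code features while preserving their semantics. These representations can then be used to facilitate subsequent code-related tasks. The abstract syntax tree (AST), a fundamental code feature, captures the syntactic information of source code and has been widely used in code representation learning. However, there is still a lack of systematic, quantitative evaluation of how well AST-based code representation facilitates subsequent code-related tasks. In this paper, we first conduct a comprehensive empirical study to explore the effectiveness of AST-based code representation in facilitating follow-up code-related tasks. To do so, we compare the performance of models trained with code token sequence (Token for short) based code representation and with AST-based code representation on three popular types of code-related tasks. Surprisingly, the overall statistical results demonstrate that models trained with AST-based code representation consistently perform worse across all three tasks than models trained with Token-based code representation. Our further analysis reveals that models trained with AST-based code representation outperform Token-based models on certain subsets of samples. We also conduct experiments to evaluate and reveal the impact of the choice of AST parsing, preprocessing, and encoding methods on AST-based code representation and subsequent code-related tasks. Our study provides future researchers with detailed guidance on how to select solutions at each stage to fully exploit the AST.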
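To make the Token versus AST contrast concrete, the sketch below derives both views from one small function. It uses Python's standard `tokenize` and `ast` modules purely for illustration; the helper names `token_sequence` and `ast_preorder` and the sample `SOURCE` are hypothetical, and this is not the parsing, preprocessing, or encoding pipeline used in the study.

```python
import ast
import io
import tokenize

# Hypothetical sample input, chosen only for illustration.
SOURCE = "def add(a, b):\n    return a + b\n"


def token_sequence(src: str) -> list[str]:
    """Flat lexical tokens (the 'Token' view), dropping layout-only tokens."""
    skip = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
            tokenize.DEDENT, tokenize.ENDMARKER, tokenize.COMMENT}
    toks = tokenize.generate_tokens(io.StringIO(src).readline)
    return [t.string for t in toks if t.type not in skip]


def ast_preorder(src: str) -> list[str]:
    """Node-type names from a pre-order (depth-first) walk of the AST."""
    out: list[str] = []

    def visit(node: ast.AST) -> None:
        out.append(type(node).__name__)      # record the parent first
        for child in ast.iter_child_nodes(node):
            visit(child)                      # then recurse into children

    visit(ast.parse(src))
    return out


if __name__ == "__main__":
    print(token_sequence(SOURCE))
    print(ast_preorder(SOURCE))
```

The token view keeps identifiers and punctuation in surface order, while the AST view exposes structure (for example, that `a + b` is a `BinOp` nested under a `Return`) but, in this simplified walk, discards identifier names. Whether to keep such lexical details alongside node types is one example of the kind of AST preprocessing decision the abstract alludes to.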