NFDI4DS | UHH-SEMS - Publication Details

Abstract Syntax Tree for Programming Language Understanding and Representation: How Far Are We?

Representation Code (set theory) Code review

DOI: 10.48550/arxiv.2312.00413 Publication Date: 2023-01-01


AUTHORS (11)

Weisong Sun

Chunrong Fang

Yun Miao

Yudu You

Mengzhe Yuan

Yuchen Chen

Quanjun Zhang

An G

Xiang Chen

Yang Liu

Zhenyu Chen

ABSTRACT

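Programming language understanding and representation (a.k.a. code representation learning) has always been a hot and challenging task in software engineering. It aims to apply deep learning techniques to produce numerical representations of source code features while preserving their semantics. These representations can then be used to facilitate subsequent code-related tasks. The abstract syntax tree (AST), a fundamental code feature, illustrates the syntactic information of source code and is widely used in code representation learning. However, there is still a lack of systematic, quantitative evaluation of how well AST-based code representation facilitates subsequent code-related tasks. In this paper, we first conduct a comprehensive empirical study to explore the effectiveness of AST-based code representation in facilitating follow-up code-related tasks. To do so, we compare the performance of models trained with code token sequence (Token for short) based code representation and with AST-based code representation on three popular types of code-related tasks. Surprisingly, the overall statistical results demonstrate that models trained with AST-based code representation consistently perform worse across all three tasks than models trained with Token-based code representation. Our further analysis reveals that models trained with AST-based code representation outperform those trained with Token-based code representation on certain subsets of samples. We also conduct experiments to evaluate and reveal the impact of the choice of AST parsing/preprocessing/encoding methods. The study provides future researchers with detailed guidance on how to select solutions at each stage to fully exploit AST.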
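To make the contrast between the two representations concrete, here is a minimal sketch in Python. It is illustrative only and is not the authors' pipeline: it assumes Python's standard-library `tokenize` and `ast` modules as the lexer and parser, and uses a pre-order traversal of AST node type names as one simple, hypothetical preprocessing choice. The names `SOURCE`, `token_sequence`, and `ast_preorder` are hypothetical.

```python
import ast
import io
import tokenize

# Hypothetical input snippet used only for illustration.
SOURCE = "def add(a, b):\n    return a + b\n"

# Layout-only token types dropped from the token view (an assumption of this sketch).
_SKIP = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
         tokenize.DEDENT, tokenize.ENDMARKER, tokenize.COMMENT}


def token_sequence(source: str) -> list[str]:
    """Token-based view: the flat sequence of lexical tokens in surface order."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [tok.string for tok in tokens if tok.type not in _SKIP]


def ast_preorder(source: str) -> list[str]:
    """AST-based view: node type names collected by a pre-order traversal.

    Identifier names (e.g. the function name) are not kept here; that is
    one possible preprocessing choice among many.
    """
    def visit(node: ast.AST) -> list[str]:
        out = [type(node).__name__]          # record the parent first
        for child in ast.iter_child_nodes(node):
            out.extend(visit(child))         # then recurse into children
        return out

    return visit(ast.parse(source))


if __name__ == "__main__":
    print(token_sequence(SOURCE))
    print(ast_preorder(SOURCE))
```

The token view keeps identifiers and operators in surface order, while the pre-order view exposes syntactic structure (`FunctionDef`, `Return`, `BinOp`) but, in this simplified form, drops identifier names such as `add`. Deciding what the AST view should retain is the kind of parsing/preprocessing/encoding decision the abstract refers to.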

