Do Code Summarization Models Process Too Much Information? Function Signature May Be All That Is Needed

Keywords: Program comprehension; Code review
DOI: 10.1145/3652156 | Publication Date: 2024-03-14
ABSTRACT
With the fast development of large software projects, automatic code summarization techniques, which summarize the main functionalities of a piece of code using natural languages as comments, play essential roles in helping developers understand and maintain projects. Many research efforts have been devoted to building such approaches. Typical approaches are based on deep learning models: they transform the task into a sequence-to-sequence task, taking source code as input and outputting summarizations in natural languages. All such models impose different input size limits, such as 50 to 10,000, on the input code. However, how the input size limit affects model performance still remains under-explored. In this article, we first conduct an empirical study to investigate the impacts of input size limits on the quality of generated comments. To our surprise, experiments on multiple datasets reveal that setting a low input size limit, such as 20, does not necessarily reduce the quality of generated comments. Based on this finding, we further propose to use function signatures instead of full code as the model input. Experiments and statistical results show that inputs with function signatures are, on average, more than 2 percentage points better than inputs without them, and thus demonstrate the effectiveness of involving function signatures in code summarization. We also invite programmers to complete a questionnaire evaluating the summaries generated under the two truncation levels. The results show that summaries generated from function signatures contain, on average, 9.2% more high-quality comments.
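
As a rough illustration of the two input-preparation strategies compared in the article, the following Python sketch (with hypothetical helper names, not the authors' implementation) truncates a token sequence to a fixed input size limit and, alternatively, keeps only the function signature; either sequence would then be fed to the sequence-to-sequence summarization model in place of the full function.

import re

def truncate_tokens(tokens, limit):
    # Keep only the first `limit` tokens, mimicking how a
    # seq2seq model discards input beyond its size limit.
    return tokens[:limit]

def extract_signature(source):
    # Crude stand-in for signature extraction: grab the
    # 'def name(...):' header of a Python function.
    match = re.search(r"def\s+\w+\s*\([^)]*\)[^:]*:", source)
    return match.group(0) if match else source

source = '''def binary_search(items, target):
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1
'''

tokens = source.split()
print(truncate_tokens(tokens, 20))   # low input size limit, e.g., 20
print(extract_signature(source))     # signature-only input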