Do Code Summarization Models Process Too Much Information? Function Signature May Be All That Is Needed

Keywords: Program comprehension; Code review
DOI: 10.1145/3652156 | Publication Date: 2024-03-14
ABSTRACT
With the fast development of large software projects, automatic code summarization techniques, which summarize the main functionalities of a piece of code using natural languages as comments, play essential roles in helping developers understand and maintain projects. Many research efforts have been devoted to building such approaches. Typical approaches are based on deep learning models: they transform the task into a sequence-to-sequence task, taking source code as input and outputting summarizations in natural languages. All such models impose different input size limits, such as 50 to 10,000, on the input code. However, how the input size limit affects model performance still remains under-explored. In this article, we first conduct an empirical study to investigate the impacts of input size limits on the quality of generated comments. To our surprise, experiments on multiple datasets reveal that setting a low input size limit, such as 20, does not necessarily reduce the quality of generated comments. Based on this finding, we further propose to use function signatures instead of full code as the model input. Experiments and statistical results show that inputs with function signatures are, on average, more than 2 percentage points better than inputs without them, and thus demonstrate the effectiveness of involving function signatures in code summarization. We also invite programmers to complete a questionnaire evaluating the summaries generated under the two truncation levels. The results show that summaries generated from function signatures contain, on average, 9.2% more high-quality comments.
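
As a rough illustration of the two input-preparation strategies compared in the article, the following Python sketch (with hypothetical helper names, not the authors' implementation) truncates a token sequence to a fixed input size limit and, alternatively, keeps only the function signature; either sequence would then be fed to the sequence-to-sequence summarization model in place of the full function.

import re

def truncate_tokens(tokens, limit):
    # Keep only the first `limit` tokens, mimicking how a
    # seq2seq model discards input beyond its size limit.
    return tokens[:limit]

def extract_signature(source):
    # Crude stand-in for signature extraction: grab the
    # 'def name(...):' header of a Python function.
    match = re.search(r"def\s+\w+\s*\([^)]*\)[^:]*:", source)
    return match.group(0) if match else source

source = '''def binary_search(items, target):
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1
'''

tokens = source.split()
print(truncate_tokens(tokens, 20))   # low input size limit, e.g., 20
print(extract_signature(source))     # signature-only input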