Beyond Accuracy: Evaluating Self-Consistency of Code Large Language Models with IdentityChain
DOI:
10.48550/arxiv.2310.14053
Publication Date:
2023-01-01
AUTHORS (7)
ABSTRACT
Code Large Language Models (Code LLMs) are being increasingly employed in real-life applications, so evaluating them is critical. While conventional accuracy evaluates the performance of Code LLMs on a set of individual tasks, their self-consistency across different tasks is overlooked. Intuitively, a trustworthy model should be self-consistent when generating natural language specifications for its own code and generating code for its own specifications. Failure to preserve self-consistency reveals a lack of understanding of the shared semantics underlying natural language and programming language, and therefore undermines the trustworthiness of a model. In this paper, we first formally define the self-consistency of Code LLMs and then design a framework, IdentityChain, which effectively and efficiently evaluates a model's self-consistency and conventional accuracy at the same time. We study eleven Code LLMs and show that they fail to preserve self-consistency, which is indeed a distinct aspect from conventional accuracy. Furthermore, we show that IdentityChain can be used as a model debugging tool to expose weaknesses of Code LLMs by demonstrating three major weaknesses that we identify in current models using IdentityChain. Our code is available at https://github.com/marcusm117/IdentityChain.
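As a rough illustration of the self-consistency chain described in the abstract, the sketch below alternates specification generation and code generation and flags a break as soon as a regenerated program fails the reference tests. The model wrapper functions (generate_code, generate_spec) and the test-based equivalence check are illustrative assumptions, not the framework's actual API; see the linked repository for the real implementation.

```python
# Minimal sketch of the NL -> PL -> NL -> PL self-consistency chain.
# Hypothetical interfaces: generate_code maps a natural language specification
# to a program; generate_spec maps a program back to a specification.
from typing import Callable, List


def passes_tests(code: str, tests: List[Callable[[str], bool]]) -> bool:
    """Approximate semantic check: the candidate program must pass every test."""
    return all(test(code) for test in tests)


def identity_chain(
    spec0: str,
    tests: List[Callable[[str], bool]],
    generate_code: Callable[[str], str],
    generate_spec: Callable[[str], str],
    length: int = 3,
) -> bool:
    """Return True if semantics are preserved across `length` rounds of
    spec -> code -> spec regeneration, i.e. the model looks self-consistent
    on this example under the test-based equivalence assumption."""
    spec = spec0
    for _ in range(length):
        code = generate_code(spec)          # PL generation from the current spec
        if not passes_tests(code, tests):   # semantic drift: the chain is broken
            return False
        spec = generate_spec(code)          # NL generation from the model's own code
    return True
```

A usage note: starting from a ground-truth specification and its reference tests, the fraction of examples for which this chain survives all rounds gives a simple self-consistency score that can be reported alongside conventional pass@1-style accuracy.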