AutoDetect: Towards a Unified Framework for Automated Weakness Detection in Large Language Models

FOS: Computer and information sciences; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
DOI: 10.48550/arxiv.2406.16714 Publication Date: 2024-06-24
ABSTRACT
Although Large Language Models (LLMs) are becoming increasingly powerful, they still exhibit significant but subtle weaknesses, such as mistakes in instruction-following or coding tasks. As these unexpected errors could lead to severe consequences in practical deployments, it is crucial to investigate the limitations within LLMs systematically. Traditional benchmarking approaches cannot thoroughly pinpoint specific model deficiencies, while manual inspections are costly and not scalable. In this paper, we introduce a unified framework, AutoDetect, to automatically expose weaknesses in LLMs across various tasks. Inspired by the educational assessment process that measures students' learning outcomes, AutoDetect consists of three LLM-powered agents: Examiner, Questioner, and Assessor. The collaboration among these three agents is designed to realize comprehensive and in-depth weakness identification. Our framework demonstrates significant success in uncovering flaws, with an identification success rate exceeding 30% in prominent models such as ChatGPT and Claude. More importantly, these identified weaknesses can guide specific model improvements, proving more effective than untargeted data augmentation methods like Self-Instruct. Our approach has led to substantial enhancements in popular LLMs, including the Llama series and Mistral-7b, boosting their performance by over 10% on several benchmarks. The code is publicly available at https://github.com/thu-coai/AutoDetect.
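
The abstract's three-agent design can be pictured as a simple detection loop: an Examiner decomposes a task into fine-grained test points, a Questioner generates challenging probes for each point, and an Assessor judges the target model's answers and records failures. The Python sketch below is illustrative only and is not the authors' implementation (see the linked repository for that); call_llm is a hypothetical stand-in for any chat-completion API, and the prompts are paraphrased assumptions.

# Illustrative sketch of AutoDetect's Examiner/Questioner/Assessor loop.
# NOT the authors' code -- see https://github.com/thu-coai/AutoDetect for
# the real implementation. call_llm is a hypothetical stand-in for any
# chat-completion API; the prompts are paraphrased assumptions.

def call_llm(prompt: str) -> str:
    """Placeholder: replace with a real chat-completion API call."""
    raise NotImplementedError("wire up an LLM API here")

def autodetect_round(task: str, target_model, n_points: int = 5):
    # Examiner: decompose the task into fine-grained test points.
    test_points = call_llm(
        f"List {n_points} fine-grained skill points for the task: {task}"
    ).splitlines()

    weaknesses = []
    for point in test_points:
        # Questioner: craft a challenging question probing this test point.
        question = call_llm(f"Write a hard test question probing: {point}")
        answer = target_model(question)
        # Assessor: judge the target model's answer; failures become
        # identified weaknesses that can later guide targeted improvement.
        verdict = call_llm(
            f"Question: {question}\nAnswer: {answer}\n"
            "Is the answer correct? Reply PASS or FAIL with a reason."
        )
        if verdict.startswith("FAIL"):
            weaknesses.append({"point": point, "question": question,
                               "answer": answer, "verdict": verdict})
    return weaknesses

In this reading, the collected weaknesses serve two purposes the abstract mentions: measuring an identification success rate on strong models, and supplying targeted training data that outperforms untargeted augmentation such as Self-Instruct.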