AutoDetect: Towards a Unified Framework for Automated Weakness Detection in Large Language Models

FOS: Computer and information sciences; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
DOI: 10.48550/arxiv.2406.16714 Publication Date: 2024-06-24
ABSTRACT
Although Large Language Models (LLMs) are becoming increasingly powerful, they still exhibit significant but subtle weaknesses, such as mistakes in instruction-following or coding tasks. As these unexpected errors could lead to severe consequences in practical deployments, it is crucial to investigate the limitations within LLMs systematically. Traditional benchmarking approaches cannot thoroughly pinpoint specific model deficiencies, while manual inspections are costly and not scalable. In this paper, we introduce a unified framework, AutoDetect, to automatically expose weaknesses in LLMs across various tasks. Inspired by the educational assessment process that measures students' learning outcomes, AutoDetect consists of three LLM-powered agents: Examiner, Questioner, and Assessor. The collaboration among these three agents is designed to realize comprehensive and in-depth weakness identification. Our framework demonstrates significant success in uncovering flaws, with an identification success rate exceeding 30% in prominent models such as ChatGPT and Claude. More importantly, these identified weaknesses can guide specific model improvements, proving more effective than untargeted data augmentation methods like Self-Instruct. Our approach has led to substantial enhancements in popular LLMs, including the Llama series and Mistral-7b, boosting their performance by over 10% on several benchmarks. The code is publicly available at https://github.com/thu-coai/AutoDetect.
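
The abstract's three-agent design can be pictured as a simple detection loop: an Examiner decomposes a task into fine-grained test points, a Questioner generates challenging probes for each point, and an Assessor judges the target model's answers and records failures. The Python sketch below is illustrative only and is not the authors' implementation (see the linked repository for that); call_llm is a hypothetical stand-in for any chat-completion API, and the prompts are paraphrased assumptions.

# Illustrative sketch of AutoDetect's Examiner/Questioner/Assessor loop.
# NOT the authors' code -- see https://github.com/thu-coai/AutoDetect for
# the real implementation. call_llm is a hypothetical stand-in for any
# chat-completion API; the prompts are paraphrased assumptions.

def call_llm(prompt: str) -> str:
    """Placeholder: replace with a real chat-completion API call."""
    raise NotImplementedError("wire up an LLM API here")

def autodetect_round(task: str, target_model, n_points: int = 5):
    # Examiner: decompose the task into fine-grained test points.
    test_points = call_llm(
        f"List {n_points} fine-grained skill points for the task: {task}"
    ).splitlines()

    weaknesses = []
    for point in test_points:
        # Questioner: craft a challenging question probing this test point.
        question = call_llm(f"Write a hard test question probing: {point}")
        answer = target_model(question)
        # Assessor: judge the target model's answer; failures become
        # identified weaknesses that can later guide targeted improvement.
        verdict = call_llm(
            f"Question: {question}\nAnswer: {answer}\n"
            "Is the answer correct? Reply PASS or FAIL with a reason."
        )
        if verdict.startswith("FAIL"):
            weaknesses.append({"point": point, "question": question,
                               "answer": answer, "verdict": verdict})
    return weaknesses

In this reading, the collected weaknesses serve two purposes the abstract mentions: measuring an identification success rate on strong models, and supplying targeted training data that outperforms untargeted augmentation such as Self-Instruct.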