AutoDetect: Towards a Unified Framework for Automated Weakness Detection in Large Language Models
FOS: Computer and information sciences
Computation and Language (cs.CL)
Artificial Intelligence (cs.AI)
Machine Learning (cs.LG)
DOI: 10.48550/arxiv.2406.16714
Publication Date: 2024-06-24
AUTHORS (9)
ABSTRACT
Although Large Language Models (LLMs) are becoming increasingly powerful, they still exhibit significant but subtle weaknesses, such as mistakes in instruction-following or coding tasks. As these unexpected errors could lead to severe consequences in practical deployments, it is crucial to investigate the limitations within LLMs systematically. Traditional benchmarking approaches cannot thoroughly pinpoint specific model deficiencies, while manual inspections are costly and not scalable. In this paper, we introduce a unified framework, AutoDetect, to automatically expose weaknesses in LLMs across various tasks. Inspired by the educational assessment process that measures students' learning outcomes, AutoDetect consists of three LLM-powered agents: Examiner, Questioner, and Assessor. The collaboration among these three agents is designed to realize comprehensive and in-depth weakness identification. Our framework demonstrates significant success in uncovering flaws, with an identification success rate exceeding 30% in prominent models such as ChatGPT and Claude. More importantly, these identified weaknesses can guide specific model improvements, proving more effective than untargeted data augmentation methods like Self-Instruct. Our approach has led to substantial enhancements in popular LLMs, including the Llama series and Mistral-7b, boosting their performance by over 10% in several benchmarks. Code and data are publicly available at https://github.com/thu-coai/AutoDetect.
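The abstract describes a three-agent pipeline (Examiner, Questioner, Assessor) but not its mechanics. The sketch below is a minimal, hypothetical illustration of how such a loop could be wired together in Python; the `chat` helper, the prompts, and all function names and signatures are illustrative assumptions made here, not the authors' implementation (for that, see the linked GitHub repository).

```python
# Hypothetical sketch of the Examiner/Questioner/Assessor loop from the
# abstract. All prompts, names, and data shapes are assumptions for
# illustration only.

def chat(system: str, user: str) -> str:
    """Placeholder for an LLM API call; wire to your provider of choice."""
    raise NotImplementedError("Connect this to an LLM client.")

def examiner(task: str, found: list[str]) -> list[str]:
    """Examiner: propose/refine a taxonomy of test points for the task."""
    prompt = (
        f"Task: {task}\n"
        f"Weak points found so far: {found}\n"
        "List specific sub-skills to probe next, one per line."
    )
    return chat("You design evaluation taxonomies.", prompt).splitlines()

def questioner(test_point: str) -> str:
    """Questioner: write one challenging query targeting a test point."""
    return chat(
        "You write hard test questions.",
        f"Write one challenging question probing: {test_point}",
    )

def assessor(question: str, answer: str) -> bool:
    """Assessor: judge whether the target model's answer reveals a flaw."""
    verdict = chat(
        "You grade answers strictly. Reply PASS or FAIL only.",
        f"Question: {question}\nAnswer: {answer}",
    )
    return "FAIL" in verdict.upper()

def autodetect(task: str, target_model, rounds: int = 3) -> list[str]:
    """Iteratively search for weaknesses of `target_model` on `task`."""
    weaknesses: list[str] = []
    for _ in range(rounds):
        for point in examiner(task, weaknesses):
            q = questioner(point)
            if assessor(q, target_model(q)):  # a failure marks a weakness
                weaknesses.append(point)
    return weaknesses
```

Under this reading, the identified weaknesses would then seed targeted training data for the improvement stage the abstract contrasts with untargeted augmentation such as Self-Instruct.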