DS-Agent: Automated Data Science by Empowering Large Language Models with Case-Based Reasoning

FOS: Computer and information sciences; Computer Science - Machine Learning; Machine Learning (cs.LG)
DOI: 10.48550/arxiv.2402.17453 Publication Date: 2024-02-27
ABSTRACT
In this work, we investigate the potential of large language model (LLM) based agents to automate data science tasks, with the goal of comprehending task requirements, then building and training the best-fit machine learning models. Despite their widespread success, existing LLM agents are hindered by generating unreasonable experiment plans within this scenario. To this end, we present DS-Agent, a novel automatic framework that harnesses LLM agents and case-based reasoning (CBR). In the development stage, DS-Agent follows the CBR framework to structure an automatic iteration pipeline, which can flexibly capitalize on expert knowledge from Kaggle and facilitate consistent performance improvement through the feedback mechanism. Moreover, DS-Agent implements a low-resource deployment stage with a simplified CBR paradigm to adapt past successful solutions from the development stage for direct code generation, significantly reducing the demand on the foundational capabilities of LLMs. Empirically, DS-Agent with GPT-4 achieves an unprecedented 100% success rate in the development stage, while attaining a 36% improvement in average one-pass rate across alternative LLMs in the deployment stage. In both stages, DS-Agent achieves the best rank in performance, costing \$1.60 and \$0.13 per run with GPT-4, respectively.
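The abstract describes an iterative pipeline built on case-based reasoning. As a rough illustration only, the classic CBR cycle (retrieve, reuse, revise, retain) can be sketched as below. This is not DS-Agent's implementation: the `Case` type, the word-overlap `similarity` function, and the `revise` and `evaluate` callbacks are all hypothetical stand-ins (in a real system `revise` would presumably be an LLM call and `evaluate` a training run that returns a validation metric).

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Case:
    """A past problem with its solution and observed score (hypothetical type)."""
    description: str
    solution: str
    score: float = 0.0


def similarity(a: str, b: str) -> float:
    """Jaccard overlap of lowercase word sets; a toy stand-in for a real retriever."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def retrieve(task: str, case_bank: List[Case]) -> Case:
    """Retrieve: pick the stored case most similar to the new task."""
    return max(case_bank, key=lambda c: similarity(task, c.description))


def cbr_loop(
    task: str,
    case_bank: List[Case],
    revise: Callable[[str, float], str],   # assumed: proposes a new solution from feedback
    evaluate: Callable[[str], float],      # assumed: higher score is better
    max_iters: int = 3,
) -> Case:
    # Reuse: start from the retrieved case's solution.
    seed = retrieve(task, case_bank)
    best = Case(task, seed.solution, evaluate(seed.solution))

    # Revise: iterate, feeding the current score back in as feedback.
    for _ in range(max_iters):
        candidate = revise(best.solution, best.score)
        score = evaluate(candidate)
        if score > best.score:
            best = Case(task, candidate, score)

    # Retain: store the best solution so later tasks can reuse it.
    case_bank.append(best)
    return best
```

With stub callbacks (for example, a `revise` that appends a token and an `evaluate` that returns the solution length), the loop keeps any candidate that scores higher and appends the final case to the bank. A low-resource variant in the spirit of the deployment stage would skip the revise loop and reuse the retrieved solution directly.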