LLM-Lasso: A Robust Framework for Domain-Informed Feature Selection and Regularization

Keywords: Lasso, Regularization, Feature selection
DOI: 10.48550/arxiv.2502.10648 Publication Date: 2025-02-14
ABSTRACT
We introduce LLM-Lasso, a novel framework that leverages large language models (LLMs) to guide feature selection in Lasso $\ell_1$-regularized regression. Unlike traditional methods that rely solely on numerical data, LLM-Lasso incorporates domain-specific knowledge extracted from natural language, enhanced through a retrieval-augmented generation (RAG) pipeline, to seamlessly integrate data-driven modeling with contextual insights. Specifically, the LLM generates a penalty factor for each feature, which is converted into a weight using a simple, tunable model. Features identified as more relevant by the LLM receive lower penalties, increasing their likelihood of being retained in the final model, while less relevant features are assigned higher penalties, reducing their influence. Importantly, LLM-Lasso has an internal validation step that determines how much to trust the LLM's predictions in the pipeline. It thereby addresses key challenges of robustness, making it suitable for mitigating potential inaccuracies or hallucinations of the LLM. In various biomedical case studies, LLM-Lasso outperforms standard Lasso and existing baselines, all while ensuring the LLM operates without prior access to the datasets. To our knowledge, this is the first approach to effectively combine conventional feature-selection techniques directly with LLM-based reasoning.
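The mechanism the abstract describes, per-feature penalty factors reweighting the $\ell_1$ term, can be sketched with a standard equivalence: the weighted Lasso $\min_\beta \|y - X\beta\|_2^2 + \alpha \sum_j w_j |\beta_j|$ reduces to an ordinary Lasso after dividing each column of $X$ by its weight $w_j$ and rescaling the fitted coefficients back. The sketch below is illustrative only, not the paper's implementation; the `penalties` array stands in for hypothetical LLM-derived penalty factors (low for features the LLM deems relevant, high otherwise).

```python
import numpy as np
from sklearn.linear_model import Lasso

def weighted_lasso(X, y, penalty_factors, alpha=0.05):
    """Lasso with per-feature penalty factors w_j.

    min ||y - X b||^2 + alpha * sum_j w_j * |b_j|
    is equivalent to a standard Lasso fit on X_j / w_j,
    with the original-scale coefficients b_j = b_tilde_j / w_j.
    """
    w = np.asarray(penalty_factors, dtype=float)
    model = Lasso(alpha=alpha).fit(X / w, y)  # column-wise rescaling
    return model.coef_ / w                    # map back to original scale

# Toy data: only the first two of five features carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=200)

# Hypothetical LLM-derived penalty factors (not from the paper):
# small values barely penalize "relevant" features, large values
# push "irrelevant" features toward exactly zero.
penalties = np.array([0.1, 0.1, 10.0, 10.0, 10.0])
coef = weighted_lasso(X, y, penalties)
```

With these factors the lightly penalized features keep coefficients near their true values (about 3 and -2), while the heavily penalized ones are driven to zero, which is exactly the retain-versus-suppress behavior the framework aims for.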