LLM-Lasso: A Robust Framework for Domain-Informed Feature Selection and Regularization

Keywords: Lasso, Regularization, Feature selection
DOI: 10.48550/arxiv.2502.10648 Publication Date: 2025-02-14
ABSTRACT
We introduce LLM-Lasso, a novel framework that leverages large language models (LLMs) to guide feature selection in Lasso $\ell_1$-regularized regression. Unlike traditional methods that rely solely on numerical data, LLM-Lasso incorporates domain-specific knowledge extracted from natural language, enhanced through a retrieval-augmented generation (RAG) pipeline, to seamlessly integrate data-driven modeling with contextual insights. Specifically, the LLM generates a penalty factor for each feature, which is converted into a weight using a simple, tunable model. Features identified as more relevant by the LLM receive lower penalties, increasing their likelihood of being retained in the final model, while less relevant features are assigned higher penalties, reducing their influence. Importantly, LLM-Lasso has an internal validation step that determines how much to trust the LLM's predictions in the pipeline. It thereby addresses key challenges of robustness, making it suitable for mitigating potential inaccuracies or hallucinations of the LLM. In various biomedical case studies, LLM-Lasso outperforms standard Lasso and existing baselines, all while ensuring the LLM operates without prior access to the datasets. To our knowledge, this is the first approach to effectively combine conventional feature-selection techniques directly with LLM-based reasoning.
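The mechanism the abstract describes, per-feature penalty factors reweighting the $\ell_1$ term, can be sketched with a standard equivalence: the weighted Lasso $\min_\beta \|y - X\beta\|_2^2 + \alpha \sum_j w_j |\beta_j|$ reduces to an ordinary Lasso after dividing each column of $X$ by its weight $w_j$ and rescaling the fitted coefficients back. The sketch below is illustrative only, not the paper's implementation; the `penalties` array stands in for hypothetical LLM-derived penalty factors (low for features the LLM deems relevant, high otherwise).

```python
import numpy as np
from sklearn.linear_model import Lasso

def weighted_lasso(X, y, penalty_factors, alpha=0.05):
    """Lasso with per-feature penalty factors w_j.

    min ||y - X b||^2 + alpha * sum_j w_j * |b_j|
    is equivalent to a standard Lasso fit on X_j / w_j,
    with the original-scale coefficients b_j = b_tilde_j / w_j.
    """
    w = np.asarray(penalty_factors, dtype=float)
    model = Lasso(alpha=alpha).fit(X / w, y)  # column-wise rescaling
    return model.coef_ / w                    # map back to original scale

# Toy data: only the first two of five features carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=200)

# Hypothetical LLM-derived penalty factors (not from the paper):
# small values barely penalize "relevant" features, large values
# push "irrelevant" features toward exactly zero.
penalties = np.array([0.1, 0.1, 10.0, 10.0, 10.0])
coef = weighted_lasso(X, y, penalties)
```

With these factors the lightly penalized features keep coefficients near their true values (about 3 and -2), while the heavily penalized ones are driven to zero, which is exactly the retain-versus-suppress behavior the framework aims for.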