Beyond One-Size-Fits-All: Tailored Benchmarks for Efficient Evaluation
FOS: Computer and information sciences
Artificial Intelligence (cs.AI)
Machine Learning (cs.LG)
DOI:
10.48550/arxiv.2502.13576
Publication Date:
2025-02-19
AUTHORS (10)
ABSTRACT
Evaluating models on large benchmarks is very resource-intensive, especially during this period of rapid model evolution. Existing efficient evaluation methods estimate the performance of target models by testing them only on a small, static coreset of the benchmark, which is derived from the publicly available evaluation results of source models. These methods rely on the assumption that target models have high prediction consistency with source models. However, we demonstrate that this assumption does not generalize well in practice. To alleviate the inconsistency issue, we present TailoredBench, a method that conducts customized evaluation tailored to each target model. Specifically, a Global-coreset is first constructed as a probe to identify the most consistent source models for each target model with an adaptive source model selection strategy. Afterwards, a scalable K-Medoids clustering algorithm is proposed to extend the Global-coreset to a tailored Native-coreset for each target model. According to the predictions on the Native-coresets, we obtain the performance of target models on the whole benchmark with a calibrated estimation strategy. Comprehensive experiments on 5 benchmarks across over 300 models show that, compared to the best performing baselines, TailoredBench achieves an average reduction of 31.4% in the MAE of accuracy estimates under the same inference budgets, showcasing strong effectiveness and generalizability.
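The abstract describes a coreset-based estimation pipeline: cluster benchmark examples by how source models answer them, evaluate the target model only on representative examples, and extrapolate to the full benchmark. The sketch below is not the authors' implementation; it is a minimal illustration, assuming toy 0/1 correctness data and a simple cluster-size-weighted estimate, of how K-Medoids coreset selection and accuracy extrapolation can work in general. All variable names and data shapes are assumptions for illustration.

```python
# Illustrative sketch (not the TailoredBench code): pick a small coreset of
# benchmark examples via K-Medoids over source-model prediction patterns,
# then estimate a target model's full-benchmark accuracy from the coreset alone.
import numpy as np

def k_medoids(dist, k, n_iter=100, seed=0):
    """Plain K-Medoids on a precomputed distance matrix; returns medoid indices and labels."""
    rng = np.random.default_rng(seed)
    n = dist.shape[0]
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.where(labels == c)[0]
            if len(members) == 0:
                continue
            # medoid = the member minimizing total distance to the rest of its cluster
            within = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(dist[:, medoids], axis=1)
    return medoids, labels

# Toy data: 0/1 correctness of 20 source models on a 500-example benchmark.
rng = np.random.default_rng(1)
source_preds = (rng.random((20, 500)) < 0.6).astype(float)    # (models, examples)

# Cluster benchmark examples by how similarly the source models answer them.
example_profiles = source_preds.T                              # (examples, models)
dist = np.linalg.norm(example_profiles[:, None] - example_profiles[None, :], axis=-1)
coreset, labels = k_medoids(dist, k=30)

# Evaluate the target model only on the coreset, then weight each medoid's
# result by its cluster size to estimate full-benchmark accuracy.
target_correct = (rng.random(500) < 0.55).astype(float)        # toy correctness labels
cluster_sizes = np.bincount(labels, minlength=len(coreset))
est_acc = np.sum(target_correct[coreset] * cluster_sizes) / cluster_sizes.sum()
print(f"estimated accuracy: {est_acc:.3f}  true accuracy: {target_correct.mean():.3f}")
```

This sketch omits the paper's adaptive source model selection and calibration steps; it only shows the clustering-and-extrapolation idea that those steps refine.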