Is the Number of Trainable Parameters All That Actually Matters?
Keywords: Spurious relationship, Scaling law
DOI:
10.48550/arxiv.2109.11928
Publication Date:
2021-01-01
AUTHORS (5)
ABSTRACT
Recent work has identified simple empirical scaling laws for language models, linking compute budget, dataset size, model size, and autoregressive modeling loss. The validity of these power laws across orders of magnitude in scale provides compelling evidence that larger models are also more capable models. However, scaling up under the constraints of hardware and infrastructure is no easy feat, and rapidly becomes a hard and expensive engineering problem. We investigate ways to tentatively cheat scaling laws and train for cheaper. We emulate an increase in effective parameters using efficient approximations: either by doping the models with frozen random parameters, or by using fast structured transforms in place of dense linear layers. We find that the scaling relationship between test loss and compute depends only on the actual number of trainable parameters; scaling laws cannot be deceived by spurious parameters.
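The "spurious parameter" idea in the abstract can be illustrated with a toy layer. The following is a minimal sketch, assuming PyTorch; the class name DopedLinear, the doping_ratio argument, and the weight scaling are illustrative assumptions, not the authors' exact construction. A frozen random matrix is added on top of a trainable weight, so the layer has more effective parameters while its trainable count stays unchanged.

import torch
import torch.nn as nn

class DopedLinear(nn.Module):
    """Linear layer whose weight is a trainable matrix plus a frozen random one."""

    def __init__(self, in_features, out_features, doping_ratio=1.0):
        super().__init__()
        # Trainable component: this is what the scaling law actually counts.
        self.weight = nn.Parameter(torch.randn(out_features, in_features) / in_features ** 0.5)
        self.bias = nn.Parameter(torch.zeros(out_features))
        # Frozen random "doping": extra effective parameters that are never updated.
        frozen = torch.randn(out_features, in_features) / in_features ** 0.5 * doping_ratio
        self.register_buffer("frozen_weight", frozen)

    def forward(self, x):
        # Apply the combined (trainable + frozen) weight, then the bias.
        return x @ (self.weight + self.frozen_weight).t() + self.bias

layer = DopedLinear(512, 512)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
frozen = layer.frozen_weight.numel()
print(f"trainable parameters: {trainable}")
print(f"frozen 'spurious' parameters: {frozen}")

Per the abstract's finding, the frozen component does not shift the loss-versus-parameters curve: test loss follows only the trainable count, so such doping (or replacing dense layers with cheaper structured transforms) does not buy the capability of a genuinely larger model.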