Learning large softmax mixtures with warm start EM
DOI:
10.48550/arXiv.2409.09903
Publication Date:
2024-09-15
AUTHORS (4)
ABSTRACT
Mixed multinomial logits are discrete mixtures introduced several decades ago to model the probability of choosing an attribute from $p$ possible candidates, in heterogeneous populations. The model has recently attracted attention in the AI literature, under the name softmax mixtures, where it is routinely used in the final layer of a neural network to map a large number $p$ of vectors in $\mathbb{R}^L$ to a probability vector. Despite its wide applicability and empirical success, statistically optimal estimators of the mixture parameters, obtained via algorithms whose running time scales polynomially in $L$, are not known. This paper provides a solution to this problem for contemporary applications, such as language models, in which the number $p$ of support points and the size $N$ of the observed sample are both large. Our proposed estimator combines two classical estimators, obtained respectively via a method of moments (MoM) and the expectation-maximization (EM) algorithm. Although both types of estimators have been studied from a theoretical perspective for Gaussian mixtures, no similar results exist for softmax mixtures for either procedure. We develop a new MoM parameter estimator based on latent moment estimation that is tailored to our model, and provide the first theoretical analysis of a MoM-based procedure for softmax mixtures. Although consistent, it can exhibit poor numerical performance, as is also observed for other mixture models. Nevertheless, since it provably lies in a neighborhood of the target, it can be used as a warm start for any iterative algorithm. We study the EM algorithm in detail, and propose a new EM algorithm with a MoM warm start.
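The abstract does not spell out the estimator itself, so the sketch below is illustration only: a minimal, generic EM loop for a $K$-component softmax mixture in which component $k$ has parameter $\beta_k \in \mathbb{R}^L$ and assigns attribute $j \in \{1, \dots, p\}$ probability $\mathrm{softmax}_j(\langle x_j, \beta_k \rangle)$, fit from a supplied warm start. The MoM construction is not reproduced; the function names, the gradient-ascent M-step, and the demo's perturbed-truth warm start are all assumptions of this sketch, not the authors' implementation.

import numpy as np

def component_probs(X, betas):
    # Per-component choice probabilities p_k(j) = softmax_j(<x_j, beta_k>).
    # X: (p, L) candidate feature vectors; betas: (K, L). Returns (K, p).
    logits = betas @ X.T
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    P = np.exp(logits)
    return P / P.sum(axis=1, keepdims=True)

def em_softmax_mixture(X, y, betas0, alphas0, n_iter=100, m_steps=5, lr=0.5):
    # EM for the mixture sum_k alpha_k * p_k(.), started at a warm start.
    # y: (N,) observed choices in {0, ..., p-1}. (betas0, alphas0) plays the
    # role of the warm start; in the paper's pipeline it would come from the
    # MoM estimator, which is NOT reproduced here.
    betas, alphas = betas0.copy(), alphas0.copy()
    N = y.shape[0]
    for _ in range(n_iter):
        # E-step: responsibilities W[i, k] proportional to alpha_k * p_k(y_i).
        P = component_probs(X, betas)             # (K, p)
        W = alphas[None, :] * P[:, y].T           # (N, K)
        W /= W.sum(axis=1, keepdims=True)
        # M-step: mixing weights in closed form; each beta_k by a few
        # gradient-ascent steps on sum_i W[i,k] * log p_k(y_i), whose
        # gradient is sum_i W[i,k] * (x_{y_i} - E_{j ~ p_k}[x_j]).
        alphas = W.mean(axis=0)
        for _ in range(m_steps):
            P = component_probs(X, betas)
            for k in range(alphas.size):
                grad = W[:, k] @ X[y] - W[:, k].sum() * (P[k] @ X)
                betas[k] += lr * grad / N
    return betas, alphas

if __name__ == "__main__":
    # Synthetic check: K = 2 components over p = 50 attributes in R^5.
    rng = np.random.default_rng(0)
    p, L, K, N = 50, 5, 2, 10_000
    X = rng.standard_normal((p, L))
    true_betas = rng.standard_normal((K, L))
    true_alphas = np.array([0.6, 0.4])
    z = rng.choice(K, size=N, p=true_alphas)      # latent components
    Pk = component_probs(X, true_betas)
    y = np.array([rng.choice(p, p=Pk[zi]) for zi in z])
    # A perturbed truth stands in for the MoM warm start in this demo.
    warm = true_betas + 0.5 * rng.standard_normal((K, L))
    betas_hat, alphas_hat = em_softmax_mixture(X, y, warm, np.full(K, 1 / K))
    print("estimated mixing weights:", alphas_hat)

As with Gaussian mixtures, the weighted likelihood in the M-step is non-concave in the component labels, which is why the quality of the starting point (here, the MoM estimate the paper advocates) matters.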