Learning large softmax mixtures with warm start EM
DOI:
10.48550/arXiv.2409.09903
Publication Date:
2024-09-15
AUTHORS (4)
ABSTRACT
Mixed multinomial logits are discrete mixtures introduced several decades ago to model the probability of choosing an attribute from $p$ possible candidates, in heterogeneous populations. The model has recently attracted attention in the AI literature, under the name softmax mixtures, where it is routinely used in the final layer of a neural network to map a large number $p$ of vectors in $\mathbb{R}^L$ to a probability vector. Despite its wide applicability and empirical success, statistically optimal estimators of the mixture parameters, obtained via algorithms whose running time scales polynomially in $L$, are not known. This paper provides a solution to this problem for contemporary applications, such as language models, in which the number $p$ of support points and the size $N$ of the observed sample are both large. Our proposed estimator combines two classical estimators, obtained respectively via a method of moments (MoM) and the expectation-maximization (EM) algorithm. Although both types of estimators have been studied from a theoretical perspective for Gaussian mixtures, no similar results exist for softmax mixtures for either procedure. We develop a new MoM parameter estimator based on latent moment estimation that is tailored to our model, and provide the first theoretical analysis of a MoM-based procedure for softmax mixtures. Although consistent, it can exhibit poor numerical performance, as is also observed for other mixture models. Nevertheless, since it provably lies in a neighborhood of the target, it can be used as a warm start for any iterative algorithm. We study the EM algorithm in detail, and propose a new EM algorithm with a MoM warm start.
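The abstract does not spell out the estimator itself, so the sketch below is illustration only: a minimal, generic EM loop for a $K$-component softmax mixture in which component $k$ has parameter $\beta_k \in \mathbb{R}^L$ and assigns attribute $j \in \{1, \dots, p\}$ probability $\mathrm{softmax}_j(\langle x_j, \beta_k \rangle)$, fit from a supplied warm start. The MoM construction is not reproduced; the function names, the gradient-ascent M-step, and the demo's perturbed-truth warm start are all assumptions of this sketch, not the authors' implementation.

import numpy as np

def component_probs(X, betas):
    # Per-component choice probabilities p_k(j) = softmax_j(<x_j, beta_k>).
    # X: (p, L) candidate feature vectors; betas: (K, L). Returns (K, p).
    logits = betas @ X.T
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    P = np.exp(logits)
    return P / P.sum(axis=1, keepdims=True)

def em_softmax_mixture(X, y, betas0, alphas0, n_iter=100, m_steps=5, lr=0.5):
    # EM for the mixture sum_k alpha_k * p_k(.), started at a warm start.
    # y: (N,) observed choices in {0, ..., p-1}. (betas0, alphas0) plays the
    # role of the warm start; in the paper's pipeline it would come from the
    # MoM estimator, which is NOT reproduced here.
    betas, alphas = betas0.copy(), alphas0.copy()
    N = y.shape[0]
    for _ in range(n_iter):
        # E-step: responsibilities W[i, k] proportional to alpha_k * p_k(y_i).
        P = component_probs(X, betas)             # (K, p)
        W = alphas[None, :] * P[:, y].T           # (N, K)
        W /= W.sum(axis=1, keepdims=True)
        # M-step: mixing weights in closed form; each beta_k by a few
        # gradient-ascent steps on sum_i W[i,k] * log p_k(y_i), whose
        # gradient is sum_i W[i,k] * (x_{y_i} - E_{j ~ p_k}[x_j]).
        alphas = W.mean(axis=0)
        for _ in range(m_steps):
            P = component_probs(X, betas)
            for k in range(alphas.size):
                grad = W[:, k] @ X[y] - W[:, k].sum() * (P[k] @ X)
                betas[k] += lr * grad / N
    return betas, alphas

if __name__ == "__main__":
    # Synthetic check: K = 2 components over p = 50 attributes in R^5.
    rng = np.random.default_rng(0)
    p, L, K, N = 50, 5, 2, 10_000
    X = rng.standard_normal((p, L))
    true_betas = rng.standard_normal((K, L))
    true_alphas = np.array([0.6, 0.4])
    z = rng.choice(K, size=N, p=true_alphas)      # latent components
    Pk = component_probs(X, true_betas)
    y = np.array([rng.choice(p, p=Pk[zi]) for zi in z])
    # A perturbed truth stands in for the MoM warm start in this demo.
    warm = true_betas + 0.5 * rng.standard_normal((K, L))
    betas_hat, alphas_hat = em_softmax_mixture(X, y, warm, np.full(K, 1 / K))
    print("estimated mixing weights:", alphas_hat)

As with Gaussian mixtures, the weighted likelihood in the M-step is non-concave in the component labels, which is why the quality of the starting point (here, the MoM estimate the paper advocates) matters.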