How Many Topics? Stability Analysis for Topic Models
Thematic structure
Non-negative Matrix Factorization
DOI:
10.48550/arxiv.1404.4606
Publication Date:
2014-01-01
AUTHORS (3)
ABSTRACT
Topic modeling refers to the task of discovering the underlying thematic structure in a text corpus, where the output is commonly presented as a report of the top terms appearing in each topic. Despite the diversity of topic modeling algorithms that have been proposed, a common challenge in successfully applying these techniques is the selection of an appropriate number of topics for a given corpus. Choosing too few topics will produce results that are overly broad, while choosing too many will result in the "over-clustering" of a corpus into many small, highly-similar topics. In this paper, we propose a term-centric stability analysis strategy to address this issue, the idea being that a model with an appropriate number of topics will be more robust to perturbations in the data. Using a topic modeling approach based on matrix factorization, evaluations performed on a range of corpora show that this strategy can successfully guide the model selection process.
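The abstract does not spell out how stability is computed, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that each topic is summarized by its top terms, that two topics are compared by the Jaccard overlap of their top-term sets, and that topics from two models are matched one-to-one by brute force (practical only for small numbers of topics). The function names and the toy topic lists are hypothetical, and fitting the topic models themselves is left out.

```python
from itertools import permutations


def jaccard(terms_a, terms_b):
    """Jaccard similarity between two collections of terms."""
    a, b = set(terms_a), set(terms_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def agreement(ref_topics, other_topics):
    """Best mean Jaccard score over all one-to-one topic matchings.

    Assumption: brute-force search over permutations, so this is only
    suitable for a small number of topics.
    """
    k = len(ref_topics)
    if k != len(other_topics):
        raise ValueError("both models must have the same number of topics")
    best = 0.0
    for perm in permutations(range(k)):
        score = sum(
            jaccard(ref_topics[i], other_topics[j]) for i, j in enumerate(perm)
        ) / k
        best = max(best, score)
    return best


def stability(ref_topics, sampled_runs):
    """Mean agreement between a reference model and models fit on perturbed data."""
    return sum(agreement(ref_topics, run) for run in sampled_runs) / len(sampled_runs)


# Toy illustration data (made up): top terms for k = 2 topics.
ref = [["goal", "match", "team"], ["vote", "party", "election"]]
runs = [
    [["party", "vote", "election"], ["team", "goal", "match"]],  # same topics, reordered
    [["goal", "match", "league"], ["vote", "party", "poll"]],    # partly different terms
]
score_k2 = stability(ref, runs)

# Hypothetical scores for other values of k, used only to show the selection step.
scores = {2: score_k2, 3: 0.40, 4: 0.55}
best_k = max(scores, key=scores.get)
```

Under these assumptions, a candidate number of topics whose top terms change little across perturbed samples receives a higher score, and the highest-scoring candidate is the one selected.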