On the Learning Dynamics of Deep Neural Networks
Keywords:
Initialization, Deep Neural Networks
DOI:
10.48550/arxiv.1809.06848
Publication Date:
2018-01-01
AUTHORS (5)
ABSTRACT
While a lot of progress has been made in recent years, the dynamics of learning in deep nonlinear neural networks remain to this day largely misunderstood. In this work, we study the case of binary classification and prove various properties of learning in such networks under strong assumptions such as linear separability of the data. Extending existing results from the linear case, we confirm empirical observations by proving that the classification error also follows a sigmoidal shape in nonlinear architectures. We show that, given proper initialization, learning expounds parallel independent modes and that certain regions of the parameter space might lead to failed training. We also demonstrate that input norm and features' frequency in the dataset lead to distinct convergence speeds, which might shed some light on the generalization capabilities of deep neural networks. We provide a comparison between the dynamics of learning with cross-entropy and hinge losses, which could prove useful to understand recent progress in the training of generative adversarial networks. Finally, we identify a phenomenon we baptize gradient starvation, where the most frequent features in a dataset prevent the learning of other less frequent but equally informative features.
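The gradient starvation phenomenon described above lends itself to a small experiment. Below is a minimal sketch, not the authors' code, assuming logistic regression trained by full-batch gradient descent on the cross-entropy loss; the toy data construction, the 5% feature frequency, and the learning rate are illustrative assumptions. Feature 0 predicts the label on every example, while feature 1 is equally informative but active on only a few examples.

    # Minimal sketch (not from the paper) of gradient starvation, assuming
    # logistic regression with full-batch gradient descent on the
    # cross-entropy (logistic) loss; data and hyperparameters are illustrative.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 1000
    y = rng.choice([-1.0, 1.0], size=n)      # balanced binary labels

    # Feature 0 predicts the label on every example; feature 1 is equally
    # informative but active on only ~5% of examples (zero elsewhere).
    rare = rng.random(n) < 0.05
    x = np.stack([y, np.where(rare, y, 0.0)], axis=1)

    w = np.zeros(2)
    lr = 0.5
    for step in range(2001):
        margins = y * (x @ w)                 # per-example margins y * <w, x>
        coef = 1.0 / (1.0 + np.exp(margins))  # sigmoid factor in the loss gradient
        grad = -(x * (y * coef)[:, None]).mean(axis=0)
        w -= lr * grad
        if step % 500 == 0:
            print(f"step {step:4d}  w_frequent = {w[0]:.3f}  w_rare = {w[1]:.3f}")

In this sketch the frequent feature's weight grows quickly; once most margins are large, the sigmoid factor vanishes and the gradient flowing to the rare but equally informative feature is starved, so its weight stays comparatively small.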