Simplifying DINO via Coding Rate Regularization

DOI: 10.48550/arXiv.2502.10385 Publication Date: 2025-02-14
ABSTRACT
DINO and DINOv2 are two model families being widely used to learn representations from unlabeled imagery data at large scales. Their learned representations often enable state-of-the-art performance for downstream tasks, such as image classification and segmentation. However, they employ many empirically motivated design choices, and their training pipelines are highly complex and unstable -- many hyperparameters need to be carefully tuned to ensure that the representations do not collapse -- which poses considerable difficulty to improving them or adapting them to new domains. In this work, we posit that we can remove most such empirically motivated idiosyncrasies in the pre-training pipelines, and only need to add an explicit coding rate term to the loss function to avoid collapse of the representations. As a result, we obtain highly simplified variants of DINO and DINOv2, which we call SimDINO and SimDINOv2, respectively. Remarkably, these simplified models are more robust to different design choices, such as network architecture and hyperparameters, and they learn even higher-quality representations, as measured by performance on downstream tasks, offering a Pareto improvement over the corresponding DINO and DINOv2 models. This work highlights the potential of using simplifying design principles to improve the empirical practice of deep learning.
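To make the abstract's key mechanism concrete, here is a minimal PyTorch sketch of a coding rate regularizer. It assumes the standard coding rate from the rate-reduction literature, R(Z) = 1/2 logdet(I + d/(n eps^2) Z^T Z); the function name, the eps default, and the example objective in the comments are illustrative assumptions, not the paper's released code or exact training loss.

    import torch

    def coding_rate(Z: torch.Tensor, eps: float = 0.5) -> torch.Tensor:
        # Z: (n, d) batch of n feature vectors of dimension d.
        # Returns R(Z) = 1/2 * logdet(I + d / (n * eps^2) * Z^T Z),
        # a measure of the volume spanned by the features: large when
        # they spread out, small when they collapse onto each other.
        n, d = Z.shape
        cov = Z.T @ Z  # (d, d) second-moment matrix of the batch
        I = torch.eye(d, device=Z.device, dtype=Z.dtype)
        return 0.5 * torch.logdet(I + (d / (n * eps**2)) * cov)

    # Illustrative objective (placeholder names, not the paper's setup):
    # loss = alignment_loss - gamma * coding_rate(student_features)
    # Subtracting the coding rate term means maximizing it, which
    # penalizes collapsed representations.

    if __name__ == "__main__":
        Z = torch.nn.functional.normalize(torch.randn(256, 64), dim=-1)
        print("coding rate of random features:", coding_rate(Z).item())

Intuitively, the logdet grows as the features occupy more directions of the embedding space, so maximizing a coding rate term pushes against collapse; per the abstract, this single explicit term is what allows the simplified models to drop the empirically tuned collapse-prevention machinery.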