TiC-LM: A Web-Scale Benchmark for Time-Continual LLM Pretraining
DOI: 10.48550/arxiv.2504.02107
Publication Date: 2025-04-02
AUTHORS (11)
ABSTRACT
Large Language Models (LLMs) trained on historical web data inevitably become outdated. We investigate evaluation strategies and update methods for LLMs as new data becomes available. We introduce a web-scale dataset for time-continual pretraining of LLMs derived from 114 dumps of Common Crawl (CC) - orders of magnitude larger than previous continual language modeling benchmarks. We also design time-stratified evaluations across both general CC data and specific domains (Wikipedia, StackExchange, and code documentation) to assess how well various continual learning methods adapt to new data while retaining past knowledge. Our findings demonstrate that, on general CC data, autoregressive meta-schedules combined with fixed-ratio replay of older data can achieve held-out loss comparable to re-training from scratch, while requiring significantly less computation (2.6x). However, the optimal balance between incorporating new data and replaying old data differs across domains: replay is crucial to avoid forgetting on generic web data, but less so on specific domains.
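The fixed-ratio replay mentioned in the abstract can be illustrated with a minimal sketch: each training batch mixes examples from the newest CC dump with examples replayed from older dumps at a fixed proportion. The function name, the default `replay_ratio`, and the batch construction below are illustrative assumptions, not the paper's actual implementation.

```python
import random

def mixed_batch(new_examples, old_examples, replay_ratio=0.5,
                batch_size=8, seed=0):
    """Sketch of fixed-ratio replay: build one training batch that mixes
    data from the newest dump with replayed data from older dumps.

    replay_ratio is a hypothetical fraction of the batch drawn from older
    dumps; the ratios studied in the paper may differ.
    """
    rng = random.Random(seed)
    n_replay = int(round(batch_size * replay_ratio))  # slots for old data
    n_new = batch_size - n_replay                     # slots for new data
    batch = rng.sample(new_examples, n_new) + rng.sample(old_examples, n_replay)
    rng.shuffle(batch)  # interleave old and new examples
    return batch
```

With `replay_ratio=0.5` and `batch_size=8`, each batch contains four new and four replayed examples; sweeping this ratio per evaluation domain is one way to probe the adaptation-vs-forgetting trade-off the abstract describes.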