Enhancing Token Filtering Efficiency in Large Language Model Training with Collider
FOS: Computer and information sciences
Computer Science - Machine Learning
Computer Science - Computation and Language
Computer Science - Distributed, Parallel, and Cluster Computing
Distributed, Parallel, and Cluster Computing (cs.DC)
Computation and Language (cs.CL)
Machine Learning (cs.LG)
DOI:
10.48550/arxiv.2502.00340
Publication Date:
2025-02-01
AUTHORS (7)
ABSTRACT
Token filtering has been proposed to enhance the utility of large language models (LLMs) by eliminating inconsequential tokens during training. While using fewer tokens should reduce computational workloads, existing studies have not succeeded in achieving higher efficiency. This is primarily due to the insufficient sparsity caused by filtering tokens only at the output layers, as well as to inefficient sparse GEMM (General Matrix Multiplication) even when sufficient sparsity is present. This paper presents Collider, a system that unleashes the full efficiency of token filtering in LLM training. At its core, Collider filters the activations of inconsequential tokens across all layers to maintain sparsity. Additionally, it features an automatic workflow that transforms sparse GEMM into dimension-reduced dense GEMM for optimized efficiency. Evaluations on three LLMs (TinyLlama-1.1B, Qwen2.5-1.5B, and Phi1.5-1.4B) demonstrate that Collider reduces backpropagation time by up to 35.1% and end-to-end training time by up to 22.0% when filtering 40% of tokens. Utility assessments of training TinyLlama on 15B tokens indicate that Collider sustains the utility advancements of token filtering, relatively improving model utility by 16.3% compared with regular training, while reducing training time from 4.7 days to 3.5 days on 8 GPUs. Collider is designed for easy integration into existing training frameworks, allowing systems that already deploy token filtering to accelerate training with just one line of code.
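The key optimization described in the abstract, replacing sparse GEMM over token-filtered activations with an ordinary dense GEMM on a dimension-reduced matrix, can be sketched as follows. This is a minimal illustrative sketch in PyTorch and not Collider's implementation: the function name dense_gemm_on_kept_tokens, the keep_mask argument, and the zero-filling of filtered positions are assumptions made for the example.

# Illustrative sketch (not Collider's code): gather the tokens that survive
# filtering into a smaller dense matrix, run a regular dense GEMM on it, and
# scatter the results back into the full-size output.
import torch

def dense_gemm_on_kept_tokens(activations: torch.Tensor,
                              weight: torch.Tensor,
                              keep_mask: torch.Tensor) -> torch.Tensor:
    # activations: [num_tokens, hidden], weight: [hidden, out],
    # keep_mask: [num_tokens] bool, True for tokens that survive filtering.
    kept_idx = keep_mask.nonzero(as_tuple=True)[0]       # indices of kept tokens
    kept_acts = activations.index_select(0, kept_idx)    # dimension-reduced dense matrix
    kept_out = kept_acts @ weight                         # ordinary dense GEMM
    # Scatter back; filtered token positions are left as zeros in this sketch.
    out = activations.new_zeros(activations.size(0), weight.size(1))
    out.index_copy_(0, kept_idx, kept_out)
    return out

if __name__ == "__main__":
    acts = torch.randn(8, 16)
    w = torch.randn(16, 32)
    mask = torch.rand(8) > 0.4      # keep ~60% of tokens, i.e. filter roughly 40%
    y = dense_gemm_on_kept_tokens(acts, w, mask)
    print(y.shape)                  # torch.Size([8, 32])

The point of the gather-then-dense-GEMM pattern is that the multiplication runs on a contiguous, smaller matrix, so it benefits directly from highly tuned dense kernels instead of relying on sparse GEMM routines whose overhead can cancel out the savings from filtering.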