PALM: An Efficient Performance Simulator for Tiled Accelerators with Large-scale Model Training
DOI:
10.48550/arxiv.2406.03868
Publication Date:
2024-06-06
AUTHORS (8)
ABSTRACT
Deep learning (DL) models are piquing high interest and scaling at an unprecedented rate. To this end, a handful of tiled accelerators have been proposed to support such large-scale training tasks. However, these accelerators often incorporate numerous cores or tiles, even extending to wafer-scale, substantial on-chip bandwidth, and distributed memory systems, which results in an exceedingly complex design space. Moreover, conducting actual experiments to find optimal configurations is impractical due to time constraints. Hence, predicting the optimal mapping of various parallelisms onto such system architectures becomes crucial. In this study, leveraging an analysis of existing mainstream DL model parallelization strategies, we introduce a performance simulator named PALM. PALM targets both training and inference processes for tiled accelerators, aiming to inspire the design of current and future accelerators. Specifically, we (i) establish a scheduling mechanism among tiled accelerators based on an event-driven framework; (ii) support user-configurable pipeline, tensor, and data parallelism, determining the absolute throughput under these parallelism strategies; (iii) model the interaction of on-chip SRAM, the NoC, and off-chip DRAM during operator execution. This work is available here: https://github.com/fangjh21/PALM.
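To illustrate the event-driven style of simulation the abstract describes, the sketch below advances a global clock over per-tile operator events, charging each operator a compute time and a NoC transfer time. This is a minimal illustration under assumed parameters (`COMPUTE_TFLOPS`, `NOC_BANDWIDTH`, the `simulate` helper), not PALM's actual model or API.

```python
import heapq
from dataclasses import dataclass, field

# Hypothetical hardware parameters -- NOT PALM's actual configuration.
COMPUTE_TFLOPS = 100e12  # per-tile compute throughput (FLOP/s)
NOC_BANDWIDTH = 100e9    # inter-tile NoC bandwidth (bytes/s)

@dataclass(order=True)
class Event:
    time: float
    desc: str = field(compare=False)

def simulate(ops):
    """Event-driven sketch: ops is a list of (tile, flops, bytes_out) tuples.

    Each operator occupies its tile for flops / COMPUTE_TFLOPS seconds,
    then spends bytes_out / NOC_BANDWIDTH forwarding its output on the NoC.
    Returns the simulated makespan in seconds.
    """
    queue = []
    tile_free = {}  # tile id -> time at which that tile becomes available
    for tile, flops, nbytes in ops:
        start = tile_free.get(tile, 0.0)
        finish = start + flops / COMPUTE_TFLOPS + nbytes / NOC_BANDWIDTH
        tile_free[tile] = finish
        heapq.heappush(queue, Event(finish, f"tile {tile} op done"))
    # Drain events in timestamp order; the last event time is the makespan.
    end = 0.0
    while queue:
        end = heapq.heappop(queue).time
    return end

# Two tiles in parallel, one operator each: 1 TFLOP compute + 1 GB transfer.
makespan = simulate([(0, 1e12, 1e9), (1, 1e12, 1e9)])
print(f"{makespan:.3f} s")  # 0.01 s compute + 0.01 s transfer per tile
```

A real simulator would also model event dependencies between tiles (e.g. pipeline-stage handoffs) and contention on shared SRAM/DRAM, which this sketch omits for brevity.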