Scalable and Efficient Virtual Memory Sharing in Heterogeneous SoCs with TLB Prefetching and MMU-Aware DMA Engine
Translation lookaside buffer
Page
DOI: 10.48550/arxiv.1808.09751
Publication Date: 2018-01-01
AUTHORS (4)
ABSTRACT
Shared virtual memory (SVM) is key in heterogeneous systems on chip (SoCs), which combine a general-purpose host processor with a many-core accelerator, both for programmability and to avoid data duplication. However, SVM can bring a significant run-time overhead when translation lookaside buffer (TLB) entries are missing. Moreover, allowing DMA burst transfers to write to SVM traditionally requires buffers to absorb transfers that miss in the TLB. These buffers have to be overprovisioned for the maximum burst size, wasting precious on-chip memory, and they stall all SVM accesses once they are full, hampering the scalability of parallel accelerators. In this work, we present our SVM solution that avoids the majority of TLB misses with prefetching, supports parallel burst DMA transfers without additional buffers, and can be scaled with the workload and the number of parallel processors. Our solution is based on three novel concepts: To minimize the rate of TLB misses, the TLB is proactively filled by compiler-generated Prefetching Helper Threads, which use run-time information to issue timely prefetches. To reduce the latency of TLB misses, misses are handled by a variable number of parallel Miss Handling Threads. To support parallel burst DMA transfers to SVM without additional buffers, we add lightweight hardware to a standard DMA engine to detect and react to TLB misses. Compared to the state of the art, our work improves accelerator performance for memory-intensive kernels by up to 4x and by up to 60% for irregular and regular memory access patterns, respectively.
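The first concept, filling the TLB ahead of the worker's accesses, can be illustrated with a toy software model. The sketch below is not the paper's implementation: it is a single-threaded Python simulation, and the names `ToyTLB`, `run_kernel`, `prefetch_distance`, and the 4 KiB `PAGE_SIZE` are assumptions introduced purely for illustration. It only shows why issuing a translation a few accesses early turns demand misses into hits.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size in bytes (hypothetical, for illustration)


class ToyTLB:
    """Tiny fully associative TLB with LRU replacement (illustrative model only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # virtual page number -> present flag
        self.misses = 0               # demand misses seen by the worker

    def _fill(self, vpn):
        # Insert or refresh the entry, then evict the least recently used
        # entry if the TLB is over capacity.
        self.entries[vpn] = True
        self.entries.move_to_end(vpn)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)

    def prefetch(self, vpn):
        # Models a helper filling the TLB ahead of time; not counted as a miss.
        self._fill(vpn)

    def access(self, vaddr):
        # Models a worker access; returns True on a hit, False on a miss.
        vpn = vaddr // PAGE_SIZE
        if vpn in self.entries:
            self.entries.move_to_end(vpn)
            return True
        self.misses += 1
        self._fill(vpn)
        return False


def run_kernel(addresses, tlb_capacity=8, prefetch_distance=0):
    """Replay an address trace and return the number of demand TLB misses.

    With prefetch_distance > 0, the translation for the access that lies
    prefetch_distance steps ahead is installed before each worker access.
    """
    tlb = ToyTLB(tlb_capacity)
    for i, addr in enumerate(addresses):
        ahead = i + prefetch_distance
        if prefetch_distance > 0 and ahead < len(addresses):
            tlb.prefetch(addresses[ahead] // PAGE_SIZE)
        tlb.access(addr)
    return tlb.misses


# One access per page: the worst case for a TLB without prefetching.
trace = [page * PAGE_SIZE for page in range(32)]
misses_without = run_kernel(trace, prefetch_distance=0)
misses_with = run_kernel(trace, prefetch_distance=2)
```

In this model only the first `prefetch_distance` accesses miss, because nothing was prefetched for them. A real system also has to account for page-table walk latency and TLB capacity pressure, which this sketch ignores.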