- Recommender Systems and Techniques
- Stochastic Gradient Optimization Techniques
- Radiation Effects in Electronics
- Parallel Computing and Optimization Techniques
- Advanced Data Storage Technologies
- Distributed Systems and Fault Tolerance
- Advanced Bandit Algorithms Research
- Advanced Graph Neural Networks
- Caching and Content Delivery
- Advanced Malware Detection Techniques
- Data Quality and Management
- Cryptographic Implementations and Security
- Cloud Computing and Resource Management
- Security and Verification in Computing
- Web Data Mining and Analysis
- VLSI and Analog Circuit Testing
- Low-Power High-Performance VLSI Design
- Scientific Computing and Data Management
- Software Reliability and Analysis Research
- Distributed and Parallel Computing Systems
- Machine Learning in Materials Science
Neural personalized recommendation models are used across a wide variety of datacenter applications including search, social media, and entertainment. State-of-the-art models comprise large embedding tables with billions of parameters, requiring large memory capacities. Unfortunately, fast DRAM-based memories levy high infrastructure costs. Conventional SSD-based storage solutions offer an order of magnitude larger capacity, but worse read latency and bandwidth, degrading inference performance. RecSSD is a near...
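The bottleneck the abstract describes can be sketched with a minimal embedding lookup: a gather over a very large table followed by an element-wise sum. The table size, dimension, and function names below are illustrative stand-ins, not RecSSD's actual implementation.

```python
# Hypothetical sketch of the sparse embedding lookup at the heart of
# neural recommendation inference: gather a few rows of a huge table,
# then sum-reduce them. Sizes here are tiny for clarity; production
# tables reach billions of rows, which is why the latency of the
# backing memory (DRAM vs. SSD) dominates inference time.

NUM_ROWS = 100_000   # illustrative; real tables are far larger
DIM = 4              # embedding dimension (small for clarity)

# Each row is a learned vector; modeled here as a list of floats.
table = [[float(r)] * DIM for r in range(NUM_ROWS)]

def embedding_sum(table, indices):
    """Gather the rows named by `indices` and sum them element-wise.

    Each lookup touches a handful of essentially random rows, so the
    access pattern is read-latency bound rather than compute bound.
    """
    out = [0.0] * len(table[0])
    for i in indices:
        for d, v in enumerate(table[i]):
            out[d] += v
    return out

pooled = embedding_sum(table, [2, 5, 10])
```

Each inference issues many such lookups, one per sparse feature, which is what makes capacity and read latency the central tradeoff.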
Modern smart NICs provide little isolation between the network functions belonging to different tenants. They also do not protect tenants from the datacenter-provided management OS which runs on the NIC. We describe concrete attacks that allow a function's state to leak to (or be modified by) another function or the OS. We then introduce S-NIC, a new hardware design that provides strong isolation guarantees. S-NIC pervasively virtualizes accelerators, enforces single-owner semantics for each line in on-NIC cache and RAM, and dedicated bus...
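The "single-owner semantics" idea can be illustrated with a small software model: every cache line carries exactly one owner tag, and accesses by any other tenant (or the management OS) are rejected rather than leaked. This is an illustrative sketch under assumed semantics, not S-NIC's actual hardware design; all class and method names are invented.

```python
# Toy model of single-owner cache-line semantics: the first writer of
# a line becomes its owner, and every later access is checked against
# that owner tag before data is returned or modified.

class OwnershipError(Exception):
    """Raised when a principal touches a line it does not own."""

class NicCache:
    def __init__(self):
        # addr -> (owner, data); ownership is set on first write.
        self.lines = {}

    def write(self, principal, addr, data):
        owner, _ = self.lines.get(addr, (principal, None))
        if owner != principal:
            raise OwnershipError(f"{principal} does not own line {addr:#x}")
        self.lines[addr] = (principal, data)

    def read(self, principal, addr):
        owner, data = self.lines[addr]
        if owner != principal:
            raise OwnershipError(f"{principal} does not own line {addr:#x}")
        return data
```

In hardware the check would be an owner-tag comparison on every cache and RAM access; the point of the model is that neither a co-tenant nor the management OS can observe or modify another function's lines.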
Reliability is an important design constraint in modern microprocessors, and one of the fundamental reliability challenges is combating the effects of transient faults. This requires extensive analysis, including significant fault modelling, to allow architects to make informed tradeoffs. Recent data shows that multi-bit faults are becoming more common, increasing from 0.5% of static random-access memory (SRAM) faults at 180nm to 3.9% at 22nm. Such faults are predicted to be even more prevalent at smaller technology nodes. Therefore, accurately...
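Fault-modelling studies of this kind typically inject faults by flipping stored bits. A minimal sketch, assuming a simple adjacent-bit model for the multi-bit SRAM faults the abstract mentions (the function and its parameters are illustrative, not the paper's actual methodology):

```python
# Model a transient fault as flipping one or more adjacent bits in a
# stored word. n_bits=1 is the classic single-event upset; n_bits>1
# models the multi-bit faults growing more common at small nodes.

def inject_fault(word, start_bit, n_bits=1, width=32):
    """Return `word` with `n_bits` adjacent bits flipped from `start_bit`.

    Flipping is XOR, so injecting the same fault twice restores the
    original value -- a handy sanity check for injection campaigns.
    """
    for b in range(start_bit, min(start_bit + n_bits, width)):
        word ^= (1 << b)
    return word
```

An injection campaign would apply this to architectural state at randomly sampled cycles and compare program output against a golden run to classify each fault as masked, detected, or silently corrupting data.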
Deep learning recommendation systems must provide high quality, personalized content under strict tail-latency targets and high system loads. This paper presents RecPipe, a system to jointly optimize recommendation quality and inference performance. Central to RecPipe is decomposing recommendation models into multi-stage pipelines that maintain quality while reducing compute complexity and exposing distinct parallelism opportunities. RecPipe implements an inference scheduler to map engines onto commodity, heterogeneous platforms (e.g., CPUs, GPUs). While the hardware-aware...
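The multi-stage decomposition can be sketched as a cheap front-end model that prunes the candidate set, followed by a heavier back-end model that ranks only the survivors. The two scoring functions below are trivial stand-ins for small and large models; the staging logic, not the scores, is the point. This is a sketch of the general technique, not RecPipe's implementation.

```python
# Two-stage recommendation sketch: filter with a cheap model, then
# rank the shortlist with an expensive one. Total heavy-model compute
# drops from |candidates| to `keep` invocations.

def cheap_score(item):
    # Stand-in for a small, fast, approximate model.
    return item % 10

def heavy_score(item):
    # Stand-in for a large, accurate, expensive model.
    return (item % 10) * 100 + item

def staged_rank(candidates, keep=3, topk=2):
    # Stage 1: shortlist the best `keep` items by the cheap model.
    shortlist = sorted(candidates, key=cheap_score, reverse=True)[:keep]
    # Stage 2: rank only the shortlist with the heavy model.
    ranked = sorted(shortlist, key=heavy_score, reverse=True)
    return ranked[:topk]
```

Because the stages have different compute profiles, a scheduler can map the cheap filter onto CPUs and the heavy ranker onto GPUs, which is the kind of heterogeneous mapping the abstract describes.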
Deep learning based recommendation systems form the backbone of most personalized cloud services. Though the computer architecture community has recently started to take notice of deep recommendation inference, the resulting solutions have taken wildly different approaches, ranging from near-memory processing to at-scale optimizations. To better design future hardware for recommendation, we must first systematically examine and characterize the underlying systems-level impact of design decisions across the levels of the execution stack. In this paper, eight...
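Characterization studies like this rest on repeated, controlled timing of individual operators or stages. A minimal harness, assuming a simple mean-over-repetitions measurement (illustrative only, not the paper's methodology):

```python
# Time a callable over many repetitions and report the mean latency.
# Repetition amortizes timer overhead and cold-start effects; a real
# characterization would also record tail percentiles, not just means.
import time

def profile(fn, *args, reps=100):
    start = time.perf_counter()
    for _ in range(reps):
        fn(*args)
    return (time.perf_counter() - start) / reps
```

Running such a harness over each operator of a model, at several batch sizes and on each target platform, yields the systems-level breakdown the abstract calls for.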
The pace of advancement of the top-end supercomputers has historically followed an exponential curve similar to (and driven in part by) Moore's Law. Shortly after hitting the petaflop mark, the community started looking ahead to the next milestone: exascale. However, many obstacles were already looming on the horizon, such as the slowing of Moore's Law, while others, like the end of Dennard scaling, had already arrived. Anticipating significant challenges for overall high-performance computing (HPC) to achieve a 1000x improvement, the U.S. Department of Energy...
Reliability is a significant design constraint for supercomputers and large-scale data centers. Modeling the effects of faults on applications targeted to such systems allows system architects and software designers to provision resilience features that improve the fidelity of results and reduce runtimes. In this paper, we propose mechanisms that extend existing techniques to model the effects of transient faults on realistic applications. First, we extend the Program Vulnerability Factor (PVF) metric to multi-threaded applications. Then we demonstrate how to measure the PVF of an...
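The core of the PVF metric is a ratio: the fraction of bits required for architecturally correct execution (ACE bits) among all the bits a program exposes to faults. A minimal sketch, assuming an invented trace format of `(ace_bits, total_bits)` samples per interval; the multi-threaded extension below simply aggregates across threads, which is a simplification of the paper's metric, not its definition:

```python
# PVF sketch: fraction of ACE bits over an execution. A higher PVF
# means a larger share of randomly injected bit flips would corrupt
# the program's architecturally visible result.

def pvf(trace):
    """`trace` is a list of (ace_bits, total_bits) interval samples."""
    ace = sum(a for a, _ in trace)
    total = sum(t for _, t in trace)
    return ace / total if total else 0.0

def multithreaded_pvf(thread_traces):
    """Aggregate ACE and total bits across all threads' traces.

    Simplified illustration of extending PVF to multi-threaded code:
    pool every thread's intervals into one ratio.
    """
    ace = sum(a for tr in thread_traces for a, _ in tr)
    total = sum(t for tr in thread_traces for _, t in tr)
    return ace / total if total else 0.0
```

For example, two intervals exposing 64 bits each, with 32 and 16 of them ACE, give a PVF of 48/128 = 0.375.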