Gu-Yeon Wei

ORCID: 0000-0001-5730-9904
Research Areas
  • Parallel Computing and Optimization Techniques
  • Low-power high-performance VLSI design
  • Embedded Systems Design Techniques
  • Advanced Memory and Neural Computing
  • Advanced Neural Network Applications
  • Advancements in PLL and VCO Technologies
  • Ferroelectric and Negative Capacitance Devices
  • Analog and Mixed-Signal Circuit Design
  • Interconnection Networks and Systems
  • Semiconductor materials and devices
  • VLSI and Analog Circuit Testing
  • Advanced Data Storage Technologies
  • Radiation Effects in Electronics
  • Cloud Computing and Resource Management
  • CCD and CMOS Imaging Sensors
  • Stochastic Gradient Optimization Techniques
  • Biomimetic flight and propulsion mechanisms
  • Radio Frequency Integrated Circuit Design
  • Advancements in Semiconductor Devices and Circuit Design
  • Energy Efficient Wireless Sensor Networks
  • Recommender Systems and Techniques
  • Privacy-Preserving Technologies in Data
  • Green IT and Sustainability
  • Innovative Energy Harvesting Technologies
  • Energy Harvesting in Wireless Networks

Harvard University Press
2016-2025

Harvard University
2014-2024

Arizona State University
2023

Samsung (South Korea)
2020

Samsung (United States)
2019

Nvidia (United Kingdom)
2017

University of Cambridge
2014

Massachusetts Institute of Technology
2007

Oregon State University
2006

Stanford University
1996-2002

Portable, embedded systems place ever-increasing demands on high-performance, low-power microprocessor design. Dynamic voltage and frequency scaling (DVFS) is a well-known technique to reduce energy in digital systems, but the effectiveness of DVFS is hampered by slow voltage transitions that occur on the order of tens of microseconds. In addition, the recent trend towards chip-multiprocessors (CMPs) executing multi-threaded workloads with heterogeneous behavior motivates the need for per-core control mechanisms. Voltage...

10.1109/hpca.2008.4658633 article EN Proceedings - International Symposium on High-Performance Computer Architecture 2008-02-01
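
For context on the energy argument above, the first-order CMOS dynamic-power model is the usual justification for DVFS; the notation below is generic textbook shorthand, not taken from this paper.

```latex
% alpha = activity factor, C = switched capacitance, V_dd = supply voltage,
% f = clock frequency, N = cycles required by the task.
\[
P_{\mathrm{dyn}} = \alpha\, C\, V_{dd}^{2}\, f,
\qquad
E_{\mathrm{task}} \approx \alpha\, C\, V_{dd}^{2}\, N
\]
```

Because energy per operation scales roughly with the square of the supply voltage, lowering voltage and frequency together yields large savings; the microsecond-scale transition times mentioned above therefore limit how closely conventional DVFS can track fast program phases.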

Process variations will greatly impact the stability, leakage power consumption, and performance of future microprocessors. These variations are especially detrimental to 6T SRAM (6-transistor static memory) structures and become critical with continued technology scaling. In this paper, we propose new on-chip memory architectures based on novel 3T1D DRAM (3-transistor, 1-diode dynamic memory) cells. We provide a detailed comparison between designs in the context of the L1 data cache. The effects of physical device variation...

10.1109/micro.2007.40 article EN 2007-01-01

The continued success of Deep Neural Networks (DNNs) in classification tasks has sparked a trend of accelerating their execution with specialized hardware. While published designs easily give an order-of-magnitude improvement over general-purpose hardware, few look beyond an initial implementation. This paper presents Minerva, a highly automated co-design approach across the algorithm, architecture, and circuit levels to optimize DNN hardware accelerators. Compared to an established fixed-point accelerator...

10.1145/3007787.3001165 article EN ACM SIGARCH Computer Architecture News 2016-06-18
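
One of the co-design levers such a flow explores is reduced-precision arithmetic. The NumPy sketch below shows fixed-point weight quantization purely as an illustration; the function name, bit widths, and rounding policy are assumptions, not Minerva's actual toolchain.

```python
import numpy as np

def quantize_fixed_point(weights, int_bits=1, frac_bits=6):
    """Round float weights onto a signed fixed-point grid (illustrative only).

    Total width = 1 sign bit + int_bits + frac_bits. Sweeping these widths
    per layer mimics the bit-width exploration an algorithm/architecture
    co-design flow might perform.
    """
    scale = 2.0 ** frac_bits
    max_val = 2.0 ** int_bits - 1.0 / scale      # largest representable value
    min_val = -(2.0 ** int_bits)                 # most negative value
    return np.clip(np.round(weights * scale) / scale, min_val, max_val)

# Usage: measure quantization error for a random weight matrix.
w = np.random.randn(256, 256).astype(np.float32) * 0.1
w_q = quantize_fixed_point(w, int_bits=1, frac_bits=6)
print("max abs error:", np.abs(w - w_q).max())
```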

With the increasing prevalence of warehouse-scale (WSC) and cloud computing, understanding the interactions of server applications with the underlying microarchitecture becomes ever more important in order to extract maximum performance out of the hardware. To aid such understanding, this paper presents a detailed microarchitectural analysis of live datacenter jobs, measured on more than 20,000 Google machines over a three-year period and comprising thousands of different applications.

10.1145/2749469.2750392 article EN 2015-05-26

On-chip DC-DC converters have the potential to offer fine-grain power management in modern chip-multiprocessors. This paper presents a fully integrated 3-level converter, a hybrid of buck and switched-capacitor converters, implemented in 130 nm CMOS technology. The converter enables smaller inductors (1 nH) than a buck converter, while generating a wide range of output voltages compared to a 1/2-mode converter. A test-chip prototype delivers up to 0.85 A of load current across a 0.4 to 1.4 V output range from a 2.4 V input supply. It achieves 77% peak...

10.1109/jssc.2011.2169309 article EN IEEE Journal of Solid-State Circuits 2011-11-18
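
As general background (standard converter theory, not this paper's analysis), an ideal buck stage regulates its output through the switching duty cycle, and the 3-level topology reduces the voltage swing the inductor must absorb:

```latex
% D = duty cycle of the high-side switch in an ideal buck converter.
\[
V_{out} = D\, V_{in}, \qquad 0 \le D \le 1
\]
% In a 3-level converter the flying capacitor is held near V_in/2, so the
% inductor node steps by at most V_in/2 per transition instead of V_in,
% cutting current ripple and permitting a much smaller inductor.
```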

Recent high-level synthesis and accelerator-related architecture papers show a great disparity in workload selection. To improve standardization within the accelerator research community, we present MachSuite, a collection of 19 benchmarks for evaluating high-level synthesis tools and accelerator-centric architectures. MachSuite spans a broad application space, captures a variety of different program behaviors, and provides implementations tailored towards the needs of designers and researchers, including support for high-level synthesis. We illustrate...

10.1109/iiswc.2014.6983050 article EN 2014-10-01

Hardware specialization, in the form of accelerators that provide custom datapaths and control for specific algorithms and applications, promises impressive performance and energy advantages compared to traditional architectures. Current research on accelerator analysis relies on RTL-based synthesis flows to produce accurate timing, power, and area estimates. Such techniques not only require significant effort and expertise but are also slow and tedious to use, making large design space exploration infeasible. To...

10.1145/2678373.2665689 article EN ACM SIGARCH Computer Architecture News 2014-06-14

The continued success of Deep Neural Networks (DNNs) in classification tasks has sparked a trend of accelerating their execution with specialized hardware. While published designs easily give an order-of-magnitude improvement over general-purpose hardware, few look beyond an initial implementation. This paper presents Minerva, a highly automated co-design approach across the algorithm, architecture, and circuit levels to optimize DNN hardware accelerators. Compared to an established fixed-point accelerator...

10.1109/isca.2016.32 article EN 2016-06-01

As the use of deep neural networks continues to grow, so does the fraction of compute cycles devoted to their execution. This has led the CAD and architecture communities to devote considerable attention to building DNN hardware. Despite these efforts, the fault tolerance of DNNs has generally been overlooked. This paper is the first to conduct a large-scale, empirical study of DNN resilience. Motivated by the inherent algorithmic resilience of DNNs, we are interested in understanding the relationship between fault rate and model accuracy. To do so, we present...

10.1145/3195970.3195997 article EN 2018-06-19
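
A resilience study of this kind hinges on injecting faults into model state and re-measuring accuracy. The NumPy sketch below flips random bits in a float32 weight tensor as a toy fault model; the API and fault model are illustrative assumptions, not the paper's actual framework.

```python
import numpy as np

def inject_bit_flips(weights, fault_rate, rng=None):
    """Flip one random bit in a randomly chosen subset of float32 weights."""
    rng = np.random.default_rng() if rng is None else rng
    bits = weights.view(np.uint32).copy()            # reinterpret as raw bits
    faulty = rng.random(bits.shape) < fault_rate     # which weights get a flip
    positions = rng.integers(0, 32, size=bits.shape, dtype=np.uint32)
    bits[faulty] ^= np.uint32(1) << positions[faulty]
    return bits.view(np.float32)

# Usage: perturb weights at a given fault rate and count affected values;
# a full study would then re-run inference and record the accuracy drop.
w = np.random.randn(4096).astype(np.float32)
w_faulty = inject_bit_flips(w, fault_rate=1e-3, rng=np.random.default_rng(0))
print("weights perturbed:", int((w != w_faulty).sum()))
```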

Training deep learning models is compute-intensive, and there is an industry-wide trend towards hardware specialization to improve performance. To systematically benchmark deep learning platforms, we introduce ParaDnn, a parameterized benchmark suite for deep learning that generates end-to-end fully connected (FC), convolutional (CNN), and recurrent (RNN) neural networks. Along with six real-world models, we benchmark Google's Cloud TPU v2/v3, NVIDIA's V100 GPU, and an Intel Skylake CPU platform. We take a deep dive into the architecture, reveal its bottlenecks,...

10.48550/arxiv.1907.10701 preprint EN cc-by-sa arXiv (Cornell University) 2019-01-01
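
The core idea of a parameterized benchmark suite is to generate models from a small set of hyperparameters and sweep them end to end. The PyTorch sketch below mimics that idea for fully connected networks only; the framework, function name, and parameter ranges are illustrative and not ParaDnn's actual generator.

```python
import torch.nn as nn

def make_fc_model(input_dim, num_layers, hidden_dim, num_classes):
    """Build a fully connected network from a few sweepable hyperparameters."""
    layers, width = [], input_dim
    for _ in range(num_layers):
        layers += [nn.Linear(width, hidden_dim), nn.ReLU()]
        width = hidden_dim
    layers.append(nn.Linear(width, num_classes))
    return nn.Sequential(*layers)

# Usage: sweep depth the way a parameterized benchmark would, then train each
# model end to end on the target platform and record throughput.
for depth in (4, 8, 16):
    model = make_fc_model(input_dim=784, num_layers=depth,
                          hidden_dim=2048, num_classes=10)
    n_params = sum(p.numel() for p in model.parameters())
    print(f"depth={depth:2d}  params={n_params / 1e6:.1f}M")
```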

Machine learning (ML) needs industry-standard performance benchmarks to support design and competitive evaluation of the many emerging software and hardware solutions for ML. But ML training presents three unique benchmarking challenges absent from other domains: optimizations that improve training throughput can increase time to solution, training is stochastic and time to solution exhibits high variance, and software and hardware systems are so diverse that fair benchmarking with the same binary, code, and even hyperparameters is difficult. We therefore present MLPerf, an...

10.48550/arxiv.1910.01500 preprint EN other-oa arXiv (Cornell University) 2019-01-01
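
The central metric implied by the abstract is time to solution rather than raw throughput. The sketch below illustrates that scoring loop with placeholder callables; the real benchmark additionally fixes reference implementations, quality targets, and run rules, none of which are captured here.

```python
import time

def time_to_quality(train_one_epoch, evaluate, target_metric, max_epochs=100):
    """Return (epochs, seconds) needed to reach a target quality, or None."""
    start = time.perf_counter()
    for epoch in range(1, max_epochs + 1):
        train_one_epoch()
        if evaluate() >= target_metric:
            return epoch, time.perf_counter() - start
    return None, time.perf_counter() - start

# Toy usage with a fake model whose validation accuracy rises each epoch.
# Because training is stochastic, a real benchmark averages several runs.
state = {"acc": 0.0}
epochs, seconds = time_to_quality(
    train_one_epoch=lambda: state.update(acc=state["acc"] + 0.1),
    evaluate=lambda: state["acc"],
    target_metric=0.75,
)
print(f"reached target in {epochs} epochs, {seconds:.4f}s")
```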

Dynamic voltage and frequency scaling (DVFS) is a commonly used power-management scheme that dynamically adjusts power and performance to the time-varying needs of running programs. Unfortunately, conventional DVFS, relying on off-chip regulators, faces limitations in terms of temporal granularity and high costs when considered for future multi-core systems. To overcome these challenges, this paper presents thread motion (TM), a fine-grained power-management scheme for chip multiprocessors (CMPs). Instead of incurring the cost of changing...

10.1145/1555754.1555793 article EN 2009-06-20

Recent efforts to address microprocessor power dissipation through aggressive supply voltage scaling and management require that designers be increasingly cognizant of voltage variations. These variations, primarily due to fast changes in load current, can be attributed to architectural gating events that reduce power dissipation. In order to study this problem, the authors propose a fine-grain, parameterizable model for power-delivery networks that allows system designers to study localized, on-chip voltage fluctuations in high-performance microprocessors....

10.5555/1266366.1266498 article EN 2007-04-16

This paper presents a 28nm SoC with a programmable FC-DNN accelerator design that demonstrates: (1) HW support to exploit data sparsity by eliding unnecessary computations (4× energy reduction); (2) improved algorithmic error tolerance using a sign-magnitude number format for weights and datapath computation; (3) tolerance of circuit-level timing violations in datapath logic via time-borrowing; (4) combined circuit and algorithmic resilience with Razor timing-violation detection to reduce VDD scaling or increase throughput via FCLK scaling; (5) high...

10.1109/isscc.2017.7870351 article EN 2017 IEEE International Solid-State Circuits Conference (ISSCC) 2017-02-01
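
Item (1) in the abstract, eliding computations on zero-valued data, can be pictured with the short software analogue below; a real accelerator performs this gating in hardware, so only the skipped-operation count (not the energy saving) is reproduced here, and the function is purely illustrative.

```python
import numpy as np

def sparse_mac(activations, weights):
    """Dot product that skips multiply-accumulates for zero activations."""
    acc, skipped = 0.0, 0
    for a, w in zip(activations, weights):
        if a == 0.0:              # elide the MAC when the operand is zero
            skipped += 1
            continue
        acc += a * w
    return acc, skipped

# Usage: ReLU outputs are often highly sparse, so many MACs can be elided.
acts = np.maximum(np.random.randn(1024), 0.0)   # roughly half zeros
wts = np.random.randn(1024)
result, skipped = sparse_mac(acts, wts)
print(f"skipped {skipped}/{acts.size} MACs")
```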

Recent efforts to address microprocessor power dissipation through aggressive supply voltage scaling and management require that designers be increasingly cognizant of voltage variations. These variations, primarily due to fast changes in load current, can be attributed to architectural gating events that reduce power dissipation. In order to study this problem, the authors propose a fine-grain, parameterizable model for power-delivery networks that allows system designers to study localized, on-chip voltage fluctuations in high-performance microprocessors....

10.1109/date.2007.364663 article EN 2007-04-01

In this article, we describe the design choices behind MLPerf, a machine learning performance benchmark that has become an industry standard. The first two rounds of MLPerf Training helped drive improvements to software-stack performance and scalability, showing a 1.3× speedup in the top 16-chip results despite higher quality targets and a 5.5× increase in system scale. The first round of MLPerf Inference received over 500 submissions from 14 different organizations, evidencing growing adoption.

10.1109/mm.2020.2974843 article EN IEEE Micro 2020-02-18

Deep learning has been popularized by its recent successes on challenging artificial intelligence problems. One of the reasons for its dominance is also an ongoing challenge: the need for immense amounts of computational power. Hardware architects have responded by proposing a wide array of promising ideas, but to date, the majority of this work has focused on specific algorithms in somewhat narrow application domains. While their specificity does not diminish these approaches, there is a clear need for more flexible solutions. We believe the first...

10.1109/iiswc.2016.7581275 preprint EN 2016-09-01

As the use of deep neural networks continues to grow, so does the fraction of compute cycles devoted to their execution. This has led the CAD and architecture communities to devote considerable attention to building DNN hardware. Despite these efforts, the fault tolerance of DNNs has generally been overlooked. This paper is the first to conduct a large-scale, empirical study of DNN resilience. Motivated by the inherent algorithmic resilience of DNNs, we are interested in understanding the relationship between fault rate and model accuracy. To do so, we present...

10.1109/dac.2018.8465834 article EN 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC) 2018-06-01

Neural personalized recommendation is the cornerstone of a wide collection of cloud services and products, constituting a significant compute demand on cloud infrastructure. Thus, improving execution efficiency directly translates into infrastructure capacity savings. In this paper, we propose DeepRecSched, an inference scheduler that maximizes latency-bounded throughput by taking into account the characteristics of query size and arrival patterns, model architectures, and the underlying hardware systems. By carefully optimizing...

10.1109/isca45697.2020.00084 preprint EN 2020-05-01

Given recent algorithm, software, and hardware innovation, computing has enabled a plethora of new applications. As computing becomes increasingly ubiquitous, however, so does its environmental impact. This paper brings the issue to the attention of computer-systems researchers. Our analysis, built on industry-reported characterization, quantifies the effects in terms of carbon emissions. Broadly, emissions have two sources: operational energy consumption, and manufacturing and infrastructure. Although emissions from the former are...

10.1109/hpca51647.2021.00076 article EN 2021-02-01
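
The two emission sources named in the abstract are commonly summarized with a simple accounting identity; the form below is generic, not the paper's exact methodology.

```latex
% E_use  = lifetime operational energy consumption
% CI     = carbon intensity of the supplying electricity grid
% C_emb  = embodied emissions from manufacturing and infrastructure
\[
C_{\text{total}} \;=\;
\underbrace{E_{\text{use}} \times CI}_{\text{operational}}
\;+\;
\underbrace{C_{\text{emb}}}_{\text{embodied}}
\]
```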

As the application of deep learning continues to grow, so does the amount of data used to make predictions. While big-data deep learning was traditionally constrained by computing performance and off-chip memory bandwidth, a new constraint has emerged: privacy. One solution is homomorphic encryption (HE). Applying HE to the client-cloud model allows cloud services to perform inference directly on clients' encrypted data. While HE can meet privacy constraints, it introduces enormous computational challenges and remains impractically slow...

10.1109/hpca51647.2021.00013 article EN 2021-02-01

Architectural power modeling tools are widely used by the computer architecture community for rapid evaluations of high-level design choices and design space explorations. Currently, McPAT [31] is the de facto model, but the literature does not yet contain a careful examination of its accuracy. In addition, the issue of how greatly modeling error can affect architectural-level studies has not been quantified before. In this work, we present the first rigorous assessment of McPAT's core power and area models with a detailed, validated toolchain in...

10.1109/hpca.2015.7056064 article EN 2015-02-01

Increasing demand for power-efficient, high-performance computing has spurred a growing number and diversity of hardware accelerators in mobile and server Systems on Chip (SoCs). This paper makes the case that co-design of the accelerator microarchitecture with the system in which it belongs is critical to balanced, efficient accelerator microarchitectures. We find that data movement and coherence management are significant yet often unaccounted components of total runtime, resulting in misleading performance predictions and inefficient...

10.1109/micro.2016.7783751 article EN 2016-10-01