Gu-Yeon Wei

ORCID: 0000-0001-5730-9904
Research Areas
  • Parallel Computing and Optimization Techniques
  • Low-power high-performance VLSI design
  • Embedded Systems Design Techniques
  • Advanced Memory and Neural Computing
  • Advanced Neural Network Applications
  • Advancements in PLL and VCO Technologies
  • Ferroelectric and Negative Capacitance Devices
  • Analog and Mixed-Signal Circuit Design
  • Interconnection Networks and Systems
  • Semiconductor materials and devices
  • VLSI and Analog Circuit Testing
  • Advanced Data Storage Technologies
  • Radiation Effects in Electronics
  • Cloud Computing and Resource Management
  • CCD and CMOS Imaging Sensors
  • Stochastic Gradient Optimization Techniques
  • Biomimetic flight and propulsion mechanisms
  • Radio Frequency Integrated Circuit Design
  • Advancements in Semiconductor Devices and Circuit Design
  • Energy Efficient Wireless Sensor Networks
  • Recommender Systems and Techniques
  • Privacy-Preserving Technologies in Data
  • Green IT and Sustainability
  • Innovative Energy Harvesting Technologies
  • Energy Harvesting in Wireless Networks

Harvard University Press
2016-2025

Harvard University
2014-2024

Arizona State University
2023

Samsung (South Korea)
2020

Samsung (United States)
2019

Nvidia (United Kingdom)
2017

University of Cambridge
2014

Massachusetts Institute of Technology
2007

Oregon State University
2006

Stanford University
1996-2002

Portable, embedded systems place ever-increasing demands on high-performance, low-power microprocessor design. Dynamic voltage and frequency scaling (DVFS) is a well-known technique to reduce energy in digital systems, but the effectiveness of DVFS is hampered by slow voltage transitions that occur on the order of tens of microseconds. In addition, the recent trend towards chip-multiprocessors (CMPs) executing multi-threaded workloads with heterogeneous behavior motivates the need for per-core control mechanisms. Voltage...

10.1109/hpca.2008.4658633 article EN Proceedings - International Symposium on High-Performance Computer Architecture 2008-02-01
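
For context on the energy argument above, the first-order CMOS dynamic-power model is the usual justification for DVFS; the notation below is generic textbook shorthand, not taken from this paper.

```latex
% alpha = activity factor, C = switched capacitance, V_dd = supply voltage,
% f = clock frequency, N = cycles required by the task.
\[
P_{\mathrm{dyn}} = \alpha\, C\, V_{dd}^{2}\, f,
\qquad
E_{\mathrm{task}} \approx \alpha\, C\, V_{dd}^{2}\, N
\]
```

Because energy per operation scales roughly with the square of the supply voltage, lowering voltage and frequency together yields large savings; the microsecond-scale transition times mentioned above therefore limit how closely conventional DVFS can track fast program phases.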

Process variations will greatly impact the stability, leakage power consumption, and performance of future microprocessors. These variations are especially detrimental to 6T SRAM (6-transistor static memory) structures and become critical with continued technology scaling. In this paper, we propose new on-chip memory architectures based on novel 3T1D DRAM (3-transistor, 1-diode dynamic memory) cells. We provide a detailed comparison between designs in the context of the L1 data cache. The effects of physical device variation...

10.1109/micro.2007.40 article EN 2007-01-01

The continued success of Deep Neural Networks (DNNs) in classification tasks has sparked a trend of accelerating their execution with specialized hardware. While published designs easily give an order-of-magnitude improvement over general-purpose hardware, few look beyond an initial implementation. This paper presents Minerva, a highly automated co-design approach across the algorithm, architecture, and circuit levels to optimize DNN hardware accelerators. Compared to an established fixed-point accelerator...

10.1145/3007787.3001165 article EN ACM SIGARCH Computer Architecture News 2016-06-18
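
One of the co-design levers such a flow explores is reduced-precision arithmetic. The NumPy sketch below shows fixed-point weight quantization purely as an illustration; the function name, bit widths, and rounding policy are assumptions, not Minerva's actual toolchain.

```python
import numpy as np

def quantize_fixed_point(weights, int_bits=1, frac_bits=6):
    """Round float weights onto a signed fixed-point grid (illustrative only).

    Total width = 1 sign bit + int_bits + frac_bits. Sweeping these widths
    per layer mimics the bit-width exploration an algorithm/architecture
    co-design flow might perform.
    """
    scale = 2.0 ** frac_bits
    max_val = 2.0 ** int_bits - 1.0 / scale      # largest representable value
    min_val = -(2.0 ** int_bits)                 # most negative value
    return np.clip(np.round(weights * scale) / scale, min_val, max_val)

# Usage: measure quantization error for a random weight matrix.
w = np.random.randn(256, 256).astype(np.float32) * 0.1
w_q = quantize_fixed_point(w, int_bits=1, frac_bits=6)
print("max abs error:", np.abs(w - w_q).max())
```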

With the increasing prevalence of warehouse-scale (WSC) and cloud computing, understanding the interactions of server applications with the underlying microarchitecture becomes ever more important in order to extract maximum performance out of the hardware. To aid such understanding, this paper presents a detailed microarchitectural analysis of live datacenter jobs, measured on more than 20,000 Google machines over a three-year period and comprising thousands of different applications.

10.1145/2749469.2750392 article EN 2015-05-26

On-chip DC-DC converters have the potential to offer fine-grain power management in modern chip-multiprocessors. This paper presents a fully integrated 3-level converter, a hybrid of buck and switched-capacitor converters, implemented in 130 nm CMOS technology. The converter enables smaller inductors (1 nH) than a buck converter, while generating a wide range of output voltages compared to a 1/2-mode converter. A test-chip prototype delivers up to 0.85 A of load current across a 0.4 to 1.4 V output range from a 2.4 V input supply. It achieves 77% peak...

10.1109/jssc.2011.2169309 article EN IEEE Journal of Solid-State Circuits 2011-11-18
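
As general background (standard converter theory, not this paper's analysis), an ideal buck stage regulates its output through the switching duty cycle, and the 3-level topology reduces the voltage swing the inductor must absorb:

```latex
% D = duty cycle of the high-side switch in an ideal buck converter.
\[
V_{out} = D\, V_{in}, \qquad 0 \le D \le 1
\]
% In a 3-level converter the flying capacitor is held near V_in/2, so the
% inductor node steps by at most V_in/2 per transition instead of V_in,
% cutting current ripple and permitting a much smaller inductor.
```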

Recent high-level synthesis and accelerator-related architecture papers show a great disparity in workload selection. To improve standardization within the accelerator research community, we present MachSuite, a collection of 19 benchmarks for evaluating high-level synthesis tools and accelerator-centric architectures. MachSuite spans a broad application space, captures a variety of different program behaviors, and provides implementations tailored towards the needs of designers and researchers, including support for high-level synthesis. We illustrate...

10.1109/iiswc.2014.6983050 article EN 2014-10-01

Hardware specialization, in the form of accelerators that provide custom datapaths and control for specific algorithms and applications, promises impressive performance and energy advantages compared to traditional architectures. Current research on accelerator analysis relies on RTL-based synthesis flows to produce accurate timing, power, and area estimates. Such techniques not only require significant effort and expertise but are also slow and tedious to use, making large design space exploration infeasible. To...

10.1145/2678373.2665689 article EN ACM SIGARCH Computer Architecture News 2014-06-14

The continued success of Deep Neural Networks (DNNs) in classification tasks has sparked a trend of accelerating their execution with specialized hardware. While published designs easily give an order-of-magnitude improvement over general-purpose hardware, few look beyond an initial implementation. This paper presents Minerva, a highly automated co-design approach across the algorithm, architecture, and circuit levels to optimize DNN hardware accelerators. Compared to an established fixed-point accelerator...

10.1109/isca.2016.32 article EN 2016-06-01

As the use of deep neural networks continues to grow, so does the fraction of compute cycles devoted to their execution. This has led the CAD and architecture communities to devote considerable attention to building DNN hardware. Despite these efforts, the fault tolerance of DNNs has generally been overlooked. This paper is the first to conduct a large-scale, empirical study of DNN resilience. Motivated by the inherent algorithmic resilience of DNNs, we are interested in understanding the relationship between fault rate and model accuracy. To do so, we present...

10.1145/3195970.3195997 article EN 2018-06-19
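
A resilience study of this kind hinges on injecting faults into model state and re-measuring accuracy. The NumPy sketch below flips random bits in a float32 weight tensor as a toy fault model; the API and fault model are illustrative assumptions, not the paper's actual framework.

```python
import numpy as np

def inject_bit_flips(weights, fault_rate, rng=None):
    """Flip one random bit in a randomly chosen subset of float32 weights."""
    rng = np.random.default_rng() if rng is None else rng
    bits = weights.view(np.uint32).copy()            # reinterpret as raw bits
    faulty = rng.random(bits.shape) < fault_rate     # which weights get a flip
    positions = rng.integers(0, 32, size=bits.shape, dtype=np.uint32)
    bits[faulty] ^= np.uint32(1) << positions[faulty]
    return bits.view(np.float32)

# Usage: perturb weights at a given fault rate and count affected values;
# a full study would then re-run inference and record the accuracy drop.
w = np.random.randn(4096).astype(np.float32)
w_faulty = inject_bit_flips(w, fault_rate=1e-3, rng=np.random.default_rng(0))
print("weights perturbed:", int((w != w_faulty).sum()))
```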

Training deep learning models is compute-intensive, and there is an industry-wide trend towards hardware specialization to improve performance. To systematically benchmark deep learning platforms, we introduce ParaDnn, a parameterized benchmark suite for deep learning that generates end-to-end fully connected (FC), convolutional (CNN), and recurrent (RNN) neural networks. Along with six real-world models, we benchmark Google's Cloud TPU v2/v3, NVIDIA's V100 GPU, and an Intel Skylake CPU platform. We take a deep dive into the architecture, reveal its bottlenecks,...

10.48550/arxiv.1907.10701 preprint EN cc-by-sa arXiv (Cornell University) 2019-01-01
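
The core idea of a parameterized benchmark suite is to generate models from a small set of hyperparameters and sweep them end to end. The PyTorch sketch below mimics that idea for fully connected networks only; the framework, function name, and parameter ranges are illustrative and not ParaDnn's actual generator.

```python
import torch.nn as nn

def make_fc_model(input_dim, num_layers, hidden_dim, num_classes):
    """Build a fully connected network from a few sweepable hyperparameters."""
    layers, width = [], input_dim
    for _ in range(num_layers):
        layers += [nn.Linear(width, hidden_dim), nn.ReLU()]
        width = hidden_dim
    layers.append(nn.Linear(width, num_classes))
    return nn.Sequential(*layers)

# Usage: sweep depth the way a parameterized benchmark would, then train each
# model end to end on the target platform and record throughput.
for depth in (4, 8, 16):
    model = make_fc_model(input_dim=784, num_layers=depth,
                          hidden_dim=2048, num_classes=10)
    n_params = sum(p.numel() for p in model.parameters())
    print(f"depth={depth:2d}  params={n_params / 1e6:.1f}M")
```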

Machine learning (ML) needs industry-standard performance benchmarks to support design and competitive evaluation of the many emerging software and hardware solutions for ML. But ML training presents three unique benchmarking challenges absent from other domains: optimizations that improve training throughput can increase time to solution, training is stochastic and time to solution exhibits high variance, and software and hardware systems are so diverse that fair benchmarking with the same binary, code, and even hyperparameters is difficult. We therefore present MLPerf, an...

10.48550/arxiv.1910.01500 preprint EN other-oa arXiv (Cornell University) 2019-01-01
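
The central metric implied by the abstract is time to solution rather than raw throughput. The sketch below illustrates that scoring loop with placeholder callables; the real benchmark additionally fixes reference implementations, quality targets, and run rules, none of which are captured here.

```python
import time

def time_to_quality(train_one_epoch, evaluate, target_metric, max_epochs=100):
    """Return (epochs, seconds) needed to reach a target quality, or None."""
    start = time.perf_counter()
    for epoch in range(1, max_epochs + 1):
        train_one_epoch()
        if evaluate() >= target_metric:
            return epoch, time.perf_counter() - start
    return None, time.perf_counter() - start

# Toy usage with a fake model whose validation accuracy rises each epoch.
# Because training is stochastic, a real benchmark averages several runs.
state = {"acc": 0.0}
epochs, seconds = time_to_quality(
    train_one_epoch=lambda: state.update(acc=state["acc"] + 0.1),
    evaluate=lambda: state["acc"],
    target_metric=0.75,
)
print(f"reached target in {epochs} epochs, {seconds:.4f}s")
```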

Dynamic voltage and frequency scaling (DVFS) is a commonly used power-management scheme that dynamically adjusts power and performance to the time-varying needs of running programs. Unfortunately, conventional DVFS, relying on off-chip regulators, faces limitations in terms of temporal granularity and high costs when considered for future multi-core systems. To overcome these challenges, this paper presents thread motion (TM), a fine-grained power-management scheme for chip multiprocessors (CMPs). Instead of incurring the cost of changing...

10.1145/1555754.1555793 article EN 2009-06-20

Recent efforts to address microprocessor power dissipation through aggressive supply voltage scaling and management require that designers be increasingly cognizant of voltage variations. These variations, primarily due to fast changes in load current, can be attributed to architectural gating events that reduce power dissipation. In order to study this problem, the authors propose a fine-grain, parameterizable model for power-delivery networks that allows system designers to study localized, on-chip voltage fluctuations in high-performance microprocessors....

10.5555/1266366.1266498 article EN 2007-04-16

This paper presents a 28nm SoC with a programmable FC-DNN accelerator design that demonstrates: (1) HW support to exploit data sparsity by eliding unnecessary computations (4× energy reduction); (2) improved algorithmic error tolerance using a sign-magnitude number format for weights and datapath computation; (3) tolerance of circuit-level timing violations in datapath logic via time-borrowing; (4) combined circuit and algorithmic resilience with Razor timing-violation detection to reduce VDD scaling or increase throughput via FCLK scaling; (5) high...

10.1109/isscc.2017.7870351 article EN 2017 IEEE International Solid-State Circuits Conference (ISSCC) 2017-02-01
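
Item (1) in the abstract, eliding computations on zero-valued data, can be pictured with the short software analogue below; a real accelerator performs this gating in hardware, so only the skipped-operation count (not the energy saving) is reproduced here, and the function is purely illustrative.

```python
import numpy as np

def sparse_mac(activations, weights):
    """Dot product that skips multiply-accumulates for zero activations."""
    acc, skipped = 0.0, 0
    for a, w in zip(activations, weights):
        if a == 0.0:              # elide the MAC when the operand is zero
            skipped += 1
            continue
        acc += a * w
    return acc, skipped

# Usage: ReLU outputs are often highly sparse, so many MACs can be elided.
acts = np.maximum(np.random.randn(1024), 0.0)   # roughly half zeros
wts = np.random.randn(1024)
result, skipped = sparse_mac(acts, wts)
print(f"skipped {skipped}/{acts.size} MACs")
```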

Recent efforts to address microprocessor power dissipation through aggressive supply voltage scaling and management require that designers be increasingly cognizant of voltage variations. These variations, primarily due to fast changes in load current, can be attributed to architectural gating events that reduce power dissipation. In order to study this problem, the authors propose a fine-grain, parameterizable model for power-delivery networks that allows system designers to study localized, on-chip voltage fluctuations in high-performance microprocessors....

10.1109/date.2007.364663 article EN 2007-04-01

In this article, we describe the design choices behind MLPerf, a machine learning performance benchmark that has become an industry standard. The first two rounds of MLPerf Training helped drive improvements to software-stack performance and scalability, showing a 1.3× speedup in the top 16-chip results despite higher quality targets and a 5.5× increase in system scale. The first round of MLPerf Inference received over 500 submissions from 14 different organizations, evidencing growing adoption.

10.1109/mm.2020.2974843 article EN IEEE Micro 2020-02-18

Deep learning has been popularized by its recent successes on challenging artificial intelligence problems. One of the reasons for its dominance is also an ongoing challenge: the need for immense amounts of computational power. Hardware architects have responded by proposing a wide array of promising ideas, but to date, the majority of this work has focused on specific algorithms in somewhat narrow application domains. While their specificity does not diminish these approaches, there is a clear need for more flexible solutions. We believe the first...

10.1109/iiswc.2016.7581275 preprint EN 2016-09-01

As the use of deep neural networks continues to grow, so does the fraction of compute cycles devoted to their execution. This has led the CAD and architecture communities to devote considerable attention to building DNN hardware. Despite these efforts, the fault tolerance of DNNs has generally been overlooked. This paper is the first to conduct a large-scale, empirical study of DNN resilience. Motivated by the inherent algorithmic resilience of DNNs, we are interested in understanding the relationship between fault rate and model accuracy. To do so, we present...

10.1109/dac.2018.8465834 article EN 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC) 2018-06-01

Neural personalized recommendation is the cornerstone of a wide collection of cloud services and products, constituting a significant compute demand on cloud infrastructure. Thus, improving execution efficiency directly translates into infrastructure capacity savings. In this paper, we propose DeepRecSched, an inference scheduler that maximizes latency-bounded throughput by taking into account the characteristics of query size and arrival patterns, model architectures, and the underlying hardware systems. By carefully optimizing...

10.1109/isca45697.2020.00084 preprint EN 2020-05-01

Given recent algorithm, software, and hardware innovation, computing has enabled a plethora of new applications. As computing becomes increasingly ubiquitous, however, so does its environmental impact. This paper brings the issue to the attention of computer-systems researchers. Our analysis, built on industry-reported characterization, quantifies the effects in terms of carbon emissions. Broadly, emissions have two sources: operational energy consumption, and manufacturing and infrastructure. Although emissions from the former are...

10.1109/hpca51647.2021.00076 article EN 2021-02-01
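
The two emission sources named in the abstract are commonly summarized with a simple accounting identity; the form below is generic, not the paper's exact methodology.

```latex
% E_use  = lifetime operational energy consumption
% CI     = carbon intensity of the supplying electricity grid
% C_emb  = embodied emissions from manufacturing and infrastructure
\[
C_{\text{total}} \;=\;
\underbrace{E_{\text{use}} \times CI}_{\text{operational}}
\;+\;
\underbrace{C_{\text{emb}}}_{\text{embodied}}
\]
```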

As the application of deep learning continues to grow, so does the amount of data used to make predictions. While big-data deep learning was traditionally constrained by computing performance and off-chip memory bandwidth, a new constraint has emerged: privacy. One solution is homomorphic encryption (HE). Applying HE to the client-cloud model allows cloud services to perform inference directly on clients' encrypted data. While HE can meet privacy constraints, it introduces enormous computational challenges and remains impractically slow...

10.1109/hpca51647.2021.00013 article EN 2021-02-01

Architectural power modeling tools are widely used by the computer architecture community for rapid evaluations of high-level design choices and design space explorations. Currently, McPAT [31] is the de facto model, but the literature does not yet contain a careful examination of its accuracy. In addition, the issue of how greatly modeling error can affect architectural-level studies has not been quantified before. In this work, we present the first rigorous assessment of McPAT's core power and area models with a detailed, validated toolchain in...

10.1109/hpca.2015.7056064 article EN 2015-02-01

Increasing demand for power-efficient, high-performance computing has spurred a growing number and diversity of hardware accelerators in mobile and server Systems on Chip (SoCs). This paper makes the case that co-design of the accelerator microarchitecture with the system in which it belongs is critical to balanced, efficient accelerator microarchitectures. We find that data movement and coherence management are significant yet often unaccounted components of total runtime, resulting in misleading performance predictions and inefficient...

10.1109/micro.2016.7783751 article EN 2016-10-01