NFDI4DS | UHH-SEMS - Publication Details

Graphite: A distributed parallel simulator for multicores

OPENALEX - Publications

Jason Miller, Harshad Kasture, George Thomas Kurian, Charles Gruenwald, Nathan Beckmann and 3 more

This paper introduces the Graphite open-source distributed parallel multicore simulator infrastructure. Graphite is designed from the ground up for exploration of future multi-core processors containing dozens, hundreds, or even thousands of cores. It provides high performance for fast design space exploration and software development. Several techniques are used to achieve this, including direct execution, seamless multi-machine distribution, and lax synchronization. Graphite is capable of accelerating simulations by distributing them...
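
Lax synchronization lets simulated cores run ahead of one another instead of stepping in lockstep. A minimal sketch of the idea, assuming a toy model in which each core keeps a local cycle count and only reconciles it when it receives a message (the receiver's clock moves forward to the message timestamp if that is later). The `Core` class and its methods are hypothetical and are not Graphite's interface.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Tuple


@dataclass
class Core:
    """A simulated core with its own local clock, in cycles (hypothetical model)."""
    core_id: int
    clock: int = 0
    inbox: Deque[Tuple[int, str]] = field(default_factory=deque)

    def run(self, cycles: int) -> None:
        # Advance independently: no global barrier with the other cores.
        self.clock += cycles

    def send(self, dest: "Core", payload: str, latency: int) -> None:
        # Stamp the message with the sender's local time plus a modeled latency.
        dest.inbox.append((self.clock + latency, payload))

    def receive(self) -> str:
        # Clocks are reconciled only here: if the message carries a later
        # timestamp than this core's clock, jump forward; never move backwards.
        timestamp, payload = self.inbox.popleft()
        self.clock = max(self.clock, timestamp)
        return payload


a, b = Core(0), Core(1)
a.run(1000)                   # core 0 races ahead
b.run(200)                    # core 1 lags behind; nobody waits
a.send(b, "req", latency=10)
b.receive()                   # core 1's clock jumps to 1010
```

Because cores only synchronize on interaction, a core that is already ahead can receive a message stamped in its past; the `max` keeps clocks monotonic but does not undo that skew, which is the accuracy trade-off a lax scheme accepts in exchange for speed.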

10.1109/hpca.2010.5416635 article EN 2010-01-01

Strober: Fast and Accurate Sample-Based Energy Simulation for Arbitrary RTL

OPENALEX - Publications

Donggyu Kim, Adam Izraelevitz, Christopher Celio, Hokeun Kim, Brian Zimmer and 3 more

This paper presents a sample-based energy simulation methodology that enables fast and accurate estimations of performance and average power for arbitrary RTL designs. Our approach uses an FPGA to simultaneously simulate the design and collect samples containing exact state snapshots. Each snapshot is then replayed in gate-level simulation, resulting in a workload-specific average power estimate with confidence intervals. For arbitrary workloads, our methodology guarantees a minimum four-orders-of-magnitude speedup over commercial CAD tools...
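
The final step described above, turning per-snapshot gate-level power numbers into a workload-level estimate with confidence intervals, can be sketched with standard sampling statistics. This assumes the snapshots are independent random samples and uses a normal approximation for the sample mean; the function name, the milliwatt units, and the default z = 2.576 (about 99% two-sided confidence) are illustrative choices, not Strober's code.

```python
import math
import statistics
from typing import List, Tuple


def average_power_ci(samples_mw: List[float], z: float = 2.576) -> Tuple[float, float, float]:
    """Return (mean, lower, upper) for the average power of a workload.

    samples_mw: average power of each replayed snapshot, in milliwatts
    (hypothetical units). z = 2.576 gives roughly 99% two-sided confidence
    under a normal approximation of the sample mean.
    """
    n = len(samples_mw)
    if n < 2:
        raise ValueError("need at least two samples to estimate variance")
    mean = statistics.fmean(samples_mw)
    std_err = statistics.stdev(samples_mw) / math.sqrt(n)  # standard error of the mean
    half_width = z * std_err
    return mean, mean - half_width, mean + half_width


mean, lower, upper = average_power_ci([10.0, 12.0, 11.0, 9.0, 13.0])
print(f"{mean:.2f} mW, 99% CI [{lower:.2f}, {upper:.2f}]")  # 11.00 mW, 99% CI [9.18, 12.82]
```

With only a handful of samples, a Student t critical value in place of `z` would give a wider, more conservative interval.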

10.1145/3007787.3001151 article EN ACM SIGARCH Computer Architecture News 2016-06-18

BROOM: An Open-Source Out-of-Order Processor With Resilient Low-Voltage Operation in 28-nm CMOS

OPENALEX - Publications

Christopher Celio, Pi-Feng Chiu, Krste Asanović, Borivoje Nikolić, David A. Patterson

The Berkeley resilient out-of-order machine (BROOM) is a resilient, wide-voltage-range implementation of an open-source out-of-order (OoO) RISC-V processor implemented in an ASIC flow. A 28-nm test chip contains a BOOM OoO core and a 1-MiB level-2 (L2) cache, enhanced with architectural error tolerance for low-voltage operation. It was implemented using an agile design methodology, in which the initial architecture was transformed to perform well in a high-performance, low-leakage CMOS process, informed by synthesis, place, and route data...

10.1109/mm.2019.2897782 article EN IEEE Micro 2019-02-05

DESSERT: Debugging RTL Effectively with State Snapshotting for Error Replays across Trillions of Cycles

OPENALEX - Publications

Donggyu Kim, Christopher Celio, Sagar Karandikar, David Biancolin, Jonathan Bachrach and 1 more

We present DESSERT, an FPGA-accelerated methodology for simulation-based RTL verification. The design is automatically transformed and instrumented to allow deterministic simulation on the FPGA with initialization and state snapshot capture. Assert statements, which are used for error checking in software simulation, are synthesized for quick hardware-based checking. Print statements are also synthesized to generate logs from the FPGA, which are compared on the fly against a functional golden-model simulator for more exhaustive checking. To rapidly provide...
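
The on-the-fly comparison against a golden model described above amounts to walking two record streams in lockstep and stopping at the first disagreement. A minimal sketch, assuming both the FPGA-side print output and the golden-model output have already been normalized into comparable text records; the record format and the `first_divergence` name are hypothetical.

```python
from itertools import zip_longest
from typing import Iterable, Optional, Tuple

# (record index, FPGA record, golden-model record); None marks a missing record.
Mismatch = Tuple[int, Optional[str], Optional[str]]


def first_divergence(fpga_log: Iterable[str], golden_log: Iterable[str]) -> Optional[Mismatch]:
    """Stream two logs in lockstep and return the first mismatching record.

    Returns None if the logs agree record for record and have equal length.
    """
    for index, (got, expected) in enumerate(zip_longest(fpga_log, golden_log)):
        if got != expected:
            return index, got, expected
    return None


fpga = iter(["pc=0x1000 x1=0x5", "pc=0x1004 x2=0x7"])
golden = iter(["pc=0x1000 x1=0x5", "pc=0x1004 x2=0x8"])
print(first_divergence(fpga, golden))  # (1, 'pc=0x1004 x2=0x7', 'pc=0x1004 x2=0x8')
```

Because `zip_longest` consumes both iterables lazily, the check can run while the logs are still being produced, and it also flags a log that ends early.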

10.1109/fpl.2018.00021 article EN 2018-08-01

The Renewed Case for the Reduced Instruction Set Computer: Avoiding ISA Bloat with Macro-Op Fusion for RISC-V

OPENALEX - Publications

Christopher Celio, Palmer Dabbelt, David A. Patterson, Krste Asanović

This report makes the case that a well-designed Reduced Instruction Set Computer (RISC) can match, and even exceed, the performance and code density of existing commercial Complex Instruction Set Computers (CISC) while maintaining the simplicity and cost-effectiveness that underpin the original RISC goals. We begin by comparing dynamic instruction counts and instruction bytes fetched for the popular proprietary ARMv7, ARMv8, IA-32, and x86-64 Instruction Set Architectures (ISAs) against the free and open RISC-V RV64G and RV64GC ISAs when running the SPEC CINT2006 benchmark suite...
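
Macro-op fusion lets a decoder treat a short idiomatic sequence of simple RISC-V instructions as one internal operation, so the ISA itself does not need a dedicated complex instruction. The sketch below is a toy decoder pass, assuming two illustrative idioms (a shift-left/shift-right pair that clears the upper 32 bits, and an `add` feeding a load); the `Inst` record and the fused op names are hypothetical, and this is not a description of any particular core's fusion logic.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Inst:
    op: str
    rd: str
    rs1: str
    rs2: Optional[str] = None  # second source register, if any
    imm: Optional[int] = None  # immediate operand, if any


def try_fuse(a: Inst, b: Inst) -> Optional[str]:
    """Return a fused macro-op name if (a, b) matches an illustrative idiom.

    Both idioms require b to overwrite a's destination, so the intermediate
    value is dead afterwards and the fused op writes only one register.
    """
    # Clear upper word: slli rd, rs1, 32 ; srli rd, rd, 32
    if (a.op == "slli" and b.op == "srli" and a.imm == 32 and b.imm == 32
            and b.rs1 == a.rd and b.rd == a.rd):
        return "fused.clear_upper"
    # Indexed load: add rd, rs1, rs2 ; ld rd, 0(rd)
    if (a.op == "add" and b.op == "ld" and b.imm == 0
            and b.rs1 == a.rd and b.rd == a.rd):
        return "fused.indexed_load"
    return None


def fuse(stream: List[Inst]) -> List[str]:
    """Greedily fuse adjacent pairs and return the resulting macro-op sequence."""
    out: List[str] = []
    i = 0
    while i < len(stream):
        if i + 1 < len(stream):
            fused = try_fuse(stream[i], stream[i + 1])
            if fused is not None:
                out.append(fused)
                i += 2
                continue
        out.append(stream[i].op)
        i += 1
    return out


program = [
    Inst("add", "a0", "a1", "a2"),
    Inst("ld", "a0", "a0", imm=0),
    Inst("slli", "a3", "a4", imm=32),
    Inst("srli", "a3", "a3", imm=32),
    Inst("addi", "a5", "a5", imm=1),
]
macro_ops = fuse(program)
print(f"{len(program)} instructions -> {len(macro_ops)} macro-ops: {macro_ops}")
```

The dynamic instruction count seen by the ISA is unchanged; fusion only reduces the number of operations the pipeline tracks internally.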

10.48550/arxiv.1607.02318 preprint EN other-oa arXiv (Cornell University) 2016-01-01

An Out-of-Order RISC-V Processor with Resilient Low-Voltage Operation in 28NM CMOS

OPENALEX - Publications

Pi-Feng Chiu, Christopher Celio, Krste Asanović, David A. Patterson, Borivoje Nikolić

An open-source out-of-order superscalar processor implements the 64-bit RISC-V instruction set architecture (ISA) and achieves 3.77 CoreMark/MHz. The 2.7 mm × 1.8 mm chip includes one core operating at 1.0 GHz at a nominal 0.9 V with 1 MB of level-2 (L2) cache in a 28 nm HPM process. A line recycling (LR) technique reuses faulty cache lines that fail at low voltages to correct errors, with only 0.77% L2 area overhead. LR reduces the minimum operating voltage to 0.47 V, improving energy efficiency by 43% with negligible impact on CPI.
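
The headline numbers above can be turned into absolute figures with simple arithmetic. A short sketch, assuming the per-MHz CoreMark figure holds unchanged at the 1.0 GHz nominal clock (a CoreMark score is iterations per second, so CoreMark/MHz multiplied by the clock in MHz gives the score):

```python
# Figures quoted in the abstract.
COREMARK_PER_MHZ = 3.77
FREQ_MHZ = 1000.0              # 1.0 GHz nominal
DIE_W_MM, DIE_H_MM = 2.7, 1.8

coremark_score = COREMARK_PER_MHZ * FREQ_MHZ  # CoreMark iterations per second
die_area_mm2 = DIE_W_MM * DIE_H_MM

print(f"CoreMark at 1.0 GHz: {coremark_score:.0f}")  # 3770
print(f"Die area: {die_area_mm2:.2f} mm^2")          # 4.86
```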

10.1109/vlsic.2018.8502320 article EN 2018-06-01

Strober: Fast and Accurate Sample-Based Energy Simulation for Arbitrary RTL

OPENALEX - Publications

Donggyu Kim, Adam Izraelevitz, Christopher Celio, Hokeun Kim, Brian Zimmer and 3 more

This paper presents a sample-based energy simulation methodology that enables fast and accurate estimations of performance and average power for arbitrary RTL designs. Our approach uses an FPGA to simultaneously simulate the design and collect samples containing exact state snapshots. Each snapshot is then replayed in gate-level simulation, resulting in a workload-specific average power estimate with confidence intervals. For arbitrary workloads, our methodology guarantees a minimum four-orders-of-magnitude speedup over commercial CAD tools...
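
The abstract does not say how sample points are chosen during the FPGA run. One standard way to draw a uniform random sample of snapshots from a run whose length is not known in advance is reservoir sampling (Algorithm R); the sketch below assumes that approach for illustration, with cycle numbers standing in for captured state. The function name and the candidate snapshot spacing are hypothetical.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, rng: Optional[random.Random] = None) -> List[T]:
    """Keep a uniform random sample of k items from a stream of unknown length.

    Algorithm R: after n items have been seen, each one is in the
    reservoir with probability k / n.
    """
    rng = rng or random.Random()
    reservoir: List[T] = []
    for n, item in enumerate(stream, start=1):
        if n <= k:
            reservoir.append(item)   # fill the reservoir first
        else:
            j = rng.randrange(n)     # uniform in [0, n)
            if j < k:
                reservoir[j] = item  # replace an entry with probability k / n
    return reservoir


snapshot_cycles = range(0, 1_000_000, 1000)  # hypothetical candidate snapshot points
picked = reservoir_sample(snapshot_cycles, k=30, rng=random.Random(0))
print(len(picked))  # 30
```

Only `k` snapshots are ever held at once, which is what makes this style of sampling attractive when storage for captured state is limited.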

10.1109/isca.2016.21 article EN 2016-06-01

Cache Resiliency Techniques for a Low-Voltage RISC-V Out-of-Order Processor in 28-nm CMOS

OPENALEX - Publications

Pi-Feng Chiu, Christopher Celio, Krste Asanović, Borivoje Nikolić, David A. Patterson

Architecture-level assist techniques enable low-voltage operation by tolerating errors in SRAM-based caches. A line recycling (LR) technique is proposed to reuse faulty cache lines that fail at low voltages to correct errors, with only 0.77% level-2 (L2) area overhead. LR can either save 33% of the capacity lost to disabling faulty lines or allow a further reduction of the minimum operating voltage (Vmin). Bit bypass, implemented in SRAM, extends the tag array to log error entries, providing multibit-error protection for metadata...
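
The 33% figure is consistent with a simple accounting in which each line-recycling repair consumes two faulty lines as patches to restore a third, so every three faulty lines cost two lines of capacity instead of three. The sketch below encodes only that assumed accounting (it does not model how patching is done in hardware), with a hypothetical function name:

```python
def lines_lost(faulty_lines: int, recycle: bool) -> int:
    """Cache lines lost to low-voltage faults under a toy accounting model.

    Line disable: every faulty line is taken out of service.
    Line recycling (assumed): two faulty lines are sacrificed to repair a
    third, so each complete group of three faulty lines loses only two.
    """
    if faulty_lines < 0:
        raise ValueError("faulty_lines must be non-negative")
    if not recycle:
        return faulty_lines
    repaired = faulty_lines // 3
    return faulty_lines - repaired


disabled = lines_lost(300, recycle=False)
recycled = lines_lost(300, recycle=True)
print(f"disable: {disabled}, recycle: {recycled}, saving: {1 - recycled / disabled:.0%}")
# disable: 300, recycle: 200, saving: 33%
```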

10.1109/lssc.2019.2900148 article EN IEEE Solid-State Circuits Letters 2018-12-01