Christopher Celio

ORCID: 0000-0001-6384-6351
Research Areas
  • Parallel Computing and Optimization Techniques
  • Embedded Systems Design Techniques
  • Low-power high-performance VLSI design
  • Interconnection Networks and Systems
  • Radiation Effects in Electronics
  • Advanced Data Storage Technologies
  • Distributed and Parallel Computing Systems
  • Semiconductor materials and devices
  • Advanced Memory and Neural Computing
  • Quantum-Dot Cellular Automata
  • CCD and CMOS Imaging Sensors
  • Formal Methods in Verification

University of California, Berkeley
2016-2019

University of California System
2016

Massachusetts Institute of Technology
2010

This paper introduces the Graphite open-source distributed parallel multicore simulator infrastructure. Graphite is designed from the ground up for exploration of future multi-core processors containing dozens, hundreds, or even thousands of cores. It provides high performance for fast design space exploration and software development. Several techniques are used to achieve this, including direct execution, seamless multi-machine distribution, and lax synchronization. Graphite is capable of accelerating simulations by distributing them...

10.1109/hpca.2010.5416635 article EN 2010-01-01

This paper presents a sample-based energy simulation methodology that enables fast and accurate estimations of performance and average power for arbitrary RTL designs. Our approach uses an FPGA to simultaneously simulate the design and collect samples containing exact RTL state snapshots. Each snapshot is then replayed in gate-level simulation, resulting in a workload-specific average power estimate with confidence intervals. For arbitrary RTL and workloads, our methodology guarantees a minimum four-orders-of-magnitude speedup over commercial CAD tools...

10.1145/3007787.3001151 article EN ACM SIGARCH Computer Architecture News 2016-06-18
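The abstract above describes turning a set of sampled snapshots into an average power estimate with a confidence interval. A minimal sketch of that last statistical step follows, assuming the per-snapshot power values are already available. The function name, the normal-approximation interval, and the sample numbers are illustrative assumptions, not taken from the paper.

```python
import math
import statistics


def mean_with_confidence(samples, confidence=0.95):
    """Return (mean, low, high) for a list of per-snapshot measurements.

    Assumption: the samples are independent and numerous enough for a
    normal approximation of the sampling distribution of the mean.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = statistics.mean(samples)
    # Standard error of the mean from the sample standard deviation.
    std_err = statistics.stdev(samples) / math.sqrt(n)
    # Two-sided critical value of the standard normal distribution.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * std_err
    return mean, mean - half_width, mean + half_width


# Hypothetical per-snapshot power values in milliwatts (made-up numbers).
snapshot_power_mw = [41.2, 39.8, 40.5, 42.1, 40.0, 41.7]
mean, low, high = mean_with_confidence(snapshot_power_mw)
print(f"average power: {mean:.2f} mW, 95% CI: [{low:.2f}, {high:.2f}] mW")
```

Collecting more snapshots narrows the interval roughly with the square root of the sample count, which is the usual trade-off between replay time and estimate precision in sampling-based approaches.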

The Berkeley resilient out-of-order machine (BROOM) is a resilient, wide-voltage-range implementation of an open-source out-of-order (OoO) RISC-V processor implemented in an ASIC flow. A 28-nm test-chip contains a BOOM OoO core and a 1-MiB level-2 (L2) cache, enhanced with architectural error tolerance for low-voltage operation. It was implemented by using an agile design methodology, where the initial OoO architecture was transformed to perform well in a high-performance, low-leakage CMOS process, informed by synthesis, place, and route data...

10.1109/mm.2019.2897782 article EN IEEE Micro 2019-02-05

We present DESSERT, an FPGA-accelerated methodology for simulation-based RTL verification. The RTL design is automatically transformed and instrumented to allow deterministic simulation on the FPGA with initialization and state snapshot capture. Assert statements, which are used for error checking in software simulation, are synthesized for quick hardware-based checking. Print statements are also synthesized to generate logs from the FPGA, which are compared on the fly against a functional golden-model simulator for more exhaustive checking. To rapidly provide...

10.1109/fpl.2018.00021 article EN 2018-08-01

This report makes the case that a well-designed Reduced Instruction Set Computer (RISC) can match, and even exceed, the performance and code density of existing commercial Complex Instruction Set Computers (CISC) while maintaining the simplicity and cost-effectiveness that underpins the original RISC goals. We begin by comparing the dynamic instruction counts and dynamic instruction bytes fetched for the popular proprietary ARMv7, ARMv8, IA-32, and x86-64 Instruction Set Architectures (ISAs) against the free and open RISC-V RV64G and RV64GC ISAs when running the SPEC CINT2006 benchmark suite. RISC-V was...

10.48550/arxiv.1607.02318 preprint EN other-oa arXiv (Cornell University) 2016-01-01

An open-source out-of-order superscalar processor implements the 64-bit RISC-V instruction set architecture (ISA) and achieves 3.77 CoreMark/MHz. The 2.7 mm × 1.8 mm chip includes one core operating at 1.0 GHz at a nominal 0.9 V with 1 MB of level-2 (L2) cache in a 28 nm HPM process. A line recycling (LR) technique reuses faulty cache lines that fail at low voltages to correct errors with only 0.77% L2 area overhead. LR reduces the minimum operating voltage to 0.47 V, improving energy efficiency by 43% with negligible impact on CPI.

10.1109/vlsic.2018.8502320 article EN 2018-06-01

This paper presents a sample-based energy simulation methodology that enables fast and accurate estimations of performance and average power for arbitrary RTL designs. Our approach uses an FPGA to simultaneously simulate the design and collect samples containing exact RTL state snapshots. Each snapshot is then replayed in gate-level simulation, resulting in a workload-specific average power estimate with confidence intervals. For arbitrary RTL and workloads, our methodology guarantees a minimum four-orders-of-magnitude speedup over commercial CAD tools...

10.1109/isca.2016.21 article EN 2016-06-01

Architecture-level assist techniques enable low-voltage operation by tolerating errors in SRAM-based caches. A line recycling (LR) technique is proposed to reuse faulty cache lines that fail at low voltages to correct errors with only 0.77% level-2 (L2) area overhead. LR can either save 33% of the cache capacity loss from line disable or allow further reduction of the minimum operating voltage (Vmin). Bit bypass implemented in SRAM extends the tag array to log error entries, providing multibit-error protection for metadata...

10.1109/lssc.2019.2900148 article EN IEEE Solid-State Circuits Letters 2018-12-01