- Low-power high-performance VLSI design
- Advancements in Semiconductor Devices and Circuit Design
- Analog and Mixed-Signal Circuit Design
- Semiconductor materials and devices
- Radio Frequency Integrated Circuit Design
- VLSI and FPGA Design Techniques
- Parallel Computing and Optimization Techniques
- VLSI and Analog Circuit Testing
- Cooperative Communication and Network Coding
- Advancements in PLL and VCO Technologies
- Advanced Wireless Communication Techniques
- Embedded Systems Design Techniques
- Error Correcting Code Techniques
- Advanced Memory and Neural Computing
- Ferroelectric and Negative Capacitance Devices
- Radiation Effects in Electronics
- Advanced MIMO Systems Optimization
- Integrated Circuits and Semiconductor Failure Analysis
- Millimeter-Wave Propagation and Modeling
- Microwave Engineering and Waveguides
- Advanced Data Storage Technologies
- Electromagnetic Compatibility and Noise Suppression
- Wireless Communication Security Techniques
- Algorithms and Data Compression
- Interconnection Networks and Systems
University of California, Berkeley
2016-2025
Berkeley College
2009-2024
University of California System
2012-2022
Associazione Medici Diabetologi
2021
Institut Supérieur d'Électronique de Paris
2016
Agilent Technologies (United States)
2013
Berkeley Systems (United States)
2004
University of California, Davis
1997-2003
Texas Instruments (United States)
2001-2003
Institute for Technology of Nuclear and other Mineral Raw Materials
2000
Design and experimental evaluation of a new sense-amplifier-based flip-flop (SAFF) is presented. It was found that the main speed bottleneck of existing SAFFs is the cross-coupled set-reset (SR) latch in the output stage. The new flip-flop uses a new output-stage latch topology that significantly reduces delay and improves driving capability. The performance of this flip-flop is verified by measurements on a test chip implemented in 0.18 μm effective channel length CMOS. Demonstrated performance places it among the fastest flip-flops used in state-of-the-art processors. Measurement...
Continued improvement in computing efficiency requires functional specialization of hardware designs. Agile design methodologies have been proposed to alleviate the increased costs of custom silicon architectures, but their practice thus far has been accompanied by challenges in the integration and validation of complex systems-on-a-chip (SoCs). We present the Chipyard framework, an integrated SoC design, simulation, and implementation environment for specialized compute systems. Chipyard includes configurable, composable,...
We present FireSim, an open-source simulation platform that enables cycle-exact microarchitectural simulation of large scale-out clusters by combining FPGA-accelerated simulation of silicon-proven RTL designs with a scalable, distributed network simulation. Unlike prior tools, FireSim runs on Amazon EC2 F1, a public cloud FPGA platform, which greatly improves usability, provides elasticity, and lowers the cost of large-scale FPGA-based experiments. We describe the design and implementation of FireSim and show how it can provide sufficient...
DNN accelerators are often developed and evaluated in isolation without considering the cross-stack, system-level effects in real-world environments. This makes it difficult to appreciate the impact of System-on-Chip (SoC) resource contention, OS overheads, and programming-stack inefficiencies on overall performance/energy-efficiency. To address this challenge, we present Gemmini, an open-source, full-stack DNN accelerator generator. Gemmini generates a wide design space of efficient ASIC accelerators from a flexible...
A 1.8-V 14-b 12-MS/s pseudo-differential pipeline analog-to-digital converter (ADC) using a passive capacitor error-averaging technique and nested CMOS gain-boosting is described. The converter is optimized for low-voltage low-power applications by applying an optimum stage-scaling algorithm at the architectural level and an opamp and comparator sharing technique at the circuit level. Prototyped in a 0.18-μm 6M-1P process, this converter achieves a peak signal-to-noise plus distortion ratio (SNDR) of 75.5 dB and a 103-dB spurious-free dynamic...
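The digital redundancy that makes 1.5-bit-per-stage pipelines tolerant of comparator errors can be illustrated with a minimal behavioural sketch. It assumes an ideal x2 interstage gain, comparator thresholds at +/-Vref/4 and a generic 1.5-bit stage; it does not model the passive error-averaging, gain-boosting or opamp-sharing techniques of the converter above, and the function names are hypothetical.

```python
def pipeline_decisions(x, n_stages=12, vref=1.0, comp_offset=0.0):
    """Per-stage decisions d_i in {-1, 0, +1} of an ideal 1.5-bit/stage pipeline.

    comp_offset shifts both comparator thresholds; the redundancy tolerates
    |comp_offset| < vref/4 without losing accuracy.
    """
    decisions = []
    residue = x
    for _ in range(n_stages):
        # Two comparators at +/- vref/4 (plus an optional common offset)
        if residue > vref / 4 + comp_offset:
            d = 1
        elif residue < -vref / 4 + comp_offset:
            d = -1
        else:
            d = 0
        decisions.append(d)
        # Ideal x2 interstage gain with sub-DAC subtraction
        residue = 2 * residue - d * vref
    return decisions


def reconstruct(decisions, vref=1.0):
    """Digital output: overlapping stage decisions are weighted by 2^-i and added."""
    return vref * sum(d * 2.0 ** -(i + 1) for i, d in enumerate(decisions))


# Sweep the input range with and without comparator offsets
for offset in (0.0, 0.2, -0.2):
    worst = max(abs(reconstruct(pipeline_decisions(k / 1000, comp_offset=offset)) - k / 1000)
                for k in range(-1000, 1001))
    print(f"offset={offset:+.1f}  worst error={worst:.2e}  bound 2^-12={2**-12:.2e}")
```

As long as the comparator offset stays below Vref/4, every residue remains within +/-Vref, so the reconstruction error is bounded by Vref·2^-N for N stages regardless of the offset.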
The class of low-density parity-check (LDPC) codes is attractive, since such codes can be decoded using practical message-passing algorithms, and their performance is known to approach the Shannon limits for suitably large block lengths. For the intermediate block lengths relevant in applications, however, many LDPC codes exhibit a so-called "error floor," corresponding to a significant flattening in the curve that relates...
We present an adaptive digital technique to calibrate pipelined analog-to-digital converters (ADCs). Rather than achieving linearity by adjustment of analog component values, the new approach infers errors from conversion results and applies postprocessing to correct those results. The scheme proposed here draws a close analogy to the channel equalization problem commonly encountered in communications. We show that, with the help of a slow but accurate ADC, a code-domain finite-impulse-response filter is sufficient...
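The calibration idea, adapting code-domain FIR weights so that the fast converter's output tracks a slow but accurate reference, can be sketched with a least-mean-squares (LMS) update. This is a simplified illustration under stated assumptions: a 1.5-bit/stage pipeline with a hypothetical uniform interstage gain of 1.9 instead of 2, an ideal 16-bit reference, and a uniformly distributed input. Each weight acts as one filter tap applied to one raw stage decision; the filter structure and adaptation details in the paper may differ.

```python
import random

VREF = 1.0


def stage_decisions(x, gains):
    """Raw decisions of a 1.5-bit/stage pipeline whose interstage gains are not exactly 2."""
    d, r = [], x
    for g in gains:
        di = 1 if r > VREF / 4 else (-1 if r < -VREF / 4 else 0)
        d.append(di)
        r = g * r - di * VREF  # residue with a gain error
    return d


def output(w, d):
    """Code-domain FIR: weighted sum of the raw stage decisions."""
    return sum(wi * di for wi, di in zip(w, d))


def calibrate_lms(gains, n_updates=30000, mu=0.02, seed=1):
    """LMS-adapt the weights so the fast ADC output matches a slow reference ADC.

    Only samples that the slow reference also converts (e.g. every 16th one)
    produce an update, so the loop iterates over those samples only.
    """
    rng = random.Random(seed)
    w = [2.0 ** -(i + 1) for i in range(len(gains))]  # start from nominal radix-2 weights
    for _ in range(n_updates):
        x = rng.uniform(-VREF, VREF)
        d = stage_decisions(x, gains)
        ref = round(x * 2**16) / 2**16  # slow but accurate 16-bit reference (assumed ideal)
        err = ref - output(w, d)
        w = [wi + mu * err * di for wi, di in zip(w, d)]  # LMS: w += mu * e * d
    return w


def rms_error(w, gains, n=5000, seed=2):
    rng = random.Random(seed)
    sq = 0.0
    for _ in range(n):
        x = rng.uniform(-VREF, VREF)
        sq += (x - output(w, stage_decisions(x, gains))) ** 2
    return (sq / n) ** 0.5


gains = [1.9] * 10  # hypothetical 5% interstage gain error in every stage
nominal = [2.0 ** -(i + 1) for i in range(len(gains))]
calibrated = calibrate_lms(gains)
print("RMS error, nominal weights:", rms_error(nominal, gains))
print("RMS error, LMS-calibrated: ", rms_error(calibrated, gains))
```

With gain g in every stage the ideal weights are 1/g, 1/g², ..., so the LMS loop only has to learn a linear combination of the raw decisions, which is why a filter in the code domain is enough.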
This paper presents methods for efficient energy-performance optimization at the circuit and micro-architectural levels. The optimal balance between energy and performance is achieved when the sensitivity of energy to a change in performance is equal for all design variables. The sensitivity-based optimizations minimize energy subject to a delay constraint. Energy savings of about 65% can be achieved without a delay penalty by equalizing the sensitivities to sizing, supply, and threshold voltage in a 64-bit adder, compared to a reference design sized for minimum delay. Circuit optimization is effective...
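To make the sensitivity condition concrete, take a toy problem with two hypothetical design variables x1 and x2, energy E = x1 + x2 and delay D = c1/x1 + c2/x2, and define the sensitivity of each variable as S = (dE/dx)/(-dD/dx), the energy spent per unit of delay removed. At the constrained minimum of E subject to D = D0, the Lagrange condition grad E = -lambda·grad D makes every sensitivity equal to lambda. The sketch finds the optimum numerically, without using that condition, and then compares the sensitivities; the energy and delay models are illustrative assumptions, not the paper's circuit models.

```python
def optimize_energy(c1, c2, d0, iters=300):
    """Minimize a toy energy E = x1 + x2 subject to delay D = c1/x1 + c2/x2 = d0.

    x1 and x2 stand for two hypothetical design variables (e.g. gate sizes).
    """
    def x2_of(x1):
        return c2 / (d0 - c1 / x1)  # solve the delay constraint for x2

    def energy(x1):
        return x1 + x2_of(x1)

    lo, hi = (c1 / d0) * (1 + 1e-9), 1e6  # x1 > c1/d0 keeps x2 positive
    for _ in range(iters):  # ternary search on the convex 1-D energy
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if energy(m1) < energy(m2):
            hi = m2
        else:
            lo = m1
    x1 = (lo + hi) / 2
    return x1, x2_of(x1)


def sensitivity(x, c):
    """S = (dE/dx) / (-dD/dx) = 1 / (c / x^2): energy per unit of delay removed."""
    return x * x / c


x1, x2 = optimize_energy(c1=1.0, c2=4.0, d0=3.0)
print(x1, x2, sensitivity(x1, 1.0), sensitivity(x2, 4.0))
```

For c1 = 1, c2 = 4 and D0 = 3 the closed-form optimum is x1 = 1, x2 = 2, where both sensitivities equal 1; shifting effort from one variable to the other at fixed delay can only raise the energy.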
This paper presents a power- and area-efficient 24-way time-interleaved successive-approximation-register (SAR) analog-to-digital converter (ADC) that achieves 2.8 GS/s and 8.1 ENOB in 65 nm CMOS. To minimize the power and area, the capacitors in the capacitive DAC are sized to meet thermal noise requirements rather than matching requirements, leading to an LSB capacitance of 50 aF. An on-chip digital background calibration is used to calibrate capacitor mismatches in the individual ADC channels, as well as the inter-channel offset,...
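Of the calibrations listed, the inter-channel offset correction is the simplest to sketch. In a 24-way time-interleaved ADC, sample n is converted by channel n mod 24, so per-channel offsets appear as a fixed pattern repeating every 24 samples. A minimal background estimate, assuming a zero-mean input that does not alias to DC in any channel, averages each channel's own outputs and subtracts the result. The offsets, input tone and record length are hypothetical, and capacitor-mismatch calibration inside each SAR channel is not modelled.

```python
import math
import random

M_CH = 24  # interleaving factor from the abstract


def interleave(x, offsets):
    """Sample n is converted by channel n % M, which adds its own offset."""
    m = len(offsets)
    return [xn + offsets[n % m] for n, xn in enumerate(x)]


def estimate_offsets(y, m=M_CH):
    """Background estimate: average each channel's own output samples.

    Assumes the input is zero-mean over the window and that M*f is not close
    to an integer, so the input does not alias to DC in any channel.
    """
    sums, counts = [0.0] * m, [0] * m
    for n, yn in enumerate(y):
        sums[n % m] += yn
        counts[n % m] += 1
    return [s / c for s, c in zip(sums, counts)]


rng = random.Random(0)
true_offsets = [rng.uniform(-0.02, 0.02) for _ in range(M_CH)]  # hypothetical mismatch
x = [0.5 * math.sin(2 * math.pi * 0.0213 * n) for n in range(M_CH * 1000)]
y = interleave(x, true_offsets)
est = estimate_offsets(y)
corrected = [yn - est[n % M_CH] for n, yn in enumerate(y)]
print(max(abs(e - t) for e, t in zip(est, true_offsets)))
```

With a 0.0213 cycles/sample tone, 24 x 0.0213 = 0.5112 is far from an integer, so each channel's average of the tone over 1000 samples is bounded by about 0.5/1000, and the estimates land within roughly 5e-4 of the true offsets.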
Increased process variability presents a major challenge for future SRAM scaling. Fast and accurate validation of read stability and writeability margins is crucial for estimating yield in large arrays. Conventional read/write metrics are characterized through test structures that are able to provide only limited hardware measurement data and cannot be used to investigate cell bit fails in functional SRAM arrays. This work presents a method for large-scale characterization of SRAM arrays using direct bit-line measurements. A test chip implemented in 45 nm CMOS...
Large arrays of radios have been exploited for beamforming and null steering in both radar and communication applications, but cost and form factor limitations have precluded their use in commercial systems. This paper discusses how to build large arrays of radios that enable multiuser massive multiple-input-multiple-output (MIMO) for aggressive spatial multiplexing, with many users sharing the same spectrum. The focus is on energy- and cost-efficient realization of these arrays in order to enable new applications. Distributed algorithms are proposed, optimum...
A grouped-parallel low-density parity-check (LDPC) decoder is designed for the (2048,1723) Reed-Solomon-based LDPC (RS-LDPC) code suitable for 10GBASE-T Ethernet. A two-step decoding scheme reduces the wordlength to 4 bits while lowering the error floor to below 10⁻¹⁴ BER. The proposed post-processor is conveniently integrated with the decoder,...
Domain specialization under energy constraints in deeply-scaled CMOS has been driving the need for agile development of Systems on a Chip (SoCs). While digital subsystems have design flows that are conducive to rapid iterations from specification to layout, analog and mixed-signal modules face the challenge of a long human-in-the-middle iteration loop that requires expert intuition to verify that post-layout circuit parameters meet the original specification. Existing automated solutions optimize circuit parameters for a given target...
The final phase of CMOS technology scaling provides continued increases in already vast transistor counts, but only minimal improvements in energy efficiency, thus requiring innovation in circuits and architectures. However, even huge teams are struggling to complete large, complex designs on schedule using traditional rigid development flows. This article presents an agile hardware development methodology, which the authors adopted for 11 RISC-V microprocessor tape-outs on modern 28-nm and 45-nm processes in the past five...
We present BAG2, a framework for the development of process-portable Analog and Mixed Signal (AMS) circuit generators. Such generators are parametrized design procedures that produce schematics, layouts, and verification testbenches given input specifications. This paper expands on previous work by introducing universal AMS generators into the framework, as well as two new layout engines, XBase and Laygo, that enable process-portable layouts. We have developed various complex driving examples, including a time-interleaved SAR ADC and a SerDes transceiver frontend....
The design and experimental evaluation of a clocked adiabatic logic (CAL) is described in this paper. CAL is a dual-rail logic that operates from a single-phase AC power-clock supply. This new low-energy logic makes it possible to integrate all power control circuitry on the chip, resulting in better system efficiency, lower cost, and simpler power distribution. CAL can also be operated from a DC supply in a nonenergy-recovery mode compatible with standard CMOS logic. In the AC mode, the power-clock waveform is generated using an on-chip switching transistor and a small...
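The energy argument behind adiabatic logic can be checked with a first-order RC model: charging a load C through a resistance R from a step dissipates CV²/2, while charging it from a source that ramps over a time T much longer than RC dissipates roughly (RC/T)·CV². The sketch integrates the RC circuit numerically for both cases in normalized units (R = C = V = 1). The linear ramp is an assumption for illustration; CAL's single-phase AC power-clock is not a linear ramp, so the exact constant differs.

```python
def dissipated_energy(t_ramp, r=1.0, c=1.0, vdd=1.0, dt=1e-3, t_settle=20.0):
    """Energy lost in R while charging C from a source that ramps 0 -> vdd in t_ramp.

    t_ramp = 0 models conventional (step) charging. Forward-Euler integration,
    continued for t_settle time constants after the ramp so C fully charges.
    """
    tau = r * c
    v_c, energy, t = 0.0, 0.0, 0.0
    t_end = t_ramp + t_settle * tau
    while t < t_end:
        v_src = vdd if t >= t_ramp else vdd * t / t_ramp
        i = (v_src - v_c) / r
        energy += i * i * r * dt  # I^2 R dissipation in this step
        v_c += i * dt / c
        t += dt
    return energy


e_step = dissipated_energy(0.0)     # conventional charging, about C*V^2/2
e_ramp = dissipated_energy(100.0)   # ramp lasting 100 RC time constants
print(e_step, e_ramp, e_ramp / e_step)
```

With T = 100 RC the ramp dissipates about 0.0099 in these units, roughly 2% of the 0.5 lost with a step, close to the first-order ratio 2RC/T.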
Intrinsic variations and challenging leakage control in today's bulk-Si MOSFETs limit the scaling of SRAM. Design tradeoffs in six-transistor (6-T) and four-transistor (4-T) SRAM cells are presented in this work. It is found that 6-T and 4-T FinFET-based SRAM cells designed with built-in feedback achieve significant improvements in the cell static noise margin (SNM) without area penalty. Up to 2x improvement in SNM can be achieved in 6-T cells. 4-T FinFET-based SRAM cells with a sub-100 pA per-cell standby current offer a similar SNM improvement as 6-T cells with feedback, making them attractive...
Dual-supply voltage design using a clustered voltage scaling (CVS) scheme is an effective approach to reduce chip power. The optimal CVS design relies on a level converter implemented in a flip-flop to minimize the energy, delay, and area penalties due to level conversion. Additionally, circuit robustness against supply bounce is a key property that differentiates a good design. Novel flip-flops presented in this paper incorporate half-latch and precharged level converters. These flip-flops are optimized in the energy-delay space to achieve over 30% reduction of...
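The power argument for clustered voltage scaling follows from dynamic energy scaling with C·V²: capacitance moved onto the lower supply switches with its energy reduced by (VDDL/VDDH)², while level conversion adds overhead that eats into the savings. The sketch evaluates that first-order estimate; the supply values, the fraction of capacitance on VDDL and the overhead parameter are hypothetical, and leakage and short-circuit power are ignored.

```python
def cvs_relative_energy(frac_low, vddh, vddl, lc_overhead=0.0):
    """Dynamic energy of a clustered-voltage-scaling design relative to all-VDDH.

    frac_low:    fraction of the switched capacitance moved to VDDL (assumed input)
    lc_overhead: extra switched capacitance for level conversion, as a fraction of
                 the original total, assumed to switch at VDDH (hypothetical)
    """
    # Dynamic energy ~ C * V^2, normalized to C_total * VDDH^2
    return (1.0 - frac_low) + frac_low * (vddl / vddh) ** 2 + lc_overhead


print(cvs_relative_energy(0.5, 1.2, 0.8))        # about 0.722
print(cvs_relative_energy(0.5, 1.2, 0.8, 0.05))  # about 0.772
```

With half the switched capacitance at 0.8 V instead of 1.2 V, dynamic energy drops to about 72% of the single-supply value; a level-conversion overhead of 5% of the original capacitance brings it back to about 77%, which is why a low-overhead level-converting flip-flop matters.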
Two decoding schedules and the corresponding serialized architectures for low-density parity-check (LDPC) decoders are presented. They are applied to codes with parity-check matrices generated either randomly or using geometric properties of elements in Galois fields. Both schedules have low computational requirements. The original concurrent schedule has a large storage requirement that is dependent on the total number of edges in the underlying bipartite graph, while the new, staggered schedule, which uses an approximation of belief propagation,...
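The message passing that both schedules build on can be sketched with min-sum decoding, a standard approximation of belief propagation, using a flooding schedule in which all checks update together. The dictionary c2v holds one message per edge of the bipartite graph, the kind of per-edge storage the abstract attributes to the concurrent schedule. This is a generic illustration, not the paper's staggered schedule or its serialized architectures, and the (7,4) Hamming matrix is only a small test case.

```python
def min_sum_decode(H, llr, max_iter=20):
    """Flooding-schedule min-sum LDPC decoding.

    H: parity-check matrix as a list of 0/1 rows (every check degree >= 2 assumed)
    llr: channel log-likelihood ratios; positive values favour bit 0
    """
    m, n = len(H), len(H[0])
    checks = [[j for j in range(n) if H[i][j]] for i in range(m)]
    c2v = {(i, j): 0.0 for i in range(m) for j in checks[i]}  # one message per edge

    def posterior():
        total = list(llr)
        for (_, j), msg in c2v.items():
            total[j] += msg
        return total

    hard = [0 if v >= 0 else 1 for v in llr]
    for _ in range(max_iter):
        total = posterior()
        # Variable-to-check: everything the variable knows except this check's message
        v2c = {e: total[e[1]] - msg for e, msg in c2v.items()}
        # Check-to-variable: product of signs and minimum magnitude of the other inputs
        for i in range(m):
            for j in checks[i]:
                others = [v2c[(i, k)] for k in checks[i] if k != j]
                sign = -1.0 if sum(v < 0 for v in others) % 2 else 1.0
                c2v[(i, j)] = sign * min(abs(v) for v in others)
        hard = [0 if t >= 0 else 1 for t in posterior()]
        if all(sum(hard[j] for j in checks[i]) % 2 == 0 for i in range(m)):
            break  # all parity checks satisfied
    return hard


# (7,4) Hamming code, all-zero codeword sent, bit 4 received with the wrong sign
H = [[1, 0, 1, 0, 1, 0, 1],
     [0, 1, 1, 0, 0, 1, 1],
     [0, 0, 0, 1, 1, 1, 1]]
print(min_sum_decode(H, [2.0, 2.0, 2.0, 2.0, -0.5, 2.0, 2.0]))  # [0, 0, 0, 0, 0, 0, 0]
```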
Many classes of high-performance low-density parity-check (LDPC) codes are based on parity-check matrices composed of permutation submatrices. We describe the design of a parallel-serial decoder architecture that can be used to map any LDPC code with such a structure to a hardware emulation platform. High-throughput emulation allows for the exploration of the low bit-error rate (BER) region and provides statistics of error traces, which illuminate the causes of the error floors of the (2048, 1723) Reed-Solomon-based LDPC (RS-LDPC) code and the (2209, 1978) array-based LDPC code....
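A parity-check matrix built from permutation submatrices can be sketched with the array-code construction H = [P^(i·j)], where P is the p x p cyclic-shift permutation, i indexes gamma block rows and j indexes rho block columns. For prime p this structure guarantees that two columns share at most one check, so the Tanner graph has no 4-cycles. The function names are hypothetical and the p = 5 example is for illustration only; for the (2209, 1978) code named above, n = 2209 = 47 × 47 is consistent with p = 47 and 47 block columns.

```python
def circulant_permutation(p, shift):
    """p x p permutation matrix: row r has its single 1 in column (r + shift) mod p."""
    return [[1 if c == (r + shift) % p else 0 for c in range(p)] for r in range(p)]


def array_code_H(p, gamma, rho):
    """Array-based LDPC parity-check matrix with blocks P^(i*j), i < gamma, j < rho.

    p should be prime, with gamma and rho not larger than p.
    """
    H = [[0] * (p * rho) for _ in range(p * gamma)]
    for i in range(gamma):
        for j in range(rho):
            block = circulant_permutation(p, i * j)
            for r in range(p):
                for c in range(p):
                    H[i * p + r][j * p + c] = block[r][c]
    return H


def max_column_overlap(H):
    """Largest number of rows shared by two columns; <= 1 means no 4-cycles."""
    cols = list(zip(*H))
    return max(sum(a & b for a, b in zip(cols[u], cols[v]))
               for u in range(len(cols)) for v in range(u + 1, len(cols)))


H = array_code_H(p=5, gamma=3, rho=5)
print(len(H), len(H[0]), max_column_overlap(H))  # 15 25 1
```

Every row has weight rho and every column has weight gamma, which is the regular structure an emulation platform can map block by block.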
A methodology for energy–delay optimization of digital circuits is presented. This methodology is applied to minimizing the delay of representative carry-lookahead adders under energy constraints. The impact of various design choices, including the carry-lookahead tree structure and logic style, is analyzed in the energy–delay space and verified through optimization. The result is demonstrated on the fastest adder found, a 240-ps Ling sparse domino adder in 1 V, 90 nm CMOS....
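Carry-lookahead adders compute all carries with a logarithmic-depth prefix tree over (generate, propagate) pairs. The sketch models a radix-2 Kogge-Stone tree at the bit level and checks it against integer addition. It is a generic illustration of the carry-lookahead principle, not the Ling sparse domino adder from the paper, which uses Ling's modified carries and a sparse carry tree; the function name is hypothetical.

```python
import random


def prefix_add(a, b, width=64):
    """Add two unsigned integers using a Kogge-Stone parallel-prefix carry tree.

    Returns (sum mod 2**width, carry_out). Bit-level behavioural model only.
    """
    mask = (1 << width) - 1
    a, b = a & mask, b & mask
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]    # bit generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]  # bit propagate
    G, P = g[:], p[:]
    dist = 1
    while dist < width:  # log2(width) levels of the prefix tree
        G_next, P_next = G[:], P[:]
        for i in range(dist, width):
            # combine the group ending at i with the group ending at i - dist
            G_next[i] = G[i] | (P[i] & G[i - dist])
            P_next[i] = P[i] & P[i - dist]
        G, P = G_next, P_next
        dist *= 2
    # G[i] is now the group generate of bits 0..i, i.e. the carry into bit i+1
    carries = [0] + G[:-1]
    s = 0
    for i in range(width):
        s |= (p[i] ^ carries[i]) << i
    return s, G[-1]


rng = random.Random(0)
for _ in range(1000):
    a, b = rng.getrandbits(64), rng.getrandbits(64)
    assert prefix_add(a, b) == ((a + b) & (2**64 - 1), (a + b) >> 64)
print(prefix_add(2**64 - 1, 1))  # (0, 1)
```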
The error-correcting performance of low-density parity-check (LDPC) codes, when decoded using practical iterative decoding algorithms, is known to be close to Shannon limits for codes with suitably large blocklengths. A substantial limitation to the use of finite-length LDPC codes is the presence of an error floor in the low frame error rate (FER) region. This paper develops a deterministic method of predicting error floors, based on high signal-to-noise ratio (SNR) asymptotics, applied to absorbing sets within structured LDPC codes. The approach...
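Following the absorbing-set definition used in this line of work, a set D of a variable nodes is an (a, b) absorbing set when b checks have an odd number of neighbours in D and every node in D is connected to strictly fewer of those odd checks than of the even ones. An error pattern on such a set is a fixed point of bit-flipping decoding, which is why absorbing sets are linked to error floors. The sketch tests the definition for a given D; the (7,4) Hamming matrix serves only as a small test case, and the function name is hypothetical.

```python
def absorbing_set_params(H, D):
    """Classify a set D of variable nodes (column indices of H).

    Returns (a, b, is_absorbing): a = |D|, b = number of checks with an odd
    number of neighbours in D, and whether every node in D has strictly fewer
    odd-check neighbours than even-check neighbours.
    """
    D = set(D)
    deg = [sum(row[j] for j in D) for row in H]  # neighbours of each check inside D
    odd = {i for i, k in enumerate(deg) if k % 2 == 1}
    even = {i for i, k in enumerate(deg) if k > 0 and k % 2 == 0}
    absorbing = all(
        sum(H[i][j] for i in odd) < sum(H[i][j] for i in even) for j in D
    )
    return len(D), len(odd), absorbing


H = [[1, 0, 1, 0, 1, 0, 1],
     [0, 1, 1, 0, 0, 1, 1],
     [0, 0, 0, 1, 1, 1, 1]]
print(absorbing_set_params(H, {2, 4, 5}))  # (3, 0, True): a weight-3 codeword
print(absorbing_set_params(H, {2, 4}))     # (2, 2, False)
```

A codeword is the special case b = 0; the harmful sets in the error-floor region are small ones with b > 0, which the decoder cannot leave even though they violate some checks.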
A test chip in a low-power 45 nm technology, featuring uniaxial strained-Si, has been built to study variability in CMOS circuits. Systematic layout-induced variation, die-to-die (D2D), wafer-to-wafer (W2W), and within-die (WID) variations are measured over multiple wafers, analyzed, and attributed to their likely causes in the manufacturing process. Delay is characterized using an array of ring oscillators and transistor leakage...
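Separating die-to-die from within-die variation is a variance decomposition: the spread of the die averages is the D2D component, the average spread inside each die is the WID component, and with population variances and an equal number of sites per die the two add up to the total. The sketch shows this two-level split on hypothetical ring-oscillator data; a wafer-to-wafer level would add one more layer of averaging, and systematic layout-induced components would need their own model.

```python
from statistics import fmean, pvariance


def decompose_variation(measurements):
    """Split the variance of per-site measurements into die-to-die and within-die parts.

    measurements[d][s] is the value at site s on die d (same number of sites per die).
    With population variances, total = d2d + wid (law of total variance).
    """
    die_means = [fmean(row) for row in measurements]
    d2d = pvariance(die_means)                             # spread of the die averages
    wid = fmean([pvariance(row) for row in measurements])  # average spread inside a die
    total = pvariance([v for row in measurements for v in row])
    return total, d2d, wid


# Hypothetical ring-oscillator frequencies (MHz): 3 dies x 3 sites
data = [[101.0, 102.0, 103.0],
        [104.0, 105.0, 106.0],
        [110.0, 111.0, 112.0]]
print(decompose_variation(data))  # approximately (14.667, 14.0, 0.667)
```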