Kimiyoshi Usami

ORCID: 0000-0002-8911-3313
Research Areas
  • Low-power high-performance VLSI design
  • Parallel Computing and Optimization Techniques
  • Embedded Systems Design Techniques
  • Advancements in Semiconductor Devices and Circuit Design
  • VLSI and FPGA Design Techniques
  • Semiconductor materials and devices
  • Interconnection Networks and Systems
  • Advanced Memory and Neural Computing
  • Analog and Mixed-Signal Circuit Design
  • VLSI and Analog Circuit Testing
  • 3D IC and TSV technologies
  • Ferroelectric and Negative Capacitance Devices
  • Integrated Circuits and Semiconductor Failure Analysis
  • Electromagnetic Compatibility and Noise Suppression
  • Advanced DC-DC Converters
  • CCD and CMOS Imaging Sensors
  • Advanced Battery Technologies Research
  • Advanced Data Storage Technologies
  • Quantum-Dot Cellular Automata
  • Electrostatic Discharge in Electronics
  • Silicon Carbide Semiconductor Technologies
  • Advancements in PLL and VCO Technologies
  • Neural Networks and Applications
  • Physical Unclonable Functions (PUFs) and Hardware Security
  • Energy Harvesting in Wireless Networks

Shibaura Institute of Technology
2015-2024

Waseda University
2013-2015

Photonic Systems (United States)
2013-2015

National Research Foundation
2013

IEEE Computer Society
2013

Keio University
2008-2011

Toshiba (Japan)
1988-2002

Toshiba (South Korea)
1987-2002

Stanford University
1995

Clustered voltage scaling technique for low-power design. Authors: Kimiyoshi Usami (Toshiba Corp., 580-1, Horikawa-cho, Saiwai-ku, Kawasaki, Japan, and Stanford University, Stanford, CA), Mark Horowitz. ISLPED '95: Proceedings of the 1995 International Symposium on Low Power Design, April 1995, pages 3–8. DOI: https://doi.org/10.1145/224081.224083 (published online 23 April 1995)

10.1145/224081.224083 article EN 1995-01-01

This paper describes an automated design technique to reduce power by making use of two supply voltages. The technique consists of structure synthesis, placement, and routing. The structure synthesizer clusters the gates off the critical paths so as to supply them with the reduced voltage to save power. The placement and routing tool assigns either the reduced voltage or the unreduced one to each row so as to minimize the area overhead. The reduced supply is also exploited in a clock tree. Combining these techniques together, we applied it to a media processor chip. The combined technique reduced the power by 47% in the random-logic modules and by 73% in the clock tree,...

10.1109/4.661212 article EN IEEE Journal of Solid-State Circuits 1998-03-01

A 60-mW MPEG4 video codec has been developed for mobile multimedia applications. This codec supports both the H.263 ITU-T recommendation and the simple profile of the MPEG4 committee draft version 1 released in November 1997. It is composed of a 16-bit reduced instruction set computer processor and several dedicated hardware engines so as to satisfy both power efficiency and programmability. It performs 10 frames/s of encoding and decoding with quarter-common intermediate format at 30 MHz. Several innovative low-power techniques were...

10.1109/4.726575 article EN IEEE Journal of Solid-State Circuits 1998-01-01

A novel design technique which combines a variable supply-voltage scheme and a clustered voltage scaling scheme is presented (VS-CVS scheme). A theory to choose the optimum supply voltages in the VS-CVS scheme is discussed, which enables us to perform chip design in a top-down fashion. Level-shifting flip-flops are developed to reduce the power, delay and area penalties significantly. Application of this technique to an MPEG4 video codec saves 55% of the power dissipation without degrading circuit performance compared to a conventional CMOS design.

10.1109/cicc.1998.695026 article EN 2002-11-27

Leakage power dissipation becomes a dominant component in operation power in nanometer devices. This paper describes a design methodology to implement runtime power gating in a fine-grained manner. We propose an approach to use sleep signals that are not off-chip but are extracted locally within the design. By utilizing the enable signals in a gated clock design, we automatically partition the design into domains. We then choose the domains that will achieve a gain in energy savings by considering the dynamic power overhead due to turning on/off the power switches. To help this decision...

10.1109/iccd.2006.4380809 article EN Proceedings - IEEE International Conference on Computer Design 2006-10-01

This paper proposes an ultra fine-grained run-time power gating of on-chip router, in which the power supply to each router component (e.g., VC queue, crossbar MUX, and output latch) can be individually controlled in response to the applied workload. As only the router components which are just transferring a packet are activated, the leakage power of the on-chip network can be reduced to a near-optimal level. However, a certain amount of wakeup latency is required to activate the sleeping components, and the application performance will be degraded. In this paper, we estimate the wakeup latency for each component based...

10.1109/nocs.2010.16 article EN 2010-01-01

We have developed a function-level power estimation methodology for predicting the power dissipation of embedded software. For a given microprocessor core, we empirically build the "power data bank", which stores the power information of the built-in library functions and basic instructions. To estimate the average power of an embedded software on this core, we first get the execution information of the target software from program profiling/tracing tools. Then we evaluate the total energy consumption and execution time based on the "power data bank", and take their ratio as the average power. High efficiency is achieved because no simulator...

10.1145/337292.337786 article EN Proceedings of the 37th conference on Design automation - DAC '00 2000-01-01
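
As an illustrative aside on the function-level estimation described in this abstract, the final step (total energy divided by total execution time) can be sketched in a few lines of Python. This is a minimal sketch, not the authors' tool: the `BankEntry` fields, the function name, the units, and every number below are hypothetical, and the call counts stand in for what a profiling or tracing tool would report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BankEntry:
    """One hypothetical 'power data bank' record for a function or instruction."""
    energy_nj: float  # assumed energy per invocation, in nanojoules
    time_us: float    # assumed execution time per invocation, in microseconds


def estimate_average_power_mw(bank, call_counts):
    """Return average power in mW as total energy / total execution time.

    bank:        maps a function or instruction name to its BankEntry
    call_counts: maps the same names to invocation counts from a profiler
    """
    total_energy_nj = 0.0
    total_time_us = 0.0
    for name, count in call_counts.items():
        entry = bank[name]  # KeyError if the bank has no record for this name
        total_energy_nj += count * entry.energy_nj
        total_time_us += count * entry.time_us
    if total_time_us == 0.0:
        raise ValueError("profile contains no execution time")
    # nJ / us = (1e-9 J) / (1e-6 s) = 1e-3 W, i.e. milliwatts
    return total_energy_nj / total_time_us


if __name__ == "__main__":
    # Made-up bank entries and profile, for illustration only.
    bank = {
        "memcpy": BankEntry(energy_nj=200.0, time_us=4.0),
        "add": BankEntry(energy_nj=1.0, time_us=0.5),
    }
    profile = {"memcpy": 10, "add": 20}
    print(f"{estimate_average_power_mw(bank, profile):.1f} mW")
```

Because the estimate only multiplies and sums precharacterized entries, it needs no cycle-level simulation at estimation time, which is consistent with the efficiency claim in the abstract.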

Cool Mega-Array (CMA) is an energy-efficient reconfigurable accelerator for battery-driven mobile devices. It has a large processing-element array without memory elements for mapping an application's data-flow graph, a simple programmable microcontroller for data management, and data memory. Unlike coarse-grained dynamically reconfigurable processors, CMA reduces power consumption by switching hardware context and storing intermediate data in registers.

10.1109/mm.2011.94 article EN IEEE Micro 2011-10-27

We present a low-power design method that utilizes multiple supply voltages. The proposed method reduces the power consumption of random logic circuits by 47% on average, with up to 15% area overhead, by the combination of the Clustered Voltage Scaling (CVS) scheme and the Row by Row optimized Power Supply (RRPS) scheme. By the CVS scheme, the optimal netlist, which uses a minimized number of level converters and a maximized number of low Vdd gates under timing constraints, is generated. To avoid the wiring resource increase and the interconnect delay increase caused by layout...

10.1145/263272.263279 article EN 1997-01-01

This paper describes an automated design technique to selectively use multi-threshold CMOS (MTCMOS) in a cell-by-cell fashion. MT cells consisting of low-Vth transistors and high-Vth sleep transistors are assigned to critical paths, while high-Vth cells are assigned to non-critical paths. Compared to the conventional MTCMOS, the gate delay is not affected by the discharge patterns of other gates because there is no virtual ground to be shared. We applied this technique to a test chip of a DSP core. The worst path-delay was improved by 14% over the single high-Vth design without increasing standby...

10.1145/566408.566458 article EN 2002-01-01

This paper describes an automated design technique to reduce power by making use of two supply voltages. The technique consists of structure synthesis, placement and routing. The structure synthesizer clusters the gates off the critical paths so as to supply them with the reduced voltage to save power. The placement and routing tool assigns either the reduced voltage or the unreduced one to each row so as to minimize the area overhead. Combining these techniques together, we applied it to the random logic modules of a media processor chip. The combined technique reduced the power by 47% on average with an area overhead of 15% at the random logic, while keeping the performance.

10.1109/cicc.1997.606600 article EN 2002-11-22

This paper describes a fully automated low-power design methodology in which three different voltage-scaling techniques are combined together. Supply voltage is scaled globally, selectively, and adaptively while keeping the performance. This methodology enabled us to design an MPEG4 codec core with 58% less power than the original design, with a week-scale turn-around time.

10.1145/277044.277178 article EN 1998-01-01

A fine-grain dynamic power gating is proposed for saving the leakage power in MIPS R3000 by sleep control and applied to a processor pipeline. An execution unit is divided into four small units: multiplier, divider, shifter and other (CLU). The power of each unit is cut off dynamically, based on the operation. We taped out the prototype chip Geyser-0, which provides an R3000 Core with the leakage reduction technique, 16 KB caches and a translation lookaside buffer (TLB) using 90 nm CMOS technology. The evaluation results of benchmark programs for embedded...

10.1109/iccd.2008.4751924 article EN 2008-10-01

Geyser-1, a prototype MIPS R3000 CPU with fine grain runtime PG for major computational components in the execution stage, is available. Function units such as the CLU, shifter, multiplier and divider are power-gated and controlled at runtime so that only the function unit to be used is powered on, to minimize the leakage power. The evaluation results on the real chip reveal that the runtime PG mechanism works without electric problems. It reduces the leakage power by 7% at 25°C and by 24% at 80°C. The evaluation results using benchmark programs show that the power consumption can be reduced by between 3% and 30%.

10.1109/asscc.2009.5357257 article EN 2009-11-01

This paper proposes the ultrafine-grained run-time power gating of on-chip routers, in which the power supply to each router component (e.g., virtual-channel buffer, virtual-channel multiplexer, crossbar multiplexer, and output latch) can be individually controlled based on the applied workload. Since only the router components that are transferring a packet are activated, the leakage power of the on-chip network can be reduced to a near-optimal level. However, such techniques inherently increase the communication latency and degrade the application performance, since a certain amount...

10.1109/tcad.2011.2110470 article EN IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 2011-03-22

The authors developed a scalable heterogeneous multicore processor. 3D chip stacking of a general-purpose CPU and reconfigurable accelerators enables various trade-offs between performance and energy consumption. The stacked chips interconnect through a network on chip (NoC). By simply changing the number of accelerator chips, processor parallelism can be widely scaled. No design change is needed, and hence, no additional nonrecurring engineering (NRE) cost is required. An inductive-coupling ThruChip Interface (TCI)...

10.1109/mm.2013.112 article EN IEEE Micro 2013-11-01

Low-power design methodology and applications utilizing dual supply voltages. Authors: Kimiyoshi Usami (Design Methodology Department, System LSI Division, Toshiba Corporation, Semiconductor Company, 580-1, Horikawa-cho, Saiwai-ku, Kawasaki 210-8520, Japan), Mutsunori Igarashi. ASP-DAC '00: Proceedings of the 2000 Asia and South Pacific Design Automation Conference, January 2000, pages 123–128. DOI: https://doi.org/10.1145/368434.368590 (published online 28...)

10.1145/368434.368590 article EN 2000-01-01

This paper describes a design and implementation methodology for fine-grain power gating. Since sleep-in and wakeup are controlled in fine granularity at run time, shortening the transition time between the sleep and active states is strongly required. In particular, shortening the wakeup time is essential because it affects the execution time and hence the performance. However, this requirement makes suppression of ground bounce more difficult. We propose a novel technique to skew the wakeup timings of local power domains to suppress the ground bounce. Delay buffers driving...

10.1109/vlsi.design.2009.63 article EN 2009-01-01

A 32-bit CPU which operates with the lowest energy of 13.4 pJ/cycle at 0.35 V and 14 MHz, works from 0.22 V to 1.2 V, and has a 0.14 μA sleep current is demonstrated. The low power performance is attained by Reverse-Body-Bias-Assisted 65 nm SOTB CMOS (Silicon On Thin Buried oxide) technology. The CPU can operate for more than 100 years with a 610 mAh Li battery.

10.1109/coolchips.2014.6842954 article EN 2014-04-01
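
The figures quoted in this abstract allow a rough back-of-the-envelope check. The Python sketch below derives the active power from the energy per cycle and the clock frequency, and the battery lifetime if the CPU drew only the quoted sleep current. It assumes an ideal battery (no self-discharge, full rated capacity usable) and a hypothetical duty-cycle model; the abstract does not state the conditions behind its 100-year figure, so this is not a reproduction of the authors' calculation.

```python
HOURS_PER_YEAR = 24 * 365


def active_power_w(energy_per_cycle_j, clock_hz):
    """Active power = energy per cycle x cycles per second."""
    return energy_per_cycle_j * clock_hz


def average_current_a(active_a, sleep_a, duty):
    """Average current for a CPU that is active a fraction `duty` of the time.

    The duty-cycle model is an assumption made for this sketch.
    """
    if not 0.0 <= duty <= 1.0:
        raise ValueError("duty must be between 0 and 1")
    return duty * active_a + (1.0 - duty) * sleep_a


def lifetime_years(capacity_ah, current_a):
    """Ideal battery lifetime: capacity divided by a constant current draw."""
    return capacity_ah / current_a / HOURS_PER_YEAR


if __name__ == "__main__":
    # Values taken from the abstract.
    p_active = active_power_w(13.4e-12, 14e6)  # 13.4 pJ/cycle at 14 MHz
    i_active = p_active / 0.35                 # current drawn at 0.35 V
    i_sleep = 0.14e-6                          # 0.14 uA sleep current
    capacity = 0.610                           # 610 mAh

    print(f"active power: {p_active * 1e6:.1f} uW")
    print(f"sleep-only lifetime: {lifetime_years(capacity, i_sleep):.0f} years")
    # Hypothetical duty cycle of 0.01%, chosen only to illustrate the model.
    i_avg = average_current_a(i_active, i_sleep, duty=1e-4)
    print(f"lifetime at 0.01% duty: {lifetime_years(capacity, i_avg):.0f} years")
```

Under these assumptions the sleep-only lifetime comes out at several hundred years, so a figure above 100 years is plausible only if the CPU spends almost all of its time asleep.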

A 32-bit CPU, which can operate for more than 15 years with a 220 mAh Li battery, or eternally with an energy harvester of in-door light, is presented. The CPU was fabricated by using 65 nm SOTB CMOS technology (Silicon on Thin Buried oxide) where the gate length is 60 nm and the BOX layer thickness is 10 nm. The threshold voltage was designed to be as low as 0.19 V so that the CPU operates in the over-threshold region, even at lower supply voltages down to 0.22 V. A large reverse body bias of up to -2.5 V can be applied to the bodies of the SOTB devices without increasing gate induced drain leak...

10.1587/transele.e98.c.536 article EN IEICE Transactions on Electronics 2015-01-01

We propose a multi-voltage (multi-Vdd) variable pipeline router to reduce the power consumption of Network-on-Chips (NoCs) designed for chip multi-processors (CMPs). Our multi-Vdd variable pipeline router adjusts its pipeline depth (i.e., communication latency) and supply voltage level in response to the applied workload. Unlike dynamic voltage and frequency scaling (DVFS) routers, the operating frequency is the same for all routers throughout the CMP; thus, there is no need to synchronize neighboring routers working at different frequencies. In this paper, we implemented the router,...

10.1109/aspdac.2012.6164982 article EN 2012-01-01

In this brief, a practical power optimization method that calculates the optimal supply and body bias voltages, for a given target operational frequency and temperature, is proposed and evaluated. The method is based upon a simple model in which several coefficients for the leakage power and the switching power are obtained from accurate real chip measurements. The optimal-voltage settings calculated by the method can achieve the minimum power with accuracies of 93.8%, 91.6%, and 79.5% at room temperature, 50 °C, and 65 °C, respectively. Since the methodology is based on well-known...

10.1109/tvlsi.2016.2635675 article EN IEEE Transactions on Very Large Scale Integration (VLSI) Systems 2016-12-30

While the spin-transfer torque (STT) magnetic tunnel junction (MTJ) is a promising technique for enabling nonvolatile flip-flops (NVFFs) to perform power gating to reduce leakage power without any data losses, the large store energy (the energy to make the store operation) of MTJs needs to be addressed. The cool mega array series is an edge-oriented coarse-grained reconfigurable accelerator that implements an improved MTJ-based NVFF with a verify-and-retryable method that should ideally [...] under the presence of switching time variation originating...

10.1109/tvlsi.2023.3237794 article EN IEEE Transactions on Very Large Scale Integration (VLSI) Systems 2023-01-30

This paper describes an automated layout design technique for the gated-clock design. Two issues must be considered for the gated-clock circuits to work correctly. One is to minimize the skew for the gated-clock nets. The other is to keep the timing constraints for the enable-logic parts. We propose a layout design technique taking these things into consideration. We developed a gated-clock tree synthesizer for the first issue, and a timing constraints generator and a clock delay estimator for the second. We applied it to a practical circuit. By our technique, the clock-skew could be kept to less than 0.2 ns while keeping the timing constraints.

10.1109/aspdac.1998.669476 article EN 2002-11-27

One of the benefits of the coarse grained dynamically reconfigurable processor array (DRPA) is its low dynamic power consumption by operating a number of processing elements (PE) in parallel with a low clock frequency. However, in future advanced processes, leakage power will occupy a considerable part of the total power consumption, and it may degrade the advantage of DRPAs. In order to reduce the leakage power, fine grained Power Gating (PG) is applied to a DRPA, MuCCRA-2.32b, and the leakage power and area overhead are measured. We evaluated the effect of two control modes; Pair Unit Individual...

10.1109/fpt.2008.4762410 article EN 2008-12-01