- Parallel Computing and Optimization Techniques
- Advanced Data Storage Technologies
- Interconnection Networks and Systems
- Low-power high-performance VLSI design
- Ferroelectric and Negative Capacitance Devices
- Advanced Memory and Neural Computing
- CCD and CMOS Imaging Sensors
- Advancements in Semiconductor Devices and Circuit Design
- Photonic and Optical Devices
- Embedded Systems Design Techniques
- Analog and Mixed-Signal Circuit Design
- VLSI and FPGA Design Techniques
- graph theory and CDMA systems
- Advanced MEMS and NEMS Technologies
- Coding theory and cryptography
- Quantum Computing Algorithms and Architecture
- Mechanical and Optical Resonators
- Cloud Computing and Resource Management
- Phase-change materials and chalcogenides
- Semiconductor materials and devices
- Cryptographic Implementations and Security
- VLSI and Analog Circuit Testing
SK Group (South Korea)
2018-2021
Korea Advanced Institute of Science and Technology
2012-2018
With the rapid development of microprocessors, off-chip memory access has become a system bottleneck. DRAM, the main memory in most computers, has concentrated only on capacity and bandwidth for decades to achieve high-performance computing. However, DRAM latency should also be considered to keep up with the trend of the multi-core era. Therefore, we propose NUAT, a new memory controller that focuses on reducing DRAM latency without any modification of the existing DRAM structure. We exploit DRAM's intrinsic phenomenon: electric charge variation in the cell...
Several previous works have changed the DRAM bank structure to reduce memory access latency and have shown performance improvement. However, changes in the area-optimized DRAM bank can incur a large area overhead. To solve this problem, we propose Multiple Clone Row DRAM (MCR-DRAM), which uses the existing DRAM bank structure without any modification.
As DRAM data bandwidth increases, tremendous energy is dissipated in the DRAM data bus. To reduce the energy consumed in the data bus, DRAM interfaces with asymmetric termination, such as Pseudo Open Drain (POD) and Low Voltage Swing Terminated Logic (LVSTL), have been adopted in modern DRAMs. In interfaces using asymmetric termination, the amount of termination energy is proportional to the Hamming weight of the data words. In this work, we propose Bitwise Difference Encoding (BD-Encoding), which decreases the Hamming weight of the data words, leading to a reduction in energy consumption in the data bus. Since a smaller Hamming weight of the data words also reduces switching activity,...
As DRAM data bandwidth increases, tremendous energy is dissipated in the DRAM data bus. To reduce the energy consumed in the data bus, DRAM interfaces with asymmetric termination, such as Pseudo Open Drain (POD) and Low Voltage Swing Terminated Logic (LVSTL), have been adopted in modern DRAMs. In interfaces using asymmetric termination, the amount of termination energy is proportional to the Hamming weight of the data words. In this work, we propose Bitwise Difference Encoding (BD-Encoding), which decreases the Hamming weight of the data words, leading to a reduction in energy consumption in the data bus. Since a smaller Hamming weight of the data words also reduces switching...
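The relationship between Hamming weight and termination energy can be illustrated with a minimal Python sketch. The abstract does not spell out the BD-Encoding scheme, so the XOR-with-previous-word transform below is only an assumed stand-in that shows how an encoding can lower the Hamming weight of a burst; the function names and the sample burst are hypothetical.

```python
def hamming_weight(word: int) -> int:
    """Number of 1-bits in a data word."""
    return bin(word).count("1")


def termination_cost(words) -> int:
    """Relative termination energy of a burst, taken as the total
    Hamming weight (proportionality as stated in the abstract)."""
    return sum(hamming_weight(w) for w in words)


def xor_difference_encode(words):
    """Assumed stand-in encoding: send each word XORed with the previous one."""
    prev = 0
    encoded = []
    for w in words:
        encoded.append(w ^ prev)
        prev = w
    return encoded


def xor_difference_decode(encoded):
    """Inverse of xor_difference_encode."""
    prev = 0
    words = []
    for e in encoded:
        prev = e ^ prev
        words.append(prev)
    return words


burst = [0xFF, 0xFE, 0xFC, 0xFD]          # hypothetical 8-bit data words
encoded = xor_difference_encode(burst)
print(termination_cost(burst), termination_cost(encoded))
```

The saving in this sketch depends on consecutive words being similar; for unrelated words the XOR transform gives no guaranteed reduction.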
Technology scaling and many-core design trends demand detailed information regarding the spatial temperature distribution, which is essential for dynamic thermal management [1,2]. The number of on-chip sensors in high-performance processors is increasing, with state-of-the-art commercial processors embedding up to 44 sensors [3], and the number is likely to increase in the future (Fig. 14.7.1(a)). We observe two significant challenges in sensing: 1) increasing the number of sensors, and 2) placing them in a regular manner (not solely on potential hotspots). mostly...
It is widely known that relatively long DRAM latency forms a bottleneck in computing systems. However, DRAM vendors are strongly reluctant to decrease DRAM latency due to the additional manufacturing cost. Therefore, we set our goal to reduce DRAM latency without any modification of the existing DRAM structure. To accomplish our goal, we focus on an intrinsic phenomenon in DRAM: electric charge variation in DRAM cell capacitors. Then, we draw two key insights: i) the row-access latency of a row is a function of the elapsed time from when the row was last refreshed, and ii) it is also a function of the remaining time until...
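Insight i) can be sketched as a controller that picks a row-activation latency from the time elapsed since the row was last refreshed. This is a minimal illustration only: the bin edges and cycle counts are hypothetical values chosen for the example, not figures from the work, and the 64 ms refresh window is an assumption.

```python
from bisect import bisect_right

REFRESH_WINDOW_MS = 64.0           # assumed refresh window for this sketch
BIN_EDGES_MS = [8.0, 16.0, 32.0]   # hypothetical elapsed-time bin boundaries
TRCD_CYCLES = [8, 9, 10, 11]       # hypothetical activation latency per bin


def activation_latency(elapsed_ms: float) -> int:
    """Return the activation latency (in cycles) for a row that was last
    refreshed `elapsed_ms` ago. A recently refreshed row holds more charge,
    so this sketch assigns it a shorter latency."""
    if not 0.0 <= elapsed_ms <= REFRESH_WINDOW_MS:
        raise ValueError("elapsed time must lie within the refresh window")
    return TRCD_CYCLES[bisect_right(BIN_EDGES_MS, elapsed_ms)]


for t in (0.0, 8.0, 20.0, 64.0):
    print(t, activation_latency(t))
```

A table lookup keeps the decision cheap enough to make per request; a real controller would also need to track when each row was refreshed.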
The growing computational demands of AI inference have led to the widespread use of hardware accelerators for different platforms, spanning from the edge to the datacenter/cloud. Certain application areas, such as high-frequency trading (HFT) [1–2], have a hard latency deadline for successful execution. We present our new AI accelerator, which achieves high inference capability with outstanding single-stream responsiveness for demanding service-layer objective (SLO)-based AI services and pipelined inference applications, including large...
The relatively high latency of DRAM is mostly caused by the long row-activation time, which in fact consists of sensing time and restoring time. Memory controllers cannot distinguish between them since they are performed consecutively by a single command. If these two steps are separated, restoring can be delayed until DRAM access is uncongested. Hence, we propose Quick-Access DRAM (Q-DRAM), which discriminates between sensing and restoring. Our approach is to allow destructive access (i.e., only sensing, without restoring, is performed by a command) using per-bank multiple row-buffers. We call...
A temperature-to-power technique is useful for post-silicon power model validation. However, previous works were applicable only to steady-state analysis. In this paper, we propose a new temperature-to-power technique, named PowerField, supporting both transient and steady-state analysis based on a probabilistic approach. Unlike previous works, PowerField uses two consecutive thermal images to find the most feasible power distribution that causes the change between the two input images. To obtain the power map with the highest probability, we adopted...
As the number of datasets processed in computing systems has increased in recent years, there is growing demand for high-capacity main memory subsystems. However, further increases in the capacity of conventional DRAM-based main memory have stalled due to scaling limitations. Recent studies have shown that PCM, which can provide greater capacity than DRAM, is emerging as a candidate for main memory. However, PCM suffers from problems related to the thermal mechanisms employed in storing data. The Write Disturbance (WD) phenomenon occurs when severely damage data...
The spatial thermal distribution of a chip is essential information for dynamic thermal management. To get a rich thermal map, the sensor area must be reduced radically. However, squeezing the sensor size is about to face its physical limitation. Against this background, we propose an area-efficient thermal sensing technique: a hybrid temperature sensor network. The proposed architecture fully exploits the spatial low-pass filtering effect of thermal systems, which implies that most thermal information resides in the very low spatial frequency region. Our on-chip sensor network consists of small...
Initializing memory with zero data is essential for safe memory management. However, initializing a large memory area slows down the system significantly. The most likely cause of the slow initialization is the limited DRAM initialization method. At present, the only way to initialize DRAM is to execute multiple WRITE commands. The WRITE command is slow because of its small granularity and bus occupancy. In this brief, we propose an efficient in-DRAM initialization method inspired by the internal structure and operation of DRAM. The proposed method, called row reset, uses the row buffer to zero out a single row at...
As the DRAM cell size continues to shrink, the proportion of leaky cells is increasing. As a result, prior approaches, called retention-aware refresh, which skip unnecessary refresh operations for non-leaky cells, are unable to skip as many refresh operations as before. The large granularity of the refresh mechanism makes this problem more serious. Specifically, even when there is only a small number of leaky cells in a particular group, that group is classified as a leaky group. Because of that, the non-leaky cells that also belong to the leaky group are refreshed at an unnecessarily frequent rate. Since larger, inefficiency...
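The granularity effect described above can be made concrete with a small Python sketch. It assumes a simple model in which cells are indexed linearly and grouped in fixed-size blocks, and a whole group is refreshed at the fast rate as soon as it contains one leaky cell; the function name and the sample indices are hypothetical.

```python
def fast_refreshed_cells(leaky_cells, total_cells: int, group_size: int) -> int:
    """Count the cells refreshed at the fast rate when a group is classified
    as leaky as soon as it contains at least one leaky cell."""
    leaky_groups = {cell // group_size for cell in leaky_cells}
    count = 0
    for group in leaky_groups:
        start = group * group_size
        # The last group may be partial if total_cells is not a multiple.
        count += min(group_size, total_cells - start)
    return count


leaky = {3, 100, 101, 900}   # hypothetical leaky cell indices
for size in (1, 8, 64):
    print(size, fast_refreshed_cells(leaky, 1024, size))
```

With only four leaky cells, the number of cells refreshed at the fast rate grows with the group size, which is the inefficiency the abstract attributes to coarse refresh granularity.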
Several previous works have changed the DRAM bank structure to reduce memory access latency and have shown performance improvement. However, changes in the area-optimized DRAM bank can incur a large area overhead. To solve this problem, we propose Multiple Clone Row DRAM (MCR-DRAM), which uses the existing DRAM bank structure without any modification. Our key idea is Multiple Clone Row (MCR), in which multiple rows are simultaneously turned on or off to form a single logical row. MCR provides two advantages that enable our low-latency mechanisms (Early-Access,...
This letter presents a modified low-complexity chase (LCC) algorithm, in which a smaller number of test vectors can be tested with minor error-correction performance degradation. The proposed LCC decoding pre-determines whether the number of errors in the received codeword is even or odd, and it processes only the necessary test vectors. As a result, the number of test vectors is reduced by half compared to conventional LCC decoding. A Reed-Solomon (255,239) decoder with the proposed algorithm has been implemented using a 65nm CMOS process. The hardware implementation results show...
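The halving of the test-vector set can be illustrated with a short Python sketch. It assumes the usual Chase-style setup in which each of `eta` least-reliable positions is either flipped or kept, giving 2^eta candidate patterns, and it assumes that the predetermined parity selects patterns by the parity of their flip count. How the letter actually derives and applies the parity is not stated in the abstract, so the function names and the selection rule here are hypothetical.

```python
from itertools import product


def all_flip_patterns(eta: int):
    """All 2**eta flip patterns over the eta least-reliable positions."""
    return list(product((0, 1), repeat=eta))


def parity_filtered_patterns(eta: int, parity: int):
    """Keep only patterns whose number of flips has the given parity
    (0 = even, 1 = odd). This selection rule is an assumption."""
    return [p for p in all_flip_patterns(eta) if sum(p) % 2 == parity]


eta = 3
print(len(all_flip_patterns(eta)), len(parity_filtered_patterns(eta, 1)))
```

For any eta of at least 1, exactly half of the patterns have even flip count and half have odd, which matches the factor-of-two reduction quoted above.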
DDR4 SDRAM introduced a new hierarchy in DRAM organization: bank-group (BG). The main purpose of BG is to increase I/O bandwidth without growing the DRAM-internal bus-width. We, however, found that other benefits can be derived from the new hierarchy. To achieve these benefits, we propose a new DRAM architecture using the BG-hierarchy, leading to the creation of BG-Level Parallelism (BGLP). By exploiting BGLP, overall parallelism grows in DRAM operations. We also argue that BGLP is a feasible solution for the cost-sensitive DRAM industry because the additional...
This paper describes CMOS time-domain temperature sensors. The principle of this type of sensor is the variation of the inverter's time delay with temperature. The variation, however, has nonlinearity, which is a fundamental error source. Therefore, we propose a new temperature sensor that improves linearity using an injection-locked oscillator (ILO). Since the ILO has the opposite curvature to the inverter delay line in the temperature domain, the nonlinear error induced by the inverters can be eliminated. Integral nonlinearity (INL) is reduced from 3.6 LSB to 0.56 LSB (84% reduction),...
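As a quick check, the quoted reduction follows directly from the two INL values given in the abstract:

```latex
\[
\frac{\mathrm{INL}_{\text{before}} - \mathrm{INL}_{\text{after}}}{\mathrm{INL}_{\text{before}}}
  = \frac{3.6 - 0.56}{3.6}
  = \frac{3.04}{3.6}
  \approx 0.844 \approx 84\%.
\]
```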
Current computer systems require large memory capacities to manage the tremendous volume of datasets. A DRAM cell consists of a transistor and a capacitor, and their size has a direct impact on DRAM density. While technology scaling can provide higher DRAM density, this benefit comes at the expense of low drivability, due to the increase in the series resistance of the smaller transistor, which slows the process of restoring charge in DRAM cells. DRAM operations require recovery processes because of their destructive nature. Among such operations, write has the most difficulty meeting...
DRAM systems are hierarchically organized: Channel-Rank-Bank. A channel is connected to multiple ranks, and each rank has multiple banks. This hierarchical structure facilitates creating parallelisms in DRAM. The current DRAM architecture supports bank-level parallelism; as many rows as banks can be moved simultaneously at bank-level. However, rank-level parallelism is not supported. For this reason, only one column can be accessed at a time, although each rank has its own data bus that can carry a column. Namely, DRAM operations do not exploit the...
Transient temperature-to-power conversion is as important as steady-state analysis since power distributions tend to change dynamically. In this work, we propose the PowerField framework to find the most probable power distribution from consecutive thermal images. Since transient analysis is vulnerable to spatio-temporal noise, we adopted a maximum-a-posteriori Markov random field framework to enhance noise immunity. The power map is obtained by minimizing the energy function, which is calculated using an approximated equation. Experimental results with...