NFDI4DS | UHH-SEMS - Publication Details

NUAT: A non-uniform access time memory controller

OPENALEX - Publications

Wongyu Shin Jeongmin Yang Jungwhan Choi Lee‐Sup Kim

With the rapid development of microprocessors, off-chip memory access has become a system bottleneck. DRAM, the main memory in most computers, has concentrated only on capacity and bandwidth for decades to achieve high-performance computing. However, DRAM latency should also be considered to keep up with the trend in the multi-core era. Therefore, we propose NUAT, a new memory controller that focuses on reducing latency without any modification of the existing DRAM structure. We exploit DRAM's intrinsic phenomenon: electric charge variation in the cell...

10.1109/hpca.2014.6835956 article EN 2014-02-01

Multiple clone row DRAM

OPENALEX - Publications

Jungwhan Choi Wongyu Shin Jaemin Jang Jinwoong Suh Yongkee Kwon and 2 more

Several previous works have changed the DRAM bank structure to reduce memory access latency and have shown performance improvement. However, changes in the area-optimized DRAM bank can incur large area overhead. To solve this problem, we propose Multiple Clone Row DRAM (MCR-DRAM), which uses the existing DRAM bank structure without any modification.

10.1145/2749469.2750402 article EN 2015-05-26

Energy efficient data encoding in DRAM channels exploiting data value similarity

OPENALEX - Publications

Hoseok Seol Wongyu Shin Jaemin Jang Jungwhan Choi Jinwoong Suh and 1 more

As DRAM data bandwidth increases, tremendous energy is dissipated in the DRAM data bus. To reduce the energy consumed in the data bus, DRAM interfaces with asymmetric termination, such as Pseudo Open Drain (POD) and Low Voltage Swing Terminated Logic (LVSTL), have been adopted in modern DRAMs. In interfaces using asymmetric termination, the amount of termination energy is proportional to the Hamming weight of the data words. In this work, we propose Bitwise Difference Encoding (BD-Encoding), which decreases the Hamming weight of the data words, leading to a reduction in energy consumption in the DRAM data bus. Since a smaller Hamming weight of the data words also reduces switching activity,...

10.1145/3007787.3001213 article EN ACM SIGARCH Computer Architecture News 2016-06-18
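
The abstract above does not spell out the encoding itself, so the following is only a minimal sketch of the general idea it describes: when consecutive data words are similar, a bitwise difference has a lower Hamming weight than the raw words. The XOR-with-previous-word scheme, the function names, and the sample burst are all assumptions made for illustration, not the authors' BD-Encoding.

```python
def hamming_weight(word: int) -> int:
    """Return the number of 1 bits in a non-negative integer."""
    return bin(word).count("1")


def xor_difference_encode(words: list[int]) -> list[int]:
    """Assumed scheme: send each word XORed with the word before it.

    The first word is XORed with 0, so it is sent unchanged.
    """
    encoded = []
    previous = 0
    for word in words:
        encoded.append(word ^ previous)
        previous = word
    return encoded


def xor_difference_decode(encoded: list[int]) -> list[int]:
    """Invert xor_difference_encode by accumulating the XOR differences."""
    decoded = []
    previous = 0
    for code in encoded:
        word = code ^ previous
        decoded.append(word)
        previous = word
    return decoded


# Hypothetical burst of four similar 16-bit values.
burst = [0x3F80, 0x3F81, 0x3F83, 0x3F82]
encoded = xor_difference_encode(burst)

raw_weight = sum(hamming_weight(w) for w in burst)
encoded_weight = sum(hamming_weight(w) for w in encoded)
print(raw_weight, encoded_weight)
```

For this sample burst, the total Hamming weight drops from 32 to 10, and decoding recovers the original words. Real data with little value similarity would see a smaller benefit or none.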

Energy Efficient Data Encoding in DRAM Channels Exploiting Data Value Similarity

OPENALEX - Publications

Hoseok Seol Wongyu Shin Jaemin Jang Jungwhan Choi Jinwoong Suh and 1 more

As DRAM data bandwidth increases, tremendous energy is dissipated in the DRAM data bus. To reduce the energy consumed in the data bus, DRAM interfaces with asymmetric termination, such as Pseudo Open Drain (POD) and Low Voltage Swing Terminated Logic (LVSTL), have been adopted in modern DRAMs. In interfaces using asymmetric termination, the amount of termination energy is proportional to the Hamming weight of the data words. In this work, we propose Bitwise Difference Encoding (BD-Encoding), which decreases the Hamming weight of the data words, leading to a reduction in energy consumption in the DRAM data bus. Since a smaller Hamming weight of the data words also reduces switching...

10.1109/isca.2016.68 article EN 2016-06-01

All-digital hybrid temperature sensor network for dense thermal monitoring

OPENALEX - Publications

Seungwook Paek Wongyu Shin Jaeyoung Lee Hyoeun Kim Jun‐Seok Park and 1 more

Technology scaling and many-core design trends demand detailed information regarding the spatial temperature distribution, which is essential for dynamic thermal management [1,2]. The number of on-chip temperature sensors in high-performance processors is increasing, with state-of-the-art commercial processors embedding up to 44 sensors [3], and is likely to increase in the future (Fig. 14.7.1(a)). We observe two significant challenges in thermal sensing: 1) increasing the number of sensors, and 2) placing them in a regular manner (not solely on potential hotspots). mostly...

10.1109/isscc.2013.6487726 article EN 2013-02-01

DRAM-Latency Optimization Inspired by Relationship between Row-Access Time and Refresh Timing

OPENALEX - Publications

Wongyu Shin Jungwhan Choi Jaemin Jang Jinwoong Suh Youngsuk Moon and 2 more

It is widely known that the relatively long DRAM latency forms a bottleneck in computing systems. However, DRAM vendors are strongly reluctant to decrease DRAM latency due to the additional manufacturing cost. Therefore, we set our goal to reduce DRAM latency without any modification of the existing DRAM structure. To accomplish our goal, we focus on an intrinsic phenomenon in DRAM: electric charge variation in DRAM cell capacitors. Then, we draw two key insights: i) the row-access time of a row is a function of the elapsed time from when the row was last refreshed, and ii) it is also a function of the remaining time until...

10.1109/tc.2015.2512863 article EN IEEE Transactions on Computers 2015-12-28

2.4 ATOMUS: A 5nm 32TFLOPS/128TOPS ML System-on-Chip for Latency Critical Applications

OPENALEX - Publications

Chang-Hyo Yu Hyoeun Kim Sungho Shin Kyeongryeol Bong Hyun‐Suk Kim and 61 more

The growing computational demands of AI inference have led to the widespread use of hardware accelerators for different platforms, spanning from the edge to the datacenter/cloud. Certain application areas, such as high-frequency trading (HFT) [1–2], have a hard latency deadline for successful execution. We present our new accelerator, which achieves high capability with outstanding single-stream responsiveness for demanding service-layer objective (SLO)-based services and pipelined applications, including large...

10.1109/isscc49657.2024.10454509 article EN 2024 IEEE International Solid-State Circuits Conference (ISSCC) 2024-02-18

Q-DRAM: Quick-Access DRAM with Decoupled Restoring from Row-Activation

OPENALEX - Publications

Wongyu Shin Jungwhan Choi Jaemin Jang Jinwoong Suh Yongkee Kwon and 3 more

The relatively high latency of DRAM is mostly caused by the long row-activation time, which in fact consists of sensing time and restoring time. Memory controllers cannot distinguish between them since they are performed consecutively by a single command. If these two steps are separated, restoring can be delayed until DRAM access is uncongested. Hence, we propose Quick-Access DRAM (Q-DRAM), which discriminates between sensing and restoring. Our approach is to allow destructive access (i.e., only sensing without restoring in a command) using per-bank multiple row-buffers. We call...

10.1109/tc.2015.2479587 article EN IEEE Transactions on Computers 2015-09-18

PowerField: A Probabilistic Approach for Temperature-to-Power Conversion Based on Markov Random Field Theory

OPENALEX - Publications

Seungwook Paek Wongyu Shin Jaehyeong Sim Lee‐Sup Kim

The temperature-to-power technique is useful for post-silicon power model validation. However, the previous works were applicable only to steady-state analysis. In this paper, we propose a new temperature-to-power technique, named PowerField, supporting both transient and steady-state analysis based on a probabilistic approach. Unlike previous works, PowerField uses two consecutive thermal images to find the most feasible power distribution that causes the change between the two input images. To obtain the power map with the highest probability, we adopted...

10.1109/tcad.2013.2272542 article EN IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 2013-09-16

Sparse-Insertion Write Cache to Mitigate Write Disturbance Errors in Phase Change Memory

OPENALEX - Publications

Jaemin Jang Wongyu Shin Jungwhan Choi Yongju Kim Lee‐Sup Kim

As the number of datasets processed in computing systems has increased in recent years, there is a growing demand for high-capacity main memory subsystems. However, further capacity increases in conventional DRAM-based main memory have stalled due to scaling limitations. Recent studies have shown that PCM, which can provide greater capacity than DRAM, is emerging as a candidate for main memory. PCM suffers from problems related to the thermal mechanisms employed in storing data. The Write Disturbance (WD) phenomenon occurs when severely damage data...

10.1109/tc.2018.2881137 article EN IEEE Transactions on Computers 2018-11-13

Hybrid Temperature Sensor Network for Area-Efficient On-Chip Thermal Map Sensing

OPENALEX - Publications

Seungwook Paek Wongyu Shin Jae-Young Lee Hyoeun Kim Jun‐Seok Park and 1 more

The spatial thermal distribution of a chip is essential information for dynamic thermal management. To get a rich thermal map, the sensor area is required to be reduced radically. However, squeezing the sensor size is about to face its physical limitation. Against this background, we propose an area-efficient thermal sensing technique: the hybrid temperature sensor network. The proposed architecture fully exploits the spatial low-pass filtering effect of thermal systems, which implies that most of the information resides in the very low frequency region. Our on-chip sensor network consists of small...

10.1109/jssc.2014.2375335 article EN IEEE Journal of Solid-State Circuits 2015-01-20

In-DRAM Data Initialization

OPENALEX - Publications

Hoseok Seol Wongyu Shin Jaemin Jang Jungwhan Choi Jinwoong Suh and 1 more

Initializing memory with zero data is essential for safe memory management. However, initializing a large memory area slows down the system significantly. The most likely cause of slow initialization is the limited DRAM initialization method. At present, the only way to initialize DRAM is to execute multiple WRITE commands. The WRITE command is slow because of its small granularity and data bus occupancy. In this brief, we propose an efficient in-DRAM initialization method inspired by the internal structure and operation of DRAM. The proposed method, called row reset, uses the row buffer to zero out a single row at...

10.1109/tvlsi.2017.2737646 article EN IEEE Transactions on Very Large Scale Integration (VLSI) Systems 2017-08-17

Elaborate Refresh: A Fine Granularity Retention Management for Deep Submicron DRAMs

OPENALEX - Publications

Hoseok Seol Wongyu Shin Jaemin Jang Jungwhan Choi Hakseung Lee and 1 more

As the DRAM cell size continues to shrink, the proportion of leaky cells is increasing. As a result, prior approaches, called retention-aware refresh, which skip unnecessary refresh operations for non-leaky cells, are unable to skip as many refresh operations as before. The large granularity of the refresh mechanism makes this problem more serious. Specifically, even when there is only a small number of leaky cells in a particular group, that group is classified as a leaky group. Because of that, the non-leaky cells that also belong to that group are refreshed at an unnecessarily frequent rate. Since larger, inefficiency...

10.1109/tc.2018.2820052 article EN IEEE Transactions on Computers 2018-03-27
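
The granularity effect described in the abstract above can be illustrated with a small counting sketch. It assumes a simplified model in which any leaky cell forces its whole group onto the frequent refresh rate; the function name, cell indices, and group sizes are hypothetical and are not taken from the paper.

```python
def cells_at_frequent_rate(leaky_cells: set[int], total_cells: int, group_size: int) -> int:
    """Count cells refreshed at the frequent rate.

    Assumed model: cells are split into consecutive groups of `group_size`,
    and a group containing at least one leaky cell is refreshed frequently
    as a whole, including its non-leaky cells.
    """
    leaky_groups = {cell // group_size for cell in leaky_cells}
    count = 0
    for group in leaky_groups:
        start = group * group_size
        end = min(start + group_size, total_cells)  # last group may be short
        count += end - start
    return count


# Hypothetical example: 3 leaky cells out of 1024.
leaky = {3, 70, 500}
total = 1024

coarse = cells_at_frequent_rate(leaky, total, group_size=64)
fine = cells_at_frequent_rate(leaky, total, group_size=8)
print(coarse, fine)
```

With these made-up numbers, a group size of 64 puts 192 cells on the frequent rate, while a group size of 8 puts only 24 cells there, even though the same 3 cells are leaky in both cases.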

Multiple clone row DRAM

OPENALEX - Publications

Jungwhan Choi Wongyu Shin Jaemin Jang Jinwoong Suh Yongkee Kwon and 2 more

Several previous works have changed the DRAM bank structure to reduce memory access latency and have shown performance improvement. However, changes in the area-optimized DRAM bank can incur large area overhead. To solve this problem, we propose Multiple Clone Row DRAM (MCR-DRAM), which uses the existing DRAM bank structure without any modification. Our key idea is Multiple Clone Row (MCR), in which multiple rows are simultaneously turned on or off to form a logically single row. MCR provides two advantages that enable our low-latency mechanisms (Early-Access,...

10.1145/2872887.2750402 article EN ACM SIGARCH Computer Architecture News 2015-06-13

An Even/Odd Error Detection Based Low-Complexity Chase Decoding for Low-Latency RS Decoder Design

OPENALEX - Publications

Jinho Jeong Dongyeob Shin Wongyu Shin Jongsun Park

This letter presents a modified low-complexity Chase (LCC) decoding algorithm, where a smaller number of test vectors can be tested with minor error-correction performance degradation. The proposed LCC decoding pre-determines whether the number of errors in the received codeword is even or odd, and it processes only the necessary test vectors. As a result, the number of test vectors is reduced by half compared to conventional LCC decoding. A Reed-Solomon (255,239) decoder with the proposed LCC algorithm has been implemented using a 65 nm CMOS process. The hardware implementation results show...

10.1109/lcomm.2021.3054753 article EN IEEE Communications Letters 2021-01-26

Bank-Group Level Parallelism

OPENALEX - Publications

Wongyu Shin Jaemin Jang Jungwhan Choi Jinwoong Suh Lee‐Sup Kim

DDR4 SDRAM introduced a new hierarchy in DRAM organization: bank-group (BG). The main purpose of BG is to increase I/O bandwidth without growing the DRAM-internal bus width. We, however, found that other benefits can be derived from the new hierarchy. To achieve these benefits, we propose a new DRAM architecture using the BG hierarchy, leading to the creation of BG-Level Parallelism (BGLP). By exploiting BGLP, overall parallelism grows in DRAM operations. We also argue that BGLP is a feasible solution for the cost-sensitive DRAM industry because additional...

10.1109/tc.2017.2665475 article EN IEEE Transactions on Computers 2017-02-07

An area-efficient on-chip temperature sensor with nonlinearity compensation using injection-locked oscillator (ILO)

OPENALEX - Publications

Wongyu Shin Seungwook Paek Lee‐Sup Kim

This paper describes CMOS time-domain temperature sensors. The principle of this type of sensor is the inverter's time-delay variation with temperature. The variation, however, has nonlinearity, which is a fundamental error source. Therefore, we propose a new temperature sensor that improves linearity using an injection-locked oscillator (ILO). Since the ILO has the opposite curvature to the inverter delay line in the temperature domain, the nonlinear error induced by inverters can be eliminated. Integral nonlinearity (INL) is reduced from 3.6 LSB to 0.56 LSB (84% reduction),...

10.1109/iscas.2014.6865517 article EN 2014 IEEE International Symposium on Circuits and Systems (ISCAS) 2014-06-01
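
As a quick check of the INL figures quoted in the abstract above, the relative reduction can be recomputed from the two values it reports. This is plain arithmetic on the stated numbers; the helper name is an assumption for illustration.

```python
def percent_reduction(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


inl_before_lsb = 3.6   # INL before compensation, as stated in the abstract
inl_after_lsb = 0.56   # INL after compensation, as stated in the abstract

reduction = percent_reduction(inl_before_lsb, inl_after_lsb)
print(f"{reduction:.1f}% reduction")
```

The result is about 84.4%, consistent with the 84% reduction stated in the abstract.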

Refresh-Aware Write Recovery Memory Controller

OPENALEX - Publications

Jaemin Jang Wongyu Shin Jungwhan Choi Jinwoong Suh Yongkee Kwon and 2 more

Current computer systems require large memory capacities to manage the tremendous volume of datasets. A DRAM cell consists of a transistor and a capacitor, and their size has a direct impact on DRAM density. While technology scaling can provide higher density, this benefit comes at the expense of low drivability, due to the increase in series resistance of the smaller transistor, which slows the process of restoring charge in DRAM cells. DRAM operations require recovery processes due to their destructive nature. Among such operations, write recovery has the most difficulty meeting...

10.1109/tc.2016.2617333 article EN IEEE Transactions on Computers 2016-10-13

Rank-Level Parallelism in DRAM

OPENALEX - Publications

Wongyu Shin Jaemin Jang Jungwhan Choi Jinwoong Suh Yongkee Kwon and 2 more

DRAM systems are hierarchically organized: Channel-Rank-Bank. A channel is connected to multiple ranks, and each rank has multiple banks. This hierarchical structure facilitates creating parallelisms in DRAM. The current DRAM architecture supports bank-level parallelism; as many rows as banks can be moved simultaneously at the bank level. However, rank-level parallelism is not supported. For this reason, only one column can be accessed at a time, although each rank has its own data bus that can carry a column. Namely, column operations do not exploit the...

10.1109/tc.2017.2654339 article EN IEEE Transactions on Computers 2017-01-17

PowerField

OPENALEX - Publications

Seungwook Paek Seok‐Hwan Moon Wongyu Shin Jaehyeong Sim Lee‐Sup Kim

Transient temperature-to-power conversion is as important as steady-state analysis since power distributions tend to change dynamically. In this work, we propose the PowerField framework to find the most probable power distribution from consecutive thermal images. Since transient analysis is vulnerable to spatio-temporal noise, we adopted a maximum-a-posteriori Markov random field to enhance noise immunity. The power map is obtained by minimizing the energy function, which is calculated using an approximated equation. Experimental results with...

10.1145/2228360.2228474 article EN 2012-05-31