Jangwoo Kim

ORCID: 0000-0003-2193-5748
Research Areas
  • Parallel Computing and Optimization Techniques
  • Advanced Data Storage Technologies
  • Advanced Memory and Neural Computing
  • Cloud Computing and Resource Management
  • Ferroelectric and Negative Capacitance Devices
  • Radiation Effects in Electronics
  • Advanced Neural Network Applications
  • Caching and Content Delivery
  • Semiconductor materials and devices
  • Distributed systems and fault tolerance
  • Interconnection Networks and Systems
  • IoT and Edge/Fog Computing
  • Advancements in Battery Materials
  • Neural dynamics and brain function
  • Gas Dynamics and Kinetic Theory
  • Computational Fluid Dynamics and Aerodynamics
  • Advanced Battery Technologies Research
  • VLSI and Analog Circuit Testing
  • Network Packet Processing and Optimization
  • Low-power high-performance VLSI design
  • Advanced Battery Materials and Technologies
  • Distributed and Parallel Computing Systems
  • Advancements in Semiconductor Devices and Circuit Design
  • Seismic Imaging and Inversion Techniques
  • Seismic Waves and Analysis

Seoul National University
2015-2024

Samsung (South Korea)
2010-2020

IBM Research - Almaden
2019

Pohang University of Science and Technology
2014-2016

Cornell University
2014-2016

Hoseo University
2010-2015

Korea Post
2013-2015

McLean Hospital
2003-2009

Harvard University
2003-2009

Carnegie Mellon University
2004-2008

In deep sub-micron ICs, growing amounts of on-die memory and scaling effects make embedded memories increasingly vulnerable to reliability and yield problems. As scaling progresses, soft and hard errors in the memory system will increase and single error events are more likely to cause large-scale multi-bit errors. However, conventional memory protection techniques can neither detect nor correct large-scale multi-bit errors without incurring large performance, area, and power overheads. We propose two-dimensional (2D) error coding in embedded memories, a scalable...

10.5555/1331699.1331719 article EN International Symposium on Microarchitecture 2007-12-01

10.1109/micro.2007.19 article EN 2007-01-01

A cost-effective multi-tenant neural network execution is becoming one of the most important design goals for modern neural network accelerators. For example, as emerging AI services consist of many heterogeneous neural network executions, a cloud provider wants to serve a large number of clients using a single accelerator, improving its cost effectiveness. Therefore, an ideal next-generation accelerator should support simultaneous multi-neural network execution, while fully utilizing its hardware resources. However, existing accelerators which are...

10.1109/isca45697.2020.00081 article EN 2020-05-01

Emerging mobile services heavily utilize Neural Networks (NNs) to improve user experiences. Such NN-assisted services depend on fast NN execution for high responsiveness, demanding mobile devices to minimize the NN execution latency by efficiently utilizing their underlying hardware resources. To better utilize the resources, existing mobile NN frameworks either employ various CPU-friendly optimizations (e.g., vectorization, quantization) or exploit data parallelism using heterogeneous processors such as GPUs and DSPs. However, their performance is...

10.1145/3302424.3303950 article EN 2019-03-22

The new focus on commercial workloads in simulation studies of server systems has caused a drastic increase in the complexity and decrease in the speed of simulation tools. The complexity of a large-scale full-system model makes development of a monolithic simulation tool a prohibitively difficult task. Furthermore, detailed full-system models simulate so slowly that experimental results must be based on simulations of only fractions of a second of execution of the modelled system. This paper presents SIMFLEX, a simulation framework which uses component-based design and rigorous statistical sampling to...

10.1145/1054907.1054914 article EN ACM SIGMETRICS Performance Evaluation Review 2004-03-01

Recent studies have suggested that the soft-error rate in microprocessor logic will become a reliability concern by 2010. This paper proposes an efficient error detection technique, called fingerprinting, that detects differences in execution across a dual modular redundant (DMR) processor pair. Fingerprinting summarizes a processor's execution history in a hash-based signature; differences between two mirrored processors are exposed by comparing their fingerprints. Fingerprinting tightly bounds detection latency and greatly reduces the interprocessor...

10.1145/1024393.1024420 article EN 2004-10-07
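
The fingerprinting idea in the abstract above (hash each processor's execution history, then compare signatures) can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration only: the abstract does not say which hash is used or what goes into it, so CRC32, the `(register, value)` update tuples, and the names `fingerprints` and `first_mismatch` are hypothetical.

```python
import zlib


def fingerprints(updates, interval):
    """Yield one CRC32 signature per `interval` state updates.

    `updates` is an iterable of (register, value) tuples standing in for
    a processor's retired state updates (an assumed, simplified model).
    """
    crc, count = 0, 0
    for update in updates:
        # Fold this update into the running signature.
        crc = zlib.crc32(repr(update).encode(), crc)
        count += 1
        if count == interval:
            yield crc
            crc, count = 0, 0
    if count:
        # Emit a final signature for a partial interval.
        yield crc


def first_mismatch(updates_a, updates_b, interval=4):
    """Return the index of the first interval whose signatures differ, else None."""
    pairs = zip(fingerprints(updates_a, interval), fingerprints(updates_b, interval))
    for index, (sig_a, sig_b) in enumerate(pairs):
        if sig_a != sig_b:
            return index
    return None


if __name__ == "__main__":
    good = [("r1", 5), ("r2", 7), ("r3", 1), ("r4", 0),
            ("r1", 6), ("r2", 7), ("r3", 2), ("r4", 9)]
    faulty = list(good)
    faulty[5] = ("r2", 9)  # simulated soft error in the second interval
    print(first_mismatch(good, good))
    print(first_mismatch(good, faulty))
```

In this sketch only one small signature per interval crosses between the two processors rather than every update, and a divergence is noticed by the end of the interval in which it occurs, which is the trade-off between comparison traffic and detection latency that the abstract alludes to.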

This paper introduces a tagless cache architecture for large in-package DRAM caches. The conventional die-stacked DRAM cache has both a TLB and a cache tag array, which are responsible for virtual-to-physical and physical-to-cache address translation, respectively. We propose to align the granularity of caching with OS page size and take a unified approach to address translation and cache tag management. To this end, we introduce cache-map TLB (cTLB), which stores virtual-to-cache, instead of virtual-to-physical, mappings. At a TLB miss, the TLB miss handler allocates...

10.1145/2749469.2750383 article EN 2015-05-26

Security bugs in CPUs have critical security impacts on all the computation-related hardware and software components, as the CPU is the core of the computation. In spite of the fact that architecture and security communities have explored a vast number of static or dynamic analysis techniques to automatically identify such bugs, the problem remains unsolved and challenging, largely due to the complex nature of CPU RTL designs. This paper proposes DIFUZZRTL, an RTL fuzzer to automatically discover unknown bugs in CPU RTLs. DIFUZZRTL develops a register-coverage guided fuzzing technique,...

10.1109/sp40001.2021.00103 article EN 2022 IEEE Symposium on Security and Privacy (SP) 2021-05-01

Coherent read misses in shared-memory multiprocessors account for a substantial fraction of execution time in many important scientific and commercial workloads. We propose Temporal Streaming, to eliminate coherent read misses by streaming data to a processor in advance of the corresponding memory accesses. Temporal streaming dynamically identifies address sequences to be streamed by exploiting two common phenomena in shared-memory access patterns: (1) temporal address correlation - groups of shared addresses tend to be accessed together and in the same order, and (2) temporal stream locality...

10.1145/1080695.1069989 article EN ACM SIGARCH Computer Architecture News 2005-05-01

Coherent read misses in shared-memory multiprocessors account for a substantial fraction of execution time in many important scientific and commercial workloads. We propose temporal streaming, to eliminate coherent read misses by streaming data to a processor in advance of the corresponding memory accesses. Temporal streaming dynamically identifies address sequences to be streamed by exploiting two common phenomena in shared-memory access patterns: (1) temporal address correlation - groups of shared addresses tend to be accessed together and in the same order; (2) temporal stream...

10.1109/isca.2005.50 article EN 2005-07-27

A million qubit-scale quantum computer is essential to realize quantum supremacy. Modern large-scale quantum computers integrate multiple quantum computers located in dilution refrigerators (DRs) to overcome each DR's unscaling cooling budget. However, a multi-DR quantum computer introduces its own unique challenges (i.e., slow and erroneous inter-DR entanglement, increased qubit scale), and they make the baseline error handling mechanism ineffective by increasing the number of gate operations and the inter-DR communication latency to decode and correct errors. Without resolving...

10.1145/3620665.3640388 article EN 2024-04-22

Previous proposals for soft-error tolerance have called for redundantly executing a program as two concurrent threads on a superscalar microarchitecture. In a balanced superscalar design, the extra workload from redundant execution induces a severe performance penalty due to increased contention for resources throughout the datapath. This paper identifies and analyzes four key factors that affect the performance of redundant execution, namely 1) issue bandwidth and functional unit contention, 2) issue queue and reorder buffer capacity contention, 3) decode and retirement bandwidth contention, and 4)...

10.1109/micro.2004.19 article EN 2005-12-13

Midbrain dopamine (mDA) neurons play critical roles in the regulation of voluntary movement and their dysfunction is associated with Parkinson's disease. Pitx3 has been implicated in the proper development of mDA neurons in the substantia nigra pars compacta, which are selectively lost in Parkinson's disease. However, the basic mechanisms underlying its role in mDA neuron development and/or survival are poorly understood. Toward this goal, we sought to identify downstream target genes of Pitx3 by comparing gene expression profiles of wild-type and Pitx3-deficient aphakia mice. This...

10.1111/j.1471-4159.2009.06404.x article EN Journal of Neurochemistry 2009-09-24

The challenges for rechargeable lithium‐oxygen batteries of low practical capacity and poor round‐trip efficiency urgently demand effective cathode materials to overcome the limitations. However, the synergy between multiple active materials is not well understood. Here, findings of a synergistic effect of electrospun zinc oxide (ZnO) nanofibers and graphene nanoribbons (GNRs) unzipped from carbon nanotubes (CNTs) as cathode materials in lithium‐oxygen batteries are described. Furthermore, the overpotentials and discharge capacities are tuned by the surface defect states of ZnO...

10.1002/aenm.201401412 article EN Advanced Energy Materials 2014-10-18

Silicon nanoparticles (Si NPs) wrapped by graphene in carbon nanofibers were obtained via electrospinning and subsequent thermal treatment. In this study, water-soluble poly(vinyl alcohol) (PVA) with low carbon yield is selected to make the process water-based and to achieve a high silicon yield in the composite. It was also found that increasing the amount of graphene helps keep the PVA fiber morphology after carbonization, while forming a graphene network. The SEM and HRTEM images reveal that micrometer graphene is heavily folded into sub-micron scale fibers...

10.1021/acsami.5b10548 article EN ACS Applied Materials & Interfaces 2016-02-08

Memory-augmented neural networks are getting more attention from many researchers as they can make an inference with the previous history stored in memory. Especially, among these memory-augmented neural networks, memory networks are known for their huge reasoning power and capability to learn from a larger number of inputs than other networks. As the size of input datasets rapidly grows, the necessity of large-scale memory networks continuously arises. Such large-scale memory networks provide excellent reasoning power; however, the current computer infrastructure cannot achieve...

10.1145/3307650.3322214 article EN 2019-06-14

Efficient cache tag management is a primary design objective for large, in-package DRAM caches. Recently, Tagless DRAM Caches (TDCs) have been proposed to completely eliminate tagging structures from both on-die SRAM and in-package DRAM, which are a major scalability bottleneck for future multi-gigabyte DRAM caches. However, TDC imposes a constraint on DRAM cache block size to be the same as OS page size (e.g., 4KB) as it takes a unified approach to address translation and cache tag management. Caching at a page granularity, or page-based caching, incurs significant...

10.1109/hpca.2016.7446068 article EN 2016-03-01

The superconductor single-flux-quantum (SFQ) logic family has been recognized as a highly promising solution for the post-Moore's era, thanks to its ultra-fast and low-power switching characteristics. Therefore, researchers have made a tremendous amount of effort in various aspects to promote the technology and automate its circuit design process (e.g., low-cost fabrication, design tool development). However, there has been no progress in designing a convincing SFQ-based architectural unit due to the architects' lack of understanding...

10.1109/micro50266.2020.00018 article EN 2020-10-01

The superconductor single-flux-quantum (SFQ) logic family has been recognized as a promising solution for the post-Moore era, thanks to the ultrafast and low-power switching characteristics of superconductor devices. Researchers have made tremendous efforts in various aspects, especially in device and circuit design. However, there has been little progress in designing a convincing SFQ-based architectural unit due to a lack of understanding about its potentials and limitations at the architectural level. This article provides the design principles for SFQ-based architectural units with an...

10.1109/mm.2021.3070488 article EN cc-by IEEE Micro 2021-04-05

Cryogenic computing can achieve high performance and power efficiency by dramatically reducing the device's leakage power and wire resistance at low temperatures. Recent advances towards cryogenic computing focus on developing cryogenic-optimal cache and memory devices to overcome capacity, latency, and power walls. However, little research has been conducted to develop a cryogenic-optimal core architecture despite its high potentials in performance, power, and area efficiency. Once a cryogenic-optimal core becomes available, it will also take full advantage of the cryogenic-optimal cache and memory devices, which...

10.1109/isca45697.2020.00037 article EN 2020-05-01

GPU programmers suffer from programmer-managed GPU memory because both performance and programmability heavily depend on GPU memory allocation and CPU-GPU data transfer mechanisms. To improve performance and programmability, programmers should be able to place only the data frequently accessed by the GPU on GPU memory while overlapping CPU-GPU data transfers and GPU executions as much as possible. However, current GPU architectures and programming models blindly place entire data on GPU memory, requiring a significantly large GPU memory size. Otherwise, they must trigger unnecessary CPU-GPU data transfers due to an insufficient GPU memory size. In this paper, we...

10.1109/hpca.2014.6835963 article EN 2014-02-01

Spiking Neural Networks (SNNs) play an important role in neuroscience as they help neuroscientists understand how the nervous system works. To model the nervous system, SNNs incorporate the concept of time into neurons and inter-neuron interactions called spikes; a neuron's internal state changes with respect to time and input spikes, and a neuron fires an output spike when its internal state satisfies certain conditions. As the neurons forming the nervous system behave differently, SNN simulation frameworks must be able to simulate the diverse behaviors of the neurons. To support any...

10.1109/isca.2018.00032 article EN 2018-06-01

Modern computer architectures suffer from a lack of architectural innovations, mainly due to the power wall and the memory wall. That is, architectural innovations become infeasible because they can prohibitively increase power consumption and their performance impacts are eventually bounded by slow memory accesses. To address the challenges, making computer systems run at ultra-low temperatures (or cryogenic computer systems) has emerged as a highly promising solution, as both power consumption and wire resistivity are expected to significantly reduce at ultra-low temperatures. However,...

10.1145/3307650.3322219 article EN 2019-06-14

Cryogenic computing, which is to run a computer at extremely low temperatures (e.g., 77K), is a highly promising solution to dramatically improve the computer's performance and power efficiency thanks to the significantly reduced leakage power and wire resistance. However, computer architects are facing fundamental challenges in developing and deploying cryogenic-optimal architectural units due to the lack of understanding about its cost-effectiveness and feasibility (e.g., device and cooling costs vs. speedup, energy and area saving) and thus how to architect...

10.1145/3373376.3378513 article EN 2020-03-09

Recent research indicates that prediction-based coherence optimizations offer substantial performance improvements for scientific applications in distributed shared memory multiprocessors. Important commercial applications also show sensitivity to coherence latency, which will become more acute in the future as technology scales. Therefore it is important to investigate prediction of memory coherence activity in the context of commercial workloads. This paper studies a trace-based Downgrade Predictor (DGP) for predicting last stores to shared cache blocks, and a pattern-based...

10.1145/1054943.1054949 article EN 2004-01-01