Youngsok Kim

ORCID: 0000-0002-1015-9969
Research Areas
  • Advanced Neural Network Applications
  • Parallel Computing and Optimization Techniques
  • Advanced Data Storage Technologies
  • Advanced Memory and Neural Computing
  • Advanced Graph Neural Networks
  • Ferroelectric and Negative Capacitance Devices
  • Caching and Content Delivery
  • Domain Adaptation and Few-Shot Learning
  • Cloud Computing and Resource Management
  • Graph Theory and Algorithms
  • Adversarial Robustness in Machine Learning
  • Topic Modeling
  • Neural Networks and Applications
  • Stochastic Gradient Optimization Techniques
  • Distributed systems and fault tolerance
  • Interconnection Networks and Systems
  • IoT and Edge/Fog Computing
  • Security and Verification in Computing
  • Low-power high-performance VLSI design
  • VLSI and Analog Circuit Testing
  • Distributed and Parallel Computing Systems
  • Algorithms and Data Compression
  • Natural Language Processing Techniques
  • Neural dynamics and brain function
  • Industrial Vision Systems and Defect Detection

Yonsei University
2019-2024

Seoul National University
2017-2019

Pohang University of Science and Technology
2014-2016

Korea Post
2013-2015

We are experiencing explosive growth in the number of consumer devices, including smartphones, tablets, web-based computers such as Chromebooks, and wearable devices. For this class of devices, energy efficiency is a first-class concern due to the limited battery capacity and thermal power budget. We find that data movement is a major contributor to the total system energy and execution time in consumer devices: the energy and performance costs of moving data between the memory system and the compute units are significantly higher than the costs of computation. As a result, addressing data movement is crucial for consumer devices. In this work, we...

10.1145/3173162.3173177 article EN 2018-03-19
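The data-movement argument above can be illustrated with a back-of-the-envelope calculation. The picojoule figures below are rough textbook ballparks chosen for the example, not numbers from the paper:

```python
# Back-of-the-envelope sketch of why data movement dominates: a DRAM
# access costs orders of magnitude more energy than an arithmetic op.
# The picojoule costs below are illustrative ballparks, not measured values.
ENERGY_PJ = {"add": 1.0, "dram_access": 640.0}

def workload_energy(num_ops, num_dram_accesses):
    compute = num_ops * ENERGY_PJ["add"]
    movement = num_dram_accesses * ENERGY_PJ["dram_access"]
    return compute, movement

# Even with 10x more arithmetic ops than DRAM accesses, movement dominates.
compute, movement = workload_energy(num_ops=1_000_000, num_dram_accesses=100_000)
print(movement / (compute + movement))  # data movement's share of total energy
```

Under these assumed costs, data movement accounts for roughly 98% of the energy despite arithmetic operations outnumbering DRAM accesses ten to one.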

Graphics processing units (GPUs) are important components of modern computing devices for not only graphics rendering, but also efficient parallel computations. However, their security problems have been ignored despite their importance and popularity. In this paper, we first perform an in-depth security analysis on GPUs to detect security vulnerabilities. We observe that contemporary, widely-used GPUs, both NVIDIA's and AMD's, do not initialize newly allocated GPU memory pages, which may contain sensitive user data. By exploiting...

10.1109/sp.2014.9 article EN IEEE Symposium on Security and Privacy 2014-05-01
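The uninitialized-memory problem the paper observes can be sketched with a toy allocator. This is a hypothetical model, not a real GPU API; it only shows why reusing freed pages without zeroing leaks one process's data to another:

```python
# Hypothetical sketch (not a real GPU API): models how an allocator that
# does not zero freed pages can leak one process's data to another.
class NaiveGpuAllocator:
    """Reuses freed pages verbatim, as the paper observes real GPUs doing."""
    def __init__(self):
        self.free_pages = []

    def alloc(self, size):
        # Reuse a freed page if available; the old contents survive.
        if self.free_pages:
            return self.free_pages.pop()
        return bytearray(size)  # fresh pages happen to start zeroed here

    def free(self, page):
        self.free_pages.append(page)  # no zeroing on free -> the bug

allocator = NaiveGpuAllocator()

# "Victim" process writes sensitive data, then frees its GPU memory.
victim_page = allocator.alloc(16)
victim_page[:6] = b"secret"
allocator.free(victim_page)

# "Attacker" process allocates and simply reads the uninitialized page.
attacker_page = allocator.alloc(16)
leaked = bytes(attacker_page[:6])
print(leaked)  # the victim's data is still there
```

A zeroing `free` (or zero-on-allocate) breaks this leak, which is the mitigation direction such analyses point to.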

Emerging mobile services heavily utilize Neural Networks (NNs) to improve user experiences. Such NN-assisted services depend on fast NN execution for high responsiveness, demanding that mobile devices minimize the NN execution latency by efficiently utilizing their underlying hardware resources. To better utilize the resources, existing NN frameworks either employ various CPU-friendly optimizations (e.g., vectorization, quantization) or exploit data parallelism using heterogeneous processors such as GPUs and DSPs. However, their performance is...

10.1145/3302424.3303950 article EN 2019-03-22

In training of modern large natural language processing (NLP) models, it has become a common practice to split the models using 3D parallelism across multiple GPUs. Such a technique, however, suffers from the high overhead of inter-node communication. Compressing the communication is one way to mitigate the overhead by reducing the traffic volume; however, existing compression techniques have critical limitations when applied to NLP models trained with 3D parallelism in that 1) only the data-parallel traffic is targeted, and 2) the existing compression schemes already harm model quality too much.

10.1145/3575693.3575712 article EN 2023-01-27
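One common family of communication-compression techniques in this space is top-k gradient sparsification, sketched below. The 30% keep ratio and the gradient values are arbitrary choices for the example, not from the paper:

```python
# Illustrative sketch of top-k gradient sparsification, one family of
# communication-compression techniques; ratio and values are arbitrary.
def topk_compress(grad, ratio=0.1):
    """Keep only the largest-magnitude entries; send (index, value) pairs."""
    k = max(1, int(len(grad) * ratio))
    idx = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)[:k]
    return [(i, grad[i]) for i in idx]

def decompress(pairs, length):
    """Rebuild a dense gradient, zero-filling the dropped entries."""
    out = [0.0] * length
    for i, v in pairs:
        out[i] = v
    return out

grad = [0.01, -2.0, 0.03, 0.5, -0.02, 1.2, 0.0, -0.04, 0.06, 0.9]
pairs = topk_compress(grad, ratio=0.3)       # only 3 of 10 entries are sent
restored = decompress(pairs, len(grad))
print(len(pairs), restored[1], restored[5])  # → 3 -2.0 1.2
```

The traffic drops to the keep ratio, but the zeroed small entries are exactly the quality-loss risk the abstract mentions.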

10.1145/3296957.3173177 article EN ACM SIGPLAN Notices 2018-03-19

Thanks to the recent advances in Deep Neural Networks (DNNs), DNN-based object detection systems have become highly accurate and are widely used in real-time environments such as autonomous vehicles, drones, and security robots. Although such systems should detect objects within a time limit that can vary depending on their execution environments, such as vehicle speeds, existing systems blindly execute entire long-latency DNNs without reflecting the time-varying limits, and thus they cannot guarantee real-time constraints. This work proposes a novel system...

10.1109/rtas48715.2020.000-8 article EN 2020-04-01

Model quantization is considered a promising method to greatly reduce the resource requirements of deep neural networks. To deal with the performance drop induced by quantization errors, popular methods use training data to fine-tune quantized networks. In real-world environments, however, such methods are frequently infeasible because the training data is unavailable due to security, privacy, or confidentiality concerns. Zero-shot quantization addresses such problems, usually by taking information from the weights of a full-precision teacher network to compensate for the performance drop. In this paper, we first analyze...

10.1109/cvpr52688.2022.00813 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01
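The basic operation whose induced error zero-shot quantization tries to recover is uniform weight quantization, sketched here. The 4-bit width and the weight values are illustrative only:

```python
# Minimal sketch of symmetric uniform weight quantization; the accuracy
# drop it induces is what zero-shot quantization tries to recover.
# Bit-width and weights are illustrative, not from the paper.
def quantize(weights, bits=4):
    """Map floats to integers in [-2^(b-1), 2^(b-1)-1] and back."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(w) for w in weights) / qmax
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    deq = [v * scale for v in q]  # dequantized values seen at inference
    return q, deq

w = [0.7, -0.3, 0.1, -0.7]
q, deq = quantize(w, bits=4)
err = max(abs(a - b) for a, b in zip(w, deq))
print(q)            # integer codes, e.g. [7, -3, 1, -7]
print(err <= 0.05)  # error stays within half a quantization step
```

With training data one would fine-tune away `err`'s effect on accuracy; zero-shot methods must do so using only the full-precision weights.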

The recent huge advance of Large Language Models (LLMs) is mainly driven by the increase in the number of parameters. This has led to substantial memory capacity requirements, necessitating the use of dozens of GPUs just to meet the capacity. One popular solution to this is storage-offloaded training, which uses host memory and storage as an extended memory hierarchy. However, this obviously comes at the cost of a storage bandwidth bottleneck, because storage devices have orders of magnitude lower bandwidth compared to that of GPU device memories. Our work, Smart-Infinity, addresses...

10.1109/hpca57654.2024.00034 article EN 2024-03-02

Efficient cache tag management is a primary design objective for large, in-package DRAM caches. Recently, Tagless DRAM Caches (TDCs) have been proposed to completely eliminate tagging structures from both on-die SRAM and DRAM, which are a major scalability bottleneck for future multi-gigabyte DRAM caches. However, TDC imposes a constraint that the cache block size be the same as the OS page size (e.g., 4KB), as it takes a unified approach to address translation and cache tag management. Caching at a page granularity, or page-based caching, incurs significant...

10.1109/hpca.2016.7446068 article EN 2016-03-01

We present dataflow mirroring, architectural support for low-overhead fine-grained systolic array allocation which overcomes the limitations of prior coarse-grained spatial-multitasking Neural Processing Unit (NPU) architectures. The key idea of dataflow mirroring is to reverse the dataflows of co-located Neural Networks (NNs) in horizontal and/or vertical directions, allowing boundaries to be set between any adjacent rows and columns of a systolic array and supporting up to four-way spatial multitasking. Our detailed experiments using MLPerf...

10.1109/dac18074.2021.9586312 article EN 2021-11-08
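The partitioning idea can be sketched conceptually: one row boundary and one column boundary split the array into quadrants, and each co-located NN streams its dataflow from its own corner (the "mirrored" directions). The array size and boundary positions below are illustrative, not from the paper:

```python
# Conceptual sketch of four-way spatial multitasking on a systolic array:
# a row boundary and a column boundary define four quadrants, one per NN.
# Array size and boundary positions are illustrative.
ROWS, COLS = 8, 8

def partition(row_boundary, col_boundary):
    """Return a ROWS x COLS map of which NN (0-3) owns each PE."""
    grid = [[0] * COLS for _ in range(ROWS)]
    for r in range(ROWS):
        for c in range(COLS):
            quadrant = (r >= row_boundary) * 2 + (c >= col_boundary)
            grid[r][c] = quadrant
    return grid

# Mirrored dataflow origins: each NN injects data from a distinct corner,
# so the quadrants never collide even though boundaries are arbitrary.
origins = {0: (0, 0), 1: (0, COLS - 1), 2: (ROWS - 1, 0), 3: (ROWS - 1, COLS - 1)}

grid = partition(row_boundary=5, col_boundary=3)
sizes = [sum(row.count(q) for row in grid) for q in range(4)]
print(sizes)  # PEs per NN for boundaries (5, 3)
```

Because the boundaries can sit between any adjacent rows and columns, the per-NN allocation is fine-grained rather than fixed halves or quarters.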

This work presents DANCE, a differentiable approach towards the co-exploration of hardware accelerator and network architecture design. At the heart of DANCE is the evaluator network. By modeling the evaluation software with a neural network, the relation between the design choices and the design metrics becomes differentiable, allowing the search to be performed with backpropagation. Compared to naive existing approaches, our method performs the co-exploration in a significantly shorter time, while achieving superior accuracy and cost metrics.

10.1109/dac18074.2021.9586121 article EN 2021-11-08

Modern dual in-line memory modules (DIMMs) support processing-in-memory (PIM) by implementing in-DIMM processors (IDPs) located near memory banks. PIM can greatly accelerate in-memory join, whose performance is frequently bounded by main-memory accesses, by offloading the operations of join from the host central processing units (CPUs) to the IDPs. As real PIM hardware has not been available until very recently, prior PIM-assisted join algorithms have relied on simulators which assume fast shared memory between the IDPs and...

10.1145/3589258 article EN Proceedings of the ACM on Management of Data 2023-06-13
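The offloading idea can be sketched as a partitioned hash join: tuples are hash-partitioned so that each (simulated) IDP builds and probes only its own partition, with no inter-IDP communication. The IDP count and tables are illustrative, and real IDPs would of course run in parallel rather than in a Python loop:

```python
# Simplified sketch of offloading a hash join to in-DIMM processors:
# hash-partition both tables so each IDP joins only its own partition.
# IDP count and table contents are illustrative.
NUM_IDPS = 4

def partitioned_hash_join(r_table, s_table):
    """r_table, s_table: lists of (key, payload); returns joined tuples."""
    r_parts = [[] for _ in range(NUM_IDPS)]
    s_parts = [[] for _ in range(NUM_IDPS)]
    for t in r_table:
        r_parts[hash(t[0]) % NUM_IDPS].append(t)
    for t in s_table:
        s_parts[hash(t[0]) % NUM_IDPS].append(t)

    result = []
    for idp in range(NUM_IDPS):          # each iteration = one IDP's work
        table = {}
        for k, v in r_parts[idp]:        # build phase, local to the IDP
            table.setdefault(k, []).append(v)
        for k, v in s_parts[idp]:        # probe phase, local to the IDP
            for rv in table.get(k, []):
                result.append((k, rv, v))
    return result

r = [(1, "r1"), (2, "r2"), (3, "r3")]
s = [(2, "s2"), (3, "s3"), (4, "s4")]
out = sorted(partitioned_hash_join(r, s))
print(out)  # → [(2, 'r2', 's2'), (3, 'r3', 's3')]
```

Matching keys always hash to the same partition, so the per-IDP joins are independent; that independence is what makes the offload attractive when IDPs lack fast shared memory.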

Model quantization is known as a promising method to compress deep neural networks, especially for inference on lightweight mobile or edge devices. However, model quantization usually requires access to the original training data to maintain the accuracy of the full-precision models, which is often infeasible in real-world scenarios due to security and privacy issues. A popular approach is to perform quantization without the original data, using synthetically generated samples based on batch-normalization statistics or adversarial learning. A drawback of such approaches is that...

10.48550/arxiv.2111.02625 preprint EN cc-by-nc-nd arXiv (Cornell University) 2021-01-01

GPU programmers suffer from programmer-managed GPU memory because both performance and programmability heavily depend on GPU memory allocation and CPU-GPU data transfer mechanisms. To improve programmability, programmers should be able to place in GPU memory only the data frequently accessed by the GPU, while overlapping data transfers and GPU executions as much as possible. However, current GPU architectures and programming models blindly place entire data in GPU memory, requiring a significantly large GPU memory size. Otherwise, they must trigger unnecessary data transfers due to an insufficient GPU memory size. In this paper, we...

10.1109/hpca.2014.6835963 article EN 2014-02-01

Spiking Neural Networks (SNNs) play an important role in neuroscience as they help neuroscientists understand how the nervous system works. To model the nervous system, SNNs incorporate the concept of time into neurons and inter-neuron interactions called spikes; a neuron's internal state changes with respect to time and input spikes, and a neuron fires an output spike when its internal state satisfies certain conditions. As the neurons forming the nervous system behave differently, SNN simulation frameworks must be able to simulate the diverse behaviors of neurons. To support any...

10.1109/isca.2018.00032 article EN 2018-06-01
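One of the simplest neuron behaviors such a framework must support is the leaky integrate-and-fire (LIF) model, sketched below with illustrative constants:

```python
# Minimal leaky integrate-and-fire (LIF) neuron, one of the diverse neuron
# models an SNN framework must support; constants are illustrative.
def simulate_lif(input_spikes, leak=0.9, weight=0.5, threshold=1.0):
    """Return the time steps at which the neuron fires."""
    v, fired = 0.0, []
    for t, spike in enumerate(input_spikes):
        v = v * leak + weight * spike   # leak the state, integrate the input
        if v >= threshold:              # fire when the state crosses threshold
            fired.append(t)
            v = 0.0                     # reset after the output spike
    return fired

spikes = [1, 1, 0, 1, 1, 1, 0, 0]
print(simulate_lif(spikes))  # → [3]
```

Other neuron models (e.g., Izhikevich or Hodgkin-Huxley) replace the one-line state update with richer dynamics, which is exactly the diversity that makes a flexible simulation substrate necessary.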

Conventional servers have achieved high performance by employing fast CPUs to run compute-intensive workloads, while making operating systems manage relatively slow I/O devices through memory accesses and interrupts. However, as emerging workloads are becoming heavily data-intensive and fast I/O devices (e.g., NVM storage, high-bandwidth NICs, GPUs) come to enable low-latency device operations, traditional host-centric server architectures fail to deliver high performance due to their inefficient device handling mechanisms. Furthermore,...

10.1145/2830772.2830794 article EN 2015-12-05

Graph convolutional networks (GCNs) are becoming increasingly popular as they overcome the limited applicability of prior neural networks. One recent trend in GCNs is the use of deep network architectures. As opposed to traditional GCNs, which only span around two to five layers deep, modern GCNs now incorporate tens to hundreds of layers with the help of residual connections. From such deep GCNs, we find an important characteristic: they exhibit very high intermediate feature sparsity. This reveals a new opportunity for accelerators...

10.1109/hpca56546.2023.10071102 article EN 2023-02-01

Programmer-managed GPU memory is a major challenge in writing GPGPU applications. Programmers must rewrite and optimize an existing code for a different GPU memory size to achieve both portability and performance. Alternatively, they can achieve only portability by disabling GPU memory at the cost of significant performance degradation. In this paper, we propose ScaleGPU, a novel GPU architecture to enable high-performance memory-unaware GPU programming. ScaleGPU uses GPU memory as a cache of CPU memory to provide programmers a view of CPU memory-sized programming space. ScaleGPU also achieves high...

10.1109/l-ca.2013.19 article EN IEEE Computer Architecture Letters 2013-07-16

To understand how the human brain works, neuroscientists heavily rely on brain simulations which incorporate the concept of time into their operating model. In the simulations, neurons transmit signals through synapses whose weights change over time and by the activity of the associated neurons. Such changes in synaptic weights, known as learning, are thought to contribute to memory, and various learning rules exist to model different behaviors of the brain. Due to the diverse learning rules, simulations are performed using highly programmable general-purpose processors...

10.1145/3352460.3358268 article EN 2019-10-11

Application caching is a key feature to enable fast application switches for mobile devices by keeping the entire memory pages of applications in the device's physical memory. However, it requires a prohibitive amount of memory unless swap is employed to maintain only the working sets of applications. Unfortunately, mobile devices often disable the invaluable swap, as it can severely decrease the already marginal lifespan of flash-based local storage due to increased writes to the device. As a result, modern mobile devices suffering from insufficient memory space end up killing memory-hungry applications and keeping only a few...

10.1109/ccgrid.2016.22 article EN 2016-05-01

With the advance in genome sequencing technology, the lengths of deoxyribonucleic acid (DNA) sequencing results are rapidly increasing at lower prices than ever. However, longer sequences come at the cost of a heavy computational burden on aligning them. For example, aligning long sequences to a human reference genome can take tens or even hundreds of hours. The current de facto standard approach for alignment is based on the guided dynamic programming method. Although this approach takes a long time and could potentially benefit from high-throughput graphic processing...

10.1145/3627535.3638474 article EN other-oa 2024-02-20
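The guided dynamic programming idea can be sketched as banded global alignment: only DP cells within a band around the main diagonal are computed, which bounds the work for long sequences. Scoring values and band width below are illustrative:

```python
# Sketch of banded global alignment (Needleman-Wunsch restricted to a
# diagonal band), illustrating the guided dynamic programming idea.
# Scoring values and band width are illustrative.
def banded_align(a, b, band=2, match=2, mismatch=-1, gap=-2):
    NEG = float("-inf")
    n, m = len(a), len(b)
    dp = [[NEG] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0
    for j in range(1, min(m, band) + 1):   # first row, within the band
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        # Only cells within `band` of the main diagonal are computed.
        for j in range(max(0, i - band), min(m, i + band) + 1):
            if j == 0:
                dp[i][0] = i * gap
                continue
            sub = match if a[i - 1] == b[j - 1] else mismatch
            dp[i][j] = max(dp[i - 1][j - 1] + sub,   # match/mismatch
                           dp[i - 1][j] + gap,       # deletion
                           dp[i][j - 1] + gap)       # insertion
    return dp[n][m]

print(banded_align("GATTACA", "GATTTACA"))  # → 12 (7 matches, 1 gap)
```

The band cuts the DP from O(nm) cells to O(n * band), which is why a guide that keeps the true alignment inside the band is so valuable for long reads.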

Analytical models can help computer architects perform early-stage design space exploration orders of magnitude faster than using cycle-level simulators. To facilitate rapid design space exploration for graphics processing units (GPUs), prior studies have proposed GPU analytical models which capture first-order stall events causing performance degradation; however, the existing models cannot accurately model modern GPUs due to their outdated and highly abstract core microarchitecture assumptions. Therefore, to accurately evaluate modern GPUs,...

10.1145/3470496.3527384 article EN 2022-05-31
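A toy first-order model in the spirit the abstract describes: estimated cycles are compute cycles plus the memory stall cycles not hidden by multithreading. All inputs and the overlap factor are illustrative, not the paper's actual model:

```python
# Toy first-order GPU performance model: estimated cycles = compute cycles
# plus exposed (non-overlapped) memory stalls. All numbers are illustrative
# and not from the paper's actual model.
def estimate_cycles(compute_cycles, mem_requests, mem_latency, overlap=0.75):
    """overlap: fraction of memory latency hidden by other warps."""
    mem_cycles = mem_requests * mem_latency
    exposed = mem_cycles * (1.0 - overlap)   # first-order stall estimate
    return compute_cycles + exposed

cycles = estimate_cycles(compute_cycles=10_000,
                         mem_requests=500, mem_latency=400, overlap=0.75)
print(cycles)  # 10_000 + 200_000 * 0.25 = 60_000.0
```

The paper's point is that such abstract models miss modern core microarchitecture details (e.g., how stalls actually interleave), so the fixed `overlap` factor here is exactly the kind of assumption that breaks down.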

Neural Processing Units (NPUs) frequently suffer from low hardware utilization, as the efficiency of their systolic arrays heavily depends on the characteristics of a deep neural network (DNN). Spatial multitasking is a promising solution to overcome the low NPU utilization; however, the state-of-the-art spatial-multitasking NPU architecture achieves sub-optimal performance due to its coarse-grained systolic-array distribution and incurs significant implementation costs. In this paper, we propose...

10.1109/tc.2023.3299030 article EN IEEE Transactions on Computers 2023-08-01