NFDI4DS | UHH-SEMS - Publication Details

Gunho Park

ORCID: 0000-0002-8078-4356

About

Contact & Profiles

A5011127697

Research Areas

Low-power high-performance VLSI design
Advanced Memory and Neural Computing
Ferroelectric and Negative Capacitance Devices
Parallel Computing and Optimization Techniques
Interconnection Networks and Systems
Algorithms and Data Compression
Analog and Mixed-Signal Circuit Design
Advancements in Semiconductor Devices and Circuit Design
Embedded Systems Design Techniques
CCD and CMOS Imaging Sensors
Medical Image Segmentation Techniques
Advanced Image and Video Retrieval Techniques
Numerical Methods and Algorithms
Domain Adaptation and Few-Shot Learning
Topic Modeling
Particle accelerators and beam dynamics
Advanced Data Compression Techniques
Magnetic confinement fusion research

Pohang University of Science and Technology
2021-2025

Design and Analysis of Approximate Compressors for Balanced Error Accumulation in MAC Operator

OPENALEX - Publications

Gunho Park Jaeha Kung Youngjoo Lee

In this paper, we present a novel approximate computing scheme suitable for realizing energy-efficient multiply-accumulate (MAC) processing. In contrast to prior works that suffer from error accumulation limiting the range, we utilize different multipliers in an interleaved way to compensate errors in the opposite direction during accumulate operations. For balanced error accumulation, we first design 4-2 compressors generating errors in the opposite direction while minimizing computational costs. Based on probabilistic analysis, positive and...

10.1109/tcsi.2021.3073177 article EN IEEE Transactions on Circuits and Systems I Regular Papers 2021-04-26

An Investigation of FP8 Across Accelerators for LLM Inference

Jiwoo Kim Joonhyung Lee Gunho Park Byeongwook Kim Se Jung Kwon and 2 more

The introduction of 8-bit floating-point (FP8) computation units in modern AI accelerators has generated significant interest in FP8-based large language model (LLM) inference. Unlike 16-bit formats, FP8 in deep learning requires a shared scaling factor. Additionally, while E4M3 and E5M2 are well-defined at the individual value level, their accumulation methods remain unspecified and vary across hardware and software implementations. As a result, FP8 behaves more like a quantization format than a standard numeric...

10.48550/arxiv.2502.01070 preprint EN arXiv (Cornell University) 2025-02-03

FIGLUT: An Energy-Efficient Accelerator Design for FP-INT GEMM Using Look-Up Tables

Gunho Park Hyeokjun Kwon Ji‐Woo Kim Jihyun Bae Baeseong Park and 2 more

10.1109/hpca61900.2025.00085 article EN 2025-03-01

LUT-GEMM: Quantized Matrix Multiplication based on LUTs for Efficient Inference in Large-Scale Generative Language Models

Gunho Park Baeseong Park Minsub Kim Sungjae Lee Jeong-Hoon Kim and 5 more

Recent advances in self-supervised learning and the Transformer architecture have significantly improved natural language processing (NLP), achieving remarkably low perplexity. However, the growing size of NLP models introduces a memory wall problem during the generation phase. To mitigate this issue, recent efforts have focused on quantizing model weights to sub-4-bit precision while preserving full precision for activations, resulting in practical speed-ups during inference on a single GPU. However, these improvements primarily stem...

10.48550/arxiv.2206.09557 preprint EN other-oa arXiv (Cornell University) 2022-01-01

Simplified Compressor and Encoder Designs for Low-Cost Approximate Radix-4 Booth Multiplier

Gunho Park Jaeha Kung Youngjoo Lee

In this brief, we present a novel design methodology of cost-effective approximate radix-4 Booth multipliers, which can significantly reduce the power consumption of error-resilient signal processing tasks. In contrast to prior studies that only focus on the approximation of either the partial product generation with encoders or the partial product reductions with compressors, the proposed method considers the two major steps jointly by forcing the generated error directions to be opposite to each other. As the internal errors are naturally balanced to have...

10.1109/tcsii.2022.3217696 article EN IEEE Transactions on Circuits & Systems II Express Briefs 2022-11-04

TF-MVP: Novel Sparsity-Aware Transformer Accelerator with Mixed-Length Vector Pruning

Eunji Yoo Gunho Park Jung Gyu Min Se Jung Kwon Baeseong Park and 2 more

We present the energy-efficient TF-MVP architecture, a sparsity-aware transformer accelerator, by introducing novel algorithm-hardware co-optimization techniques. From the previous fine-grained pruning map, for the first time, the direction strength is developed to analyze the pruning patterns quantitatively, indicating the major pruning direction and size of each layer. Then, the mixed-length vector pruning (MVP) is proposed to generate a hardware-friendly pruned-transformer model, which is fully supported by our accelerator with a reconfigurable PE structure....

10.1109/dac56929.2023.10247799 article EN 2023-07-09

No Token Left Behind: Reliable KV Cache Compression via Importance-Aware Mixed Precision Quantization

June Yong Yang Byeongwook Kim Jeongin Bae Beomseok Kwon Gunho Park and 3 more

Key-Value (KV) Caching has become an essential technique for accelerating the inference speed and throughput of generative Large Language Models (LLMs). However, the memory footprint of the KV cache poses a critical bottleneck in LLM deployment as the cache size grows with batch size and sequence length, often surpassing even the size of the model itself. Although recent methods were proposed to select and evict unimportant KV pairs from the cache to reduce memory consumption, the potential ramifications of eviction on the generative process are yet to be thoroughly examined. In this...

10.48550/arxiv.2402.18096 preprint EN arXiv (Cornell University) 2024-02-28

Low-Power Encoder and Compressor Design for Approximate Radix-8 Booth Multiplier

Ji‐Woo Kim Gunho Park Youngjoo Lee

10.1109/iscas58744.2024.10558596 article EN 2024 IEEE International Symposium on Circuits and Systems (ISCAS) 2024-05-19

Energy-Efficient RISC-V-Based Vector Processor for Cache-Aware Structurally-Pruned Transformers

Jung Gyu Min Dongyun Kam Younghoon Byun Gunho Park Youngjoo Lee

Based on recent RISC-V designs, we present in this paper a low-power vector processor architecture for efficiently deploying vision transformer (ViT) models. To fairly measure the processing efficiency of different designs with instruction/data cache memories, we first develop an evaluation framework based on numerous design tools, jointly considering algorithm, architecture, and circuit performances, numerically revealing that the previous CSR-based data compression cannot accelerate pruned...

10.1109/islped58423.2023.10244508 article EN 2023-08-07
