NFDI4DS | UHH-SEMS - Publication Details

GROW: A Row-Stationary Sparse-Dense GEMM Accelerator for Memory-Efficient Graph Convolutional Neural Networks

OPENALEX - Publications

Ranggi Hwang, Minhoo Kang, Jiwon Lee, Dongyun Kam, Youngjoo Lee and 1 more

Graph convolutional neural networks (GCNs) have emerged as a key technology in various application domains where the input data is relational. A unique property of GCNs is that their two primary execution stages, aggregation and combination, exhibit drastically different dataflows. Consequently, prior GCN accelerators tackle this research space by casting the aggregation and combination stages as a series of sparse-dense matrix multiplications. However, prior work frequently suffers from inefficient data movements, leaving significant...

DOI: 10.1109/hpca56546.2023.10070983 | article | EN | 2023-02-01

Panacea: Novel DNN Accelerator using Accuracy-Preserving Asymmetric Quantization and Energy-Saving Bit-Slice Sparsity

Dongyun Kam, Myeongji Yun, Sunwoo Yoo, Seungwoo Hong, Zhengya Zhang and 1 more

DOI: 10.1109/hpca61900.2025.00059 | article | EN | 2025-03-01

Low-Latency SCL Polar Decoder Architecture Using Overlapped Pruning Operations

Dongyun Kam, Byeong Yong Kong, Youngjoo Lee

Offering superior error-correction performance even for short-length codewords, the successive-cancellation list (SCL) decoding algorithm has allowed polar codes to be adopted in the 5G New Radio standard for the control channel. However, existing SCL decoders still suffer from long processing latency caused by a number of serialized internal operations. In this work, to solve this problem, we present several parallel computing solutions for these operations, i.e., simplified data dependencies and two overlapped pruning...

DOI: 10.1109/tcsi.2022.3230589 | article | EN | IEEE Transactions on Circuits and Systems I: Regular Papers | 2023-01-10

A 1.1μs 1.56Gb/s/mm² Cost-Efficient Large-List SCL Polar Decoder Using Fully-Reusable LLR Buffers in 28nm CMOS Technology

Dongyun Kam, Byeong Yong Kong, Youngjoo Lee

This paper presents a cost-efficient large-list SCL polar decoder supporting ultra-reliable channel coding in 5G and beyond communications. To minimize huge implementation costs, the proposed design utilizes fully-reusable LLR buffers associated with stage unfolding and overwriting schemes, significantly reducing on-chip buffer overheads by 67% compared to the state-of-the-art decoder. Implemented in 28nm CMOS, the list-8 prototype decoder achieves 1.1μs and 1.56Gb/s/mm²...

DOI: 10.1109/vlsitechnologyandcir46769.2022.9830317 | article | EN | 2022 IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits) | 2022-06-12

2.8 A 21.9ns 15.7 Gbps/mm² (128,15) BOSS FEC Decoder for 5G/6G URLLC Applications

Dongyun Kam, Sangbu Yun, Jeongwon Choe, Zhengya Zhang, Namyoon Lee and 1 more

To enable emerging mission-critical applications, e.g., healthcare monitoring, remote surgery, and autonomous driving, 5G/6G ultra-reliable low-latency communication (URLLC) devices demand the concurrent fulfillment of ultra-reliable, low-latency, and low-power communication, particularly in short data transmissions, as depicted in Fig. 2.8.1. However, existing short-length forward error-correction (FEC) solutions for URLLC cannot meet all of these challenging requirements at the same time. The recent...

DOI: 10.1109/isscc49657.2024.10454363 | article | EN | 2024 IEEE International Solid-State Circuits Conference (ISSCC) | 2024-02-18

Constrained Sorter Design using Zero-One Principle

Sangil Han, Jaehee Kim, Dongyun Kam, Byeong Yong Kong, Mi Jung Kim and 2 more

DOI: 10.1109/iscas58744.2024.10557942 | article | EN | 2024 IEEE International Symposium on Circuits and Systems (ISCAS) | 2024-05-19

Massive MIMO Systems With Low-Resolution ADCs: Baseband Energy Consumption vs. Symbol Detection Performance

Seungsik Moon, In-Soo Kim, Dongyun Kam, Dong‐Woo Jee, Junil Choi and 1 more

In massive multiple-input multiple-output (MIMO) systems using a large number of antennas, it would be difficult to connect high-resolution analog-to-digital converters (ADCs) to each antenna component due to high cost and energy consumption problems. To resolve these issues, there has been much work on implementing symbol detectors and channel estimators using low-resolution ADCs for massive MIMO systems. Although it is intuitively true that using low-resolution ADCs makes it possible to save a large amount of energy in massive MIMO systems, the relationship between detection...

DOI: 10.1109/access.2018.2890427 | article | EN | cc-by-nc-nd | IEEE Access | 2019-01-01

Ultra-Low-Latency LDPC Decoding Architecture using Reweighted Offset Min-Sum Algorithm

Sangbu Yun, Dongyun Kam, Jeongwon Choe, Byeong Yong Kong, Youngjoo Lee

Due to its iterative nature, a low-density parity-check (LDPC) decoder is associated with long latency, being a major bottleneck of the baseband processor in wireless communication systems. Based on the practical min-sum (MS) decoding method, in this paper we present a cost-effective algorithm for reducing the processing latency of LDPC decoders. By checking the number of short-length cycles in the code structure, the proposed method dynamically changes the reweighting factor in the MS operations, successfully reducing the average number of iterations. In...

DOI: 10.1109/iscas45731.2020.9181189 | article | EN | 2020 IEEE International Symposium on Circuits and Systems (ISCAS) | 2020-09-29

Ultralow-Latency Successive Cancellation Polar Decoding Architecture Using Tree-Level Parallelism

Dongyun Kam, Hoyoung Yoo, Youngjoo Lee

Achieving attractive error-correcting capability with a simple decoder structure, polar codes using successive cancellation (SC) decoding are now expected to be installed in resource-limited IoT or embedded communications. However, existing SC decoders normally suffer from long processing latency caused by serialized steps, limiting the practical applications of polar codes. In this article, to solve this problem, we present a new low-complexity merging operation that can increase the number of parallel factors for...

DOI: 10.1109/tvlsi.2021.3068965 | article | EN | IEEE Transactions on Very Large Scale Integration (VLSI) Systems | 2021-04-09

Ultra-Low-Latency Parallel SC Polar Decoding Architecture for 5G Wireless Communications

Dongyun Kam, Youngjoo Lee

In this paper, we present a novel parallel polar decoding architecture that significantly reduces the processing latency for 5G wireless communications. Based on the original tree, the proposed scheme first constructs small trees to generate multiple soft-decision messages in parallel, potentially reducing the latency compared to previous serialized schemes. The hard-decision estimates are then calculated at the following merging step to decide the decoded outputs and update the trees. For each small tree, pruning is utilized to further...

DOI: 10.1109/iscas.2019.8702786 | article | EN | 2019 IEEE International Symposium on Circuits and Systems (ISCAS) | 2019-05-01

Low-Latency Polar Decoder Using Overlapped SCL Processing

Dongyun Kam, Byeong Yong Kong, Youngjoo Lee

In this paper, we present a novel scheduling method that significantly reduces the latency of polar decoders. Unlike prior pruning-based successive cancellation list (SCL) decoding, which suffers from a number of idle cycles, the proposed overlapped SCL scheme immediately begins the node operations without waiting for the list to be sorted, avoiding such unfavorable idle cycles. All possible candidates for the next node are precomputed in parallel with the pruning operations and readily selected to minimize latency. For 5G New Radio...

DOI: 10.1109/icassp39728.2021.9414326 | article | EN | ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) | 2021-05-13

Design and Evaluation Frameworks for Advanced RISC-based Ternary Processor

Dongyun Kam, Jung Gyu Min, Jongho Yoon, Sunmean Kim, Seokhyeong Kang and 1 more

In this paper, we introduce the design and verification frameworks for developing a fully-functional emerging ternary processor. Based on the existing compiling environments for binary processors, for the given instructions, the software-level framework provides an efficient way to convert programs to ternary assembly codes. We also present a hardware-level framework to rapidly evaluate the performance of a ternary processor implemented in an arbitrary technology. As a case study, the 9-trit advanced RISC-based ternary (ART-9) core is newly developed by using...

DOI: 10.23919/date54114.2022.9774584 | article | EN | Design, Automation & Test in Europe Conference & Exhibition (DATE), 2022 | 2022-03-14

Hard-Decision SCL Polar Decoder With Weighted Pruning Operation for Storage Applications

D.H. Park, Dongyun Kam, Sangbu Yun, Jeongwon Choe, Youngjoo Lee

In this paper, we present a novel weighted pruning method that effectively reduces the processing latency of the hard-decision (HD) polar decoder for storage applications without compromising error-correcting capability. Based on the previous Fast-SSCL-SPC decoding algorithm, we thoroughly analyze the cause of performance degradation when using HD inputs. By introducing a weighted pruning operation for the least reliable internal value, the proposed method successfully avoids faulty updates of codeword candidates. Furthermore, we demonstrate the architecture...

DOI: 10.1109/tcsii.2024.3378204 | article | EN | IEEE Transactions on Circuits and Systems II: Express Briefs | 2024-03-18

A Design Framework for Cost-Efficient Sorters With Arbitrary Input/Output Constraints

Jae-Hee Kim, Sangil Han, Dongyun Kam, Byeong Yong Kong, Youngjoo Lee

DOI: 10.1109/tcsi.2024.3424450 | article | EN | IEEE Transactions on Circuits and Systems I: Regular Papers | 2024-07-15

Panacea: Novel DNN Accelerator using Accuracy-Preserving Asymmetric Quantization and Energy-Saving Bit-Slice Sparsity

Dongyun Kam, Myeongji Yun, Sunwoo Yoo, Seungwoo Hong, Zhengya Zhang and 1 more

Low bit-precisions and their bit-slice sparsity have recently been studied to accelerate general matrix multiplications (GEMM) during large-scale deep neural network (DNN) inference. While conventional symmetric quantization facilitates low-resolution processing with bit-slice sparsity for both weights and activations, its accuracy loss caused by the asymmetric distributions of activations is unacceptable, especially for large-scale DNNs. In efforts to mitigate this accuracy loss, recent studies have actively utilized asymmetric quantization for activations without...

DOI: 10.48550/arxiv.2412.10059 | preprint | EN | arXiv (Cornell University) | 2024-12-13

High-Throughput and Low-Latency Digital Baseband Architecture for Energy-Efficient Wireless VR Systems

S. W. Hwang, Seungsik Moon, Dongyun Kam, Inn‐Yeal Oh, Youngjoo Lee

This paper presents a novel baseband architecture that supports high-speed wireless VR solutions using 60 GHz RF circuits. Based on experimental observations with our previous transceiver circuits, an efficient baseband architecture is proposed to enhance the quality of transmission. To achieve zero-latency transmission, we define a (106,920, 95,040) interleaved-BCH error-correction code (ECC), which removes the iterative processing steps of the LDPC ECC standardized for near-field communication. Introducing block-level...

DOI: 10.3390/electronics8070815 | article | EN | Electronics | 2019-07-22

An AC-3/MPEG multi-standard audio decoder IC

Stephen H. Li, J. Rowlands, P. Ng, Michael Gill, D.S. Youm and 3 more

The emerging digital audio compression technology brings both an opportunity and a new challenge to IC design. High-quality multichannel audio is quickly becoming an indispensable part of the entertainment system. The algorithms used in audio compression result in complex VLSI ICs. The work presented in this paper concerns the design of a dedicated, high-precision, low-cost AC-3/MPEG multi-standard audio decoder. The IC's hardware and software architecture, as well as the simulation/verification methodology, are discussed in detail.

DOI: 10.1109/cicc.1997.606622 | article | EN | 2002-11-22

Energy-Efficient RISC-V-Based Vector Processor for Cache-Aware Structurally-Pruned Transformers

Jung Gyu Min, Dongyun Kam, Younghoon Byun, Gunho Park, Youngjoo Lee

Based on recent RISC-V designs, we present in this paper a low-power vector processor architecture for efficiently deploying vision transformer (ViT) models. To fairly measure the processing efficiency of different designs with instruction/data cache memories, we first develop an evaluation framework based on numerous design tools, jointly considering algorithm, architecture, and circuit performance, numerically revealing that the previous CSR-based data compression cannot accelerate pruned...

DOI: 10.1109/islped58423.2023.10244508 | article | EN | 2023-08-07

Low-Complexity and Low-Latency SVC Decoding Architecture Using Modified MAP-SP Algorithm

Seungwoo Hong, Dongyun Kam, Sangbu Yun, Jeongwon Choe, Namyoon Lee and 1 more

The compressive sensing (CS)-based sparse vector coding (SVC) method is one of the promising approaches for next-generation ultra-reliable and low-latency communications. In this paper, we present advanced algorithm-hardware co-optimization schemes for realizing a cost-effective SVC decoding architecture. The previous maximum a posteriori subspace pursuit (MAP-SP) algorithm is newly modified to relax computational overheads by applying novel residual forwarding and LLR approximation schemes. A fully-pipelined...

DOI: 10.1109/tcsi.2021.3136222 | article | EN | IEEE Transactions on Circuits and Systems I: Regular Papers | 2021-12-29

FPGA-Based Ordered Statistic Decoding Architecture for B5G/6G URLLC IIOT Networks

Changhyeon Kim, Dongyoung Rim, Jeongwon Choe, Dongyun Kam, Giyoon Park and 2 more

The ordered statistic decoding (OSD) approach for short-length BCH codes has continuously been considered one of the promising error-correction schemes, achieving a block error rate (BLER) of less than $10^{-6}$, which is attractive for ultra-reliable and low-latency communication (URLLC) in industrial IoT (IIOT) solutions [1], [2]. However, it is hard to directly realize the conventional OSD algorithm because of its compute-intensive Gaussian elimination and iterative reprocessing steps. Based on the recent segmentation...

DOI: 10.1109/a-sscc53895.2021.9634714 | article | EN | 2021 IEEE Asian Solid-State Circuits Conference (A-SSCC) | 2021-11-07

Design and Evaluation Frameworks for Advanced RISC-based Ternary Processor

Dongyun Kam, Jung Gyu Min, Jongho Yoon, Sunmean Kim, Seokhyeong Kang and 1 more

In this paper, we introduce the design and verification frameworks for developing a fully-functional emerging ternary processor. Based on the existing compiling environments for binary processors, for the given instructions, the software-level framework provides an efficient way to convert programs to ternary assembly codes. We also present a hardware-level framework to rapidly evaluate the performance of a ternary processor implemented in an arbitrary technology. As a case study, the 9-trit advanced RISC-based ternary (ART-9) core is newly developed by using...

DOI: 10.48550/arxiv.2111.07584 | preprint | EN | other-oa | arXiv (Cornell University) | 2021-01-01

GROW: A Row-Stationary Sparse-Dense GEMM Accelerator for Memory-Efficient Graph Convolutional Neural Networks

Minhoo Kang, Ranggi Hwang, Jiwon Lee, Dongyun Kam, Youngjoo Lee and 1 more

Graph convolutional neural networks (GCNs) have emerged as a key technology in various application domains where the input data is relational. A unique property of GCNs is that their two primary execution stages, aggregation and combination, exhibit drastically different dataflows. Consequently, prior GCN accelerators tackle this research space by casting the aggregation and combination stages as a series of sparse-dense matrix multiplications. However, prior work frequently suffers from inefficient data movements, leaving significant...

DOI: 10.48550/arxiv.2203.00158 | preprint | EN | cc-by-nc-nd | arXiv (Cornell University) | 2022-01-01

Simplified Ordered Statistic Decoding for Short-Length Linear Block Codes

Changhyeon Kim, Dongyun Kam, Seokki Kim, Giyoon Park, Youngjoo Lee

The ordered statistic decoding (OSD) algorithm for short-length linear block codes provides attractive ML-approaching performance and is expected to be used for ultra-reliable low-latency communication (URLLC) in next-generation wireless solutions. To find the corrected codeword among numerous candidates, however, the OSD process requires a considerable amount of computation, which needs to be simplified to achieve low-latency processing. In this letter, we present several schemes that relax the overall...

DOI: 10.1109/lcomm.2022.3176646 | article | EN | IEEE Communications Letters | 2022-05-20