- Parallel Computing and Optimization Techniques
- Catalysts for Methane Reforming
- Advanced Data Storage Technologies
- Stochastic Gradient Optimization Techniques
- Membrane Separation and Gas Transport
- Catalytic Processes in Materials Science
- Advanced Neural Network Applications
- Distributed and Parallel Computing Systems
- Ammonia Synthesis and Nitrogen Reduction
- Recommender Systems and Techniques
- Interconnection Networks and Systems
- Embedded Systems Design Techniques
- Neural Networks and Applications
- Cloud Computing and Resource Management
- Tensor Decomposition and Applications
- Carbon Dioxide Capture Technologies
- Matrix Theory and Algorithms
- Generative Adversarial Networks and Image Synthesis
- Algorithms and Data Compression
- Advanced Data Compression Techniques
- Advanced Graph Neural Networks
- Hydrogen Storage and Materials
- Advanced SAR Imaging Techniques
- Advanced Image and Video Retrieval Techniques
- Aeroelasticity and Vibration Control
- Alpha Omega Alpha Medical Honor Society, 2023
- Menlo School, 2021-2023
- BC Platforms (Finland), 2022
- Meta (United States), 2017-2022
- Yonsei University, 2021
- Intel (Germany), 2018
- Intel (United States), 2012-2018
- Meta (Israel), 2018
- Korea Institute of Energy Research, 2005-2017
- Intel (United Kingdom), 2012-2016
With the advent of deep learning, neural network-based recommendation models have emerged as an important tool for tackling personalization and recommendation tasks. These networks differ significantly from other deep learning networks due to their need to handle categorical features, and are not well studied or understood. In this paper, we develop a state-of-the-art deep learning recommendation model (DLRM) and provide its implementation in both the PyTorch and Caffe2 frameworks. In addition, we design a specialized parallelization scheme utilizing model parallelism on the embedding...
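A minimal sketch of the model family this abstract describes, assuming PyTorch; the class name, dimensions, and interaction layout are invented for illustration, and this is not the paper's open-source implementation:

```python
# Sketch of a DLRM-style model: dense features pass through a bottom MLP,
# each categorical feature through its own embedding table, and the
# feature vectors interact via pairwise dot products before a top layer.
import torch
import torch.nn as nn

class TinyDLRM(nn.Module):              # hypothetical name
    def __init__(self, num_dense, table_sizes, dim=16):
        super().__init__()
        self.bottom = nn.Sequential(nn.Linear(num_dense, dim), nn.ReLU())
        # One embedding table per categorical feature.
        self.tables = nn.ModuleList(
            nn.EmbeddingBag(n, dim, mode="sum") for n in table_sizes)
        n_feat = 1 + len(table_sizes)
        self.top = nn.Linear(dim + n_feat * (n_feat - 1) // 2, 1)

    def forward(self, dense, sparse):   # sparse: one (B, bag) LongTensor per table
        x = self.bottom(dense)
        feats = [x] + [t(s) for t, s in zip(self.tables, sparse)]
        z = torch.stack(feats, dim=1)             # (B, n_feat, dim)
        inter = torch.bmm(z, z.transpose(1, 2))   # all pairwise dot products
        iu = torch.triu_indices(z.size(1), z.size(1), offset=1)
        return torch.sigmoid(self.top(torch.cat([x, inter[:, iu[0], iu[1]]], 1)))

model = TinyDLRM(num_dense=4, table_sizes=[100, 50])
y = model(torch.randn(2, 4),
          [torch.randint(100, (2, 3)), torch.randint(50, (2, 3))])
```

The embedding tables dominate the memory footprint, which is why a scheme like the one in the abstract keeps them model-parallel while the MLPs run data-parallel.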
This paper presents the design of Glow, a machine learning compiler for heterogeneous hardware. It is a pragmatic approach to compilation that enables the generation of highly optimized code for multiple targets. Glow lowers the traditional neural network dataflow graph into a two-phase strongly-typed intermediate representation. The high-level representation allows the optimizer to perform domain-specific optimizations. The lower-level instruction-based address-only representation allows the compiler to perform memory-related optimizations, such as instruction...
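A toy illustration of the two-phase idea, assuming Python; the node names and passes are invented and this is not Glow's actual IR:

```python
# High-level phase: graph nodes enable domain-specific rewrites (here,
# fusing add+relu). Low-level phase: an address-based instruction stream
# enables memory planning (here, static buffer offsets).
graph = [("add", "t0", ("x", "y")), ("relu", "t1", ("t0",))]

def fuse_add_relu(g):
    out, i = [], 0
    while i < len(g):
        if (i + 1 < len(g) and g[i][0] == "add" and g[i + 1][0] == "relu"
                and g[i + 1][2] == (g[i][1],)):
            out.append(("add_relu", g[i + 1][1], g[i][2]))  # fused node
            i += 2
        else:
            out.append(g[i]); i += 1
    return out

def allocate(g, size=1024):
    offsets, next_off = {}, 0
    for _, dst, srcs in g:            # give every tensor a static offset
        for t in (dst, *srcs):
            if t not in offsets:
                offsets[t], next_off = next_off, next_off + size
    return [(op, offsets[dst], tuple(offsets[s] for s in srcs))
            for op, dst, srcs in g]

print(allocate(fuse_add_relu(graph)))
```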
Deep learning recommendation models (DLRMs) are used across many business-critical services at Meta and are the single largest AI application in terms of infrastructure demand in its data centers. In this paper, we present Neo, a software-hardware co-designed system for high-performance distributed training of large-scale DLRMs. Neo employs a novel 4D parallelism strategy that combines table-wise, row-wise, column-wise, and data parallelism for training massive embedding operators. In addition, Neo enables extremely...
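A sketch of the three embedding sharding schemes behind that strategy (the fourth dimension, data parallelism, splits the batch rather than the table); NumPy-based and illustrative only, not Neo's placement algorithm:

```python
# Each scheme trades off load balance, lookup locality, and communication:
# table-wise keeps lookups local, row-wise splits a huge vocabulary,
# column-wise splits a wide embedding dimension.
import numpy as np

def shard(table, scheme, n_dev):
    """Return per-device pieces of one embedding table (rows x dim)."""
    if scheme == "table":    # whole table lives on one device
        return {0: table}
    if scheme == "row":      # split the rows (ids) across devices
        return dict(enumerate(np.array_split(table, n_dev, axis=0)))
    if scheme == "column":   # split the embedding dimension across devices
        return dict(enumerate(np.array_split(table, n_dev, axis=1)))
    raise ValueError(scheme)

table = np.random.rand(10, 8)    # 10 ids, embedding dim 8
for scheme in ("table", "row", "column"):
    print(scheme, {d: p.shape for d, p in shard(table, scheme, 4).items()})
```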
Graph algorithms are becoming increasingly important for analyzing large datasets in many fields. Real-world graph data follows a pattern of sparsity that is not uniform but highly skewed towards a few items. Implementing graph traversal, statistics, and machine learning algorithms on such data in a scalable manner is quite challenging. As a result, several graph analytics frameworks (GraphLab, CombBLAS, Giraph, SociaLite, and Galois, among others) have been developed, each offering a solution with different programming models targeted at...
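A quick empirical check of the skew claim, assuming networkx is available; the Barabási-Albert generator stands in for real-world data:

```python
# In a scale-free graph a small fraction of vertices holds a large share
# of the edge endpoints, which is the skew the abstract describes.
import networkx as nx

g = nx.barabasi_albert_graph(10_000, 3, seed=0)
deg = sorted((d for _, d in g.degree()), reverse=True)
top1 = sum(deg[:len(deg) // 100])         # endpoints on the top 1% of vertices
print("top 1%% of vertices cover %.0f%% of edge endpoints"
      % (100 * top1 / sum(deg)))
```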
Hardwired ASICs, which are roughly 50X more efficient than programmable processors, sacrifice programmability to meet the efficiency requirements of demanding embedded systems. Programmable processors use energy mostly to supply instructions and data to the arithmetic units, and several techniques can reduce instruction- and data-supply costs. Using these techniques in the Stanford ELM processor closes the gap with ASICs to within 3X.
This paper presents the first comprehensive empirical study demonstrating the efficacy of the Brain Floating Point (BFLOAT16) half-precision format for Deep Learning training across image classification, speech recognition, language modeling, generative networks, and industrial recommendation systems. BFLOAT16 is attractive for two reasons: the range of values it can represent is the same as that of IEEE 754 single-precision floating point (FP32), and conversion to/from FP32 is simple. Maintaining the same range as FP32 is important to ensure that no hyper-parameter tuning...
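A sketch of why the conversion is simple: BFLOAT16 is just the top 16 bits of an FP32 value (sign bit, the same 8 exponent bits, 7 mantissa bits), so it keeps FP32's range. The round-to-nearest-even step below is one common choice, and NaN handling is ignored for brevity:

```python
# Convert FP32 <-> BFLOAT16 by truncating / zero-padding the low 16 bits.
import struct

def fp32_to_bf16_bits(x: float) -> int:
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)   # round to nearest even
    return (bits >> 16) & 0xFFFF

def bf16_bits_to_fp32(b: int) -> float:
    return struct.unpack("<f", struct.pack("<I", b << 16))[0]

x = 3.14159265
b = fp32_to_bf16_bits(x)
print(hex(b), bf16_bits_to_fp32(b))   # 0x4049 -> 3.140625
```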
Phenomenally successful in practical inference problems, convolutional neural networks (CNNs) are widely deployed in mobile devices, data centers, and even supercomputers. The number of parameters needed in CNNs, however, is often large and undesirable. Consequently, various methods have been developed to prune a CNN once it is trained. Nevertheless, the resulting CNNs offer limited benefits. While pruning the fully connected layers reduces a CNN's size considerably, it does not improve inference speed noticeably, as the compute...
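A minimal magnitude-pruning sketch (not the paper's guided pruning method), showing the kind of post-training pruning the abstract critiques:

```python
# Zero out the smallest-magnitude weights of a layer. On a fully
# connected layer this shrinks the model a lot, but, as the abstract
# notes, it barely helps latency because the compute-heavy work is in
# the convolutions.
import numpy as np

def magnitude_prune(w, sparsity):
    """Zero the fraction `sparsity` of entries with smallest |w|."""
    thresh = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) < thresh, 0.0, w)

w = np.random.randn(1024, 1024).astype(np.float32)   # an FC layer
pruned = magnitude_prune(w, 0.9)
print("nonzero fraction:", np.count_nonzero(pruned) / w.size)
```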
Large-scale graph analysis is becoming important with the rise of world-wide social network services. Recently, in SociaLite, we proposed extensions to Datalog that efficiently and succinctly implement graph analysis programs on sequential machines. This paper describes novel optimizations of SociaLite for parallel and distributed execution to support large-scale graph analysis. With SociaLite, programmers simply annotate how data are to be distributed; the necessary communication is then automatically inferred to generate code for clusters of multi-core machines. It...
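For readers unfamiliar with Datalog-style graph analysis, here is the classic reachability program evaluated semi-naively in Python. SociaLite expresses this declaratively and adds the distribution annotations the abstract mentions; this sketch models only the evaluation idea:

```python
# Datalog rules:  reach(x) :- source(x).
#                 reach(y) :- reach(x), edge(x, y).
# Semi-naive evaluation re-derives only from the newest facts.
def reachable(edges, sources):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
    reach, frontier = set(sources), set(sources)
    while frontier:   # iterate until no new facts are derived
        frontier = {v for u in frontier for v in adj.get(u, ())} - reach
        reach |= frontier
    return reach

print(reachable({(0, 1), (1, 2), (3, 4)}, {0}))   # {0, 1, 2}
```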
The development of cheap, simple, and green synthetic methods for hierarchically porous nitrogen-doped carbon, especially carbon derived from renewable biomass such as chitosan, remains a challenging topic. Here, we first synthesized a carbon (KIE-8) having a graphene-like structure via simple pyrolysis of a chitosan/urea/KOH mixture without any conventional sophisticated treatments such as freeze-drying, hydrothermal carbonization, or soft or hard templating. On the basis of various analyses of KIE-8, we demonstrated that...
The application of deep learning techniques has resulted in remarkable improvement of machine learning models. This paper provides detailed characterizations of the deep learning models used in many Facebook social network services. We present the computational characteristics of our models, describe high-performance optimizations targeting existing systems, point out their limitations, and make suggestions for future general-purpose/accelerated inference hardware. We also highlight the need for better co-design of algorithms, numerics...
As the effort to scale up existing quantum hardware proceeds, it becomes necessary to schedule quantum gates in a way that minimizes the number of operations. There are three constraints that have to be satisfied: the order or dependency of the gates in the specific algorithm, the fact that any qubit may be involved in at most one gate at a time, and the restriction that two-qubit gates are implementable only between connected qubits. The last aspect implies that the compilation depends not only on the algorithm but also on hardware properties like connectivity. Here we suggest a two-step approach in which logical...
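A toy as-soon-as-possible scheduler illustrating the first two constraints (per-qubit gate order and one gate per qubit per time step); it assumes the circuit has already been mapped so that every two-qubit gate touches connected qubits, and it is not the paper's algorithm:

```python
# gates: list of tuples of qubit ids, given in dependency order.
# Each gate is placed in the earliest time slot after the last slot
# used by any of its qubits, so independent gates run in parallel.
def schedule(gates):
    last = {}                       # qubit -> slot of its latest gate
    out = []
    for g in gates:
        t = max((last.get(q, -1) for q in g), default=-1) + 1
        for q in g:
            last[q] = t
        out.append((t, g))
    return out

# CNOT(0,1); H(2); CNOT(1,2): the first two share slot 0.
print(schedule([(0, 1), (2,), (1, 2)]))   # [(0, (0, 1)), (0, (2,)), (1, (1, 2))]
```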
Sparse methods and the use of Winograd convolutions are two orthogonal approaches, each of which significantly accelerates convolution computations in modern CNNs. Sparse Winograd merges these two, and thus has the potential to offer a combined performance benefit. Nevertheless, training convolution layers so that the resulting Winograd kernels are sparse has not hitherto been very successful. By introducing a Winograd layer in place of a standard convolution layer, we can learn and prune Winograd coefficients "natively" and obtain a sparsity level beyond 90% with only 0.1% accuracy loss with AlexNet on...
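A 1D Winograd F(2,3) example showing where "native" pruning acts: on the transformed kernel Gg rather than on the spatial filter g (the matrices below are the standard F(2,3) transforms; the full paper works with 2D tiles):

```python
# Winograd F(2,3): two outputs from four inputs with a 3-tap filter,
# computed as AT @ ((G @ g) * (BT @ d)). A sparse-Winograd layer learns
# and prunes G @ g directly, since pruning g does not stay sparse under G.
import numpy as np

BT = np.array([[1, 0, -1, 0], [0, 1, 1, 0], [0, -1, 1, 0], [0, 1, 0, -1]], float)
G  = np.array([[1, 0, 0], [.5, .5, .5], [.5, -.5, .5], [0, 0, 1]])
AT = np.array([[1, 1, 1, 0], [0, 1, -1, -1]], float)

g = np.array([1., 2., 3.])        # 3-tap filter
d = np.array([4., 5., 6., 7.])    # 4 inputs -> 2 outputs

winograd_kernel = G @ g           # the coefficients pruned "natively"
out = AT @ (winograd_kernel * (BT @ d))
print(out, np.convolve(d, g[::-1], mode="valid"))   # both: [32. 38.]
```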
A new sparse high performance conjugate gradient benchmark (HPCG) has recently been released to address challenges in the design of sparse linear solvers for next-generation extreme-scale computing systems. The key computation, data access, and communication patterns of HPCG represent building blocks commonly found in today's HPC applications. While it is a well-known challenge to efficiently parallelize the Gauss-Seidel smoother, the most time-consuming kernel in HPCG, our algorithmic and architecture-aware optimizations...
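For context, a forward Gauss-Seidel sweep over a CSR matrix; the read of already-updated entries of x is the loop-carried dependence that makes the smoother hard to parallelize (a sketch only, assuming SciPy):

```python
# x[i] is updated using x[j] for j < i from *this* sweep, so iterations
# cannot simply be distributed across threads.
import numpy as np
from scipy.sparse import csr_matrix

def gauss_seidel_sweep(A: csr_matrix, b, x):
    for i in range(A.shape[0]):
        start, end = A.indptr[i], A.indptr[i + 1]
        cols, vals = A.indices[start:end], A.data[start:end]
        diag = vals[cols == i][0]
        x[i] += (b[i] - vals @ x[cols]) / diag
    return x

A = csr_matrix(np.array([[4., 1, 0], [1, 4, 1], [0, 1, 4]]))
print(gauss_seidel_sweep(A, np.ones(3), np.zeros(3)))
```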
This paper presents a compiler and runtime framework for parallelizing sparse matrix computations that have loop-carried dependences. Our approach automatically generates a runtime inspector to collect data dependence information and achieves wavefront parallelization of the computation, where iterations within a wavefront execute in parallel and synchronization is required across wavefronts. A key contribution of this paper involves dependence simplification, which reduces the time and space overhead of the inspector. This is implemented within a polyhedral framework,...
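A simplified inspector for one concrete case, a sparse lower-triangular solve: it assigns each row a wavefront level so that rows in the same level can run in parallel, with a barrier between levels. The paper's polyhedral machinery generalizes and optimizes this; the helper below is purely illustrative:

```python
# Level of row i = 1 + max level of the earlier rows it reads; rows that
# share a level have no dependences among themselves.
import numpy as np
from scipy.sparse import csr_matrix

def inspect_wavefronts(L: csr_matrix):
    level = np.zeros(L.shape[0], dtype=int)
    for i in range(L.shape[0]):
        cols = L.indices[L.indptr[i]:L.indptr[i + 1]]
        deps = cols[cols < i]                       # rows i depends on
        level[i] = level[deps].max() + 1 if deps.size else 0
    return [np.flatnonzero(level == l) for l in range(level.max() + 1)]

L = csr_matrix(np.array([[1., 0, 0, 0], [1, 1, 0, 0],
                         [0, 0, 1, 0], [0, 1, 1, 1]]))
print(inspect_wavefronts(L))   # [array([0, 2]), array([1]), array([3])]
```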
We present an efficient programmable architecture for compute-intensive embedded applications. The processor uses instruction registers to reduce the cost of delivering instructions, and a hierarchical, distributed data register organization to deliver data. Instruction registers capture instruction reuse and locality in inexpensive storage structures that are located near the functional units. The data register organization captures reuse and locality at different levels of the hierarchy. Exposed communication resources eliminate pipeline registers and control logic, and allow the compiler to schedule...
A plate-type catalytic membrane reactor (PCMR) was prepared for the water-gas shift (WGS) reaction. A disk-shaped nickel metal catalyst was placed on a disk-type membrane without a cage or mesh to hold it in the reactor. The WGS reaction in the PCMR was experimentally investigated using a simulated feed from coal gasification as a function of pressure (up to 1.1 MPa) and GHSV (up to 20,000 h−1). The stronger adsorption of CO on Pd seems to be responsible for the greater reduction of the hydrogen permeating flux; CO is a more powerful inhibitor than steam. When S/C =...
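For reference, the overall equilibrium the reactor drives is the standard water-gas shift reaction (the enthalpy given is the textbook value):

```latex
\mathrm{CO} + \mathrm{H_2O} \rightleftharpoons \mathrm{CO_2} + \mathrm{H_2},
\qquad \Delta H^{\circ}_{298} \approx -41\ \mathrm{kJ\,mol^{-1}}
```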