- Cloud Computing and Resource Management
- Distributed and Parallel Computing Systems
- Advanced Data Storage Technologies
- Computational Fluid Dynamics and Aerodynamics
- Parallel Computing and Optimization Techniques
- Advanced Numerical Methods in Computational Mathematics
- Complex Network Analysis Techniques
- Network Security and Intrusion Detection
- Scientific Computing and Data Management
- Fluid Dynamics and Turbulent Flows
- Graph Theory and Algorithms
- Caching and Content Delivery
- Numerical Methods for Differential Equations
- Model Reduction and Neural Networks
- IoT and Edge/Fog Computing
- Internet Traffic Analysis and Secure E-voting
- Peer-to-Peer Network Technologies
- Composite Structure Analysis and Optimization
- Computational Physics and Python Applications
- Anomaly Detection Techniques and Applications
- Network Traffic and Congestion Control
- Data Management and Algorithms
- Algorithms and Data Compression
- Fluid Dynamics and Vibration Analysis
- Advanced Database Systems and Queries
MIT Lincoln Laboratory
2014-2024
Massachusetts Institute of Technology
2016-2024
Moscow Institute of Thermal Technology
2021-2023
University of Oklahoma
2005
Ames Research Center
1993-1999
Search for Extraterrestrial Intelligence
1994
Virginia Tech
1992-1993
Interactive massively parallel computations are critical for machine learning and data analysis. These computations are a staple of the MIT Lincoln Laboratory Supercomputing Center (LLSC) and have required the LLSC to develop unique interactive supercomputing capabilities. Scaling frameworks, such as TensorFlow, and analysis environments, such as MATLAB/Octave, to tens of thousands of cores presents many technical challenges - in particular, rapidly dispatching many tasks through a scheduler, such as Slurm, and starting many instances of applications with...
A crucial element of large web companies is their ability to collect and analyze massive amounts of data. Tuple store databases are a key enabling technology employed by many of these companies (e.g., Google Big Table and Amazon Dynamo). Tuple stores are highly scalable and run on commodity clusters, but lack interfaces to support efficient development of mathematically based analytics. D4M (Dynamic Distributed Dimensional Data Model) has been developed to provide a rich interface to tuple stores (and structured query language "SQL" databases)....
The ability to collect and analyze large amounts of data is a growing problem within the scientific community. The growing gap between data and its users calls for innovative tools that address the challenges faced by big data volume, velocity, and variety. Numerous tools exist that allow users to store, query, and index these massive quantities of data. Each storage or database engine comes with the promise of dealing with complex data. Scientists and engineers who wish to use these systems often quickly find that there is no single technology that offers a panacea to the complexity of information. When using...
The Apache Accumulo database is an open source relaxed consistency database that is widely used for government applications. Accumulo is designed to deliver high performance on unstructured data such as graphs of network data. This paper tests the Accumulo database using data from the Graph500 benchmark. The Dynamic Distributed Dimensional Data Model (D4M) software is used to implement the benchmark on a 216-node cluster running the MIT SuperCloud software stack. A peak rate of over 100,000,000 database inserts per second was achieved, which is 100x larger than the highest previously published value...
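The shape of such an insert-rate measurement can be sketched with a toy in-memory store; this is an illustrative stand-in, assuming a plain Python dict in place of the Accumulo client and uniform random edges in place of the Graph500 Kronecker generator, so it shows only the benchmark's structure, not its scale:

```python
import random
import time

def generate_edges(scale, edge_factor=16):
    """Generate a Graph500-style edge list of (row, col) pairs.
    The real benchmark uses a Kronecker/R-MAT generator; uniform
    random pairs are used here purely for illustration."""
    n = 2 ** scale
    return [(random.randrange(n), random.randrange(n))
            for _ in range(edge_factor * n)]

def benchmark_inserts(store, edges):
    """Insert all edges into a key-value store, accumulating edge
    multiplicities, and report the achieved inserts per second."""
    start = time.perf_counter()
    for r, c in edges:
        store[(r, c)] = store.get((r, c), 0) + 1
    elapsed = time.perf_counter() - start
    return len(edges) / elapsed

store = {}
rate = benchmark_inserts(store, generate_edges(scale=10))
print(f"{rate:,.0f} inserts/second")
```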
A computational procedure is presented to study fluid-structural interaction problems for three-dimensional aerospace structures. The flow is modeled using the unsteady Euler/Navier-Stokes equations and solved using a finite-difference approach. The three-dimensional structure is modeled using a shell/plate finite-element formulation. The two disciplines are coupled using a domain decomposition approach. Accurate procedures, both in time and space, are developed to combine solutions from the flow equations with those of the structural equations. Time accuracy is maintained...
Non-traditional, relaxed consistency, triple store databases are the backbone of many web companies (e.g., Google Big Table, Amazon Dynamo, and Facebook Cassandra). The Apache Accumulo database is a high performance open source relaxed consistency database that is widely used for government applications. Obtaining the full benefits of Accumulo requires using novel schemas. The Dynamic Distributed Dimensional Data Model (D4M) [http://www.mit.edu/~kepner/D4M] provides a uniform mathematical framework based on associative arrays...
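A minimal sketch of the associative-array idea, assuming only that entries are (row key, column key, value) triples and that arrays combine element-wise; the real D4M framework defines a much richer algebra (semiring multiplication, sub-array selection) than is shown here:

```python
class AssocArray:
    """Toy associative array: maps (row, col) string keys to values.
    Only element-wise addition and row lookup are sketched."""
    def __init__(self, triples=()):
        self.data = {}
        for row, col, val in triples:
            self.data[(row, col)] = self.data.get((row, col), 0) + val

    def __add__(self, other):
        # Element-wise sum: keys present in either array appear in
        # the result, with values added where keys overlap.
        out = AssocArray()
        out.data = dict(self.data)
        for k, v in other.data.items():
            out.data[k] = out.data.get(k, 0) + v
        return out

    def row(self, r):
        """Return all (col, val) pairs stored under row key r."""
        return {c: v for (rr, c), v in self.data.items() if rr == r}

a = AssocArray([("alice", "bob", 1), ("alice", "carol", 2)])
b = AssocArray([("alice", "bob", 3)])
print((a + b).row("alice"))  # {'bob': 4, 'carol': 2}
```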
Production high-performance computing (HPC) systems are adopting and integrating GPUs into their design to accommodate artificial intelligence (AI), machine learning, and data visualization workloads. To aid with the operations of new and existing GPU-based large-scale systems, we provide a detailed characterization of system operations, job characteristics, user behavior, and trends on a contemporary GPU-accelerated production HPC system. Our insights indicate that the premature phases in the modern AI workflow...
Big Data (as embodied by Hadoop clusters) and Compute (as embodied by MPI) provide unique capabilities for storing and processing large volumes of data. Hadoop clusters make distributed computing readily accessible to the Java community, while MPI delivers high parallel efficiency for compute intensive workloads. Bringing the big data and compute communities together is an active area of research. The LLGrid team has developed and deployed a number of technologies that aim to provide the best of both worlds. LLMapReduce allows the map/reduce programming model to be used quickly and efficiently in...
The supercomputing and enterprise computing arenas come from very different lineages. However, the advent of commodity computing servers has brought the two closer than they have ever been. Within enterprise computing, commodity servers have resulted in the development of a wide range of new cloud capabilities: elastic computing, virtualization, and data hosting. Similarly, the supercomputing community has developed new capabilities in heterogeneous, massively parallel hardware and software. Merging the benefits of supercomputing and clouds has been a challenging goal. Significant effort has been expended trying to deploy supercomputing capabilities on cloud systems....
The Spectre and Meltdown flaws in modern microprocessors represent a new class of attacks that have been difficult to mitigate. The mitigations that have been proposed have known performance impacts. The reported magnitude of these impacts varies depending on the industry sector and the expected workload characteristics. In this paper, we measure the impact of several mitigations on workloads relevant to HPC systems. We show that the impact can be significant on both synthetic and realistic workloads. We also show that the performance penalties are difficult to avoid even in dedicated systems where security is a lesser concern.
Artificial intelligence (AI) and machine learning (ML) workloads are an increasingly larger share of the compute workloads in traditional High-Performance Computing (HPC) centers and commercial cloud systems. This has led to changes in deployment approaches for HPC clusters and the cloud, as well as a new focus on optimized resource usage, allocations for AI frameworks, and capabilities such as Jupyter notebooks that enable rapid prototyping and deployment. With these changes, there is a need to better understand cluster/datacenter operations with...
The MIT SuperCloud database management system allows for rapid creation and flexible execution of a variety of the latest scientific databases, including Apache Accumulo and SciDB. It is designed to permit these databases to run on a High Performance Computing Cluster (HPCC) platform as seamlessly as any other HPCC job. It ensures the seamless migration of the databases to the resources assigned by the scheduler and to centralized storage of the database files when not running. It also permits snapshotting of databases to allow researchers to experiment and push the limits of the technology without...
A postprocessor for displacement-based finite element solutions of laminated plates under transverse loads is developed to obtain the resulting interlaminar stresses. The postprocessor can be used with a solution that has been obtained using either classical lamination plate theory or first-order shear deformation theory. The equilibrium equations of elasticity are integrated directly. These equations include the influence of the products of in-plane stresses and out-of-plane rotations and thus apply to geometrically nonlinear problems. To accurately...
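The integration step can be written out explicitly for the linear case. This is the standard small-rotation form only, as an illustration of the recovery idea; it omits the in-plane-stress/rotation product terms that the abstract says the postprocessor retains for geometrically nonlinear problems:

```latex
% Transverse shear stresses recovered by integrating the 3-D
% equilibrium equations through the laminate thickness (linear case,
% plate of thickness h, integration variable \zeta):
\sigma_{xz}(x,y,z) = -\int_{-h/2}^{z}
  \left( \frac{\partial \sigma_{xx}}{\partial x}
       + \frac{\partial \sigma_{xy}}{\partial y} \right) d\zeta ,
\qquad
\sigma_{yz}(x,y,z) = -\int_{-h/2}^{z}
  \left( \frac{\partial \sigma_{xy}}{\partial x}
       + \frac{\partial \sigma_{yy}}{\partial y} \right) d\zeta .
```

The in-plane stresses on the right-hand side come from the displacement-based finite element solution, so the interlaminar stresses are obtained layer by layer without solving a three-dimensional problem.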
High Performance Computing (HPC) is intrinsically linked to effective Data Center Infrastructure Management (DCIM). Cloud services and HPC have become key components in Department of Defense and corporate Information Technology competitive strategies in the global and commercial spaces. As a result, reliance on consistent, reliable data center space is more critical than ever. The costs and complexity of providing quality DCIM are constantly being tested and evaluated by the United States Government and companies such as Google,...
The map-reduce parallel programming model has become extremely popular in the big data community. Many big data workloads can benefit from the enhanced performance offered by supercomputers. LLMapReduce provides the familiar map-reduce parallel programming model to big data users running on a supercomputer. It dramatically simplifies map-reduce programming by providing a simple parallel programming capability in one line of code. LLMapReduce supports all programming languages and many schedulers. It can work with any application without the need to modify the application. Furthermore, it can overcome scaling limits via options that allow the user to switch to more...
Job schedulers are a key component of scalable computing infrastructures. They orchestrate all of the work executed on the infrastructure and directly impact the effectiveness of the system. Recently, job workloads have diversified from long-running, synchronously-parallel simulations to include short-duration, independently parallel high performance data analysis (HPDA) jobs. Each of these job types requires different features and scheduler tuning to run efficiently. A number of schedulers have been developed to address both workload and system...
The SuiteSparse GraphBLAS C-library implements high performance hypersparse matrices with bindings to a variety of languages (Python, Julia, and Matlab/Octave). It provides a lightweight in-memory database implementation of hypersparse matrices that is ideal for analyzing many types of network data, while providing rigorous mathematical guarantees, such as linearity. Streaming updates of hypersparse matrices put enormous pressure on the memory hierarchy. This work benchmarks an implementation of hierarchical hypersparse matrices that reduces memory pressure and dramatically increases the update rate into...
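The hierarchical idea can be sketched without GraphBLAS at all: stream updates into a small buffer that stays cache-resident, and merge into the large table only when the buffer fills. This is a toy stand-in using Python dicts, assuming nothing about the SuiteSparse data structures beyond the buffering principle the abstract describes:

```python
class HierarchicalCounter:
    """Toy hierarchical update structure for streaming edge counts.
    Updates land in a small 'hot' buffer; the large table is touched
    only on flush, reducing pressure on the memory hierarchy."""
    def __init__(self, buffer_limit=4):
        self.big = {}           # large, infrequently touched table
        self.hot = {}           # small buffer of recent updates
        self.buffer_limit = buffer_limit

    def update(self, key, val=1):
        self.hot[key] = self.hot.get(key, 0) + val
        if len(self.hot) >= self.buffer_limit:
            self.flush()

    def flush(self):
        # Merge the buffered updates into the big table in one pass.
        for k, v in self.hot.items():
            self.big[k] = self.big.get(k, 0) + v
        self.hot.clear()

    def get(self, key):
        # A query must consult both levels, since recent updates
        # may not have been flushed yet.
        return self.big.get(key, 0) + self.hot.get(key, 0)

h = HierarchicalCounter()
for edge in [(0, 1), (0, 1), (2, 3), (4, 5), (6, 7), (0, 1)]:
    h.update(edge)
print(h.get((0, 1)))  # 3
```

Because addition is associative, flushing order never changes the final counts; that linearity is what makes the buffering safe.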
Detecting anomalous behavior in network traffic is a major challenge due to the volume and velocity of network traffic. For example, a 10 Gigabit Ethernet connection can generate over 50 MB/s of packet headers. For global network providers, this volume can be amplified by many orders of magnitude. Development of novel computer network analytics requires: high level programming environments, massive amounts of packet capture (PCAP) data, and diverse data products for "at scale" algorithm pipeline development. D4M (Dynamic Distributed Dimensional Data...
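At its simplest, anomaly detection on aggregated traffic reduces to flagging outliers in a stream of counts. The sketch below is a deliberately minimal z-score test on hypothetical per-second packet counts; real traffic analytics of the kind the abstract describes operate on far richer features than raw counts:

```python
import statistics

def flag_anomalies(counts, threshold=2.5):
    """Flag time windows whose packet count deviates from the mean
    by more than `threshold` population standard deviations."""
    mean = statistics.mean(counts)
    stdev = statistics.pstdev(counts)
    return [i for i, c in enumerate(counts)
            if stdev > 0 and abs(c - mean) / stdev > threshold]

# Hypothetical per-second packet counts with one obvious spike.
counts = [100, 98, 103, 101, 99, 900, 102, 97, 100, 101]
print(flag_anomalies(counts))  # [5]
```

At 10 GbE rates this comparison must run over millions of windows per second, which is why the abstract emphasizes high level environments backed by scalable storage rather than hand-rolled loops like this one.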
The rise of big data systems has created a need for benchmarks to measure and compare the capabilities of these systems. Big data benchmarks present unique scalability challenges. The supercomputing community has wrestled with these challenges for decades and has developed methodologies for creating rigorous scalable benchmarks (e.g., HPC Challenge). The proposed PageRank pipeline benchmark employs supercomputing benchmarking methodologies to create a scalable benchmark that is reflective of many real-world big data processing systems. The benchmark builds on existing prior benchmarks (Graph500, Sort, PageRank) to create a holistic benchmark with multiple integrated kernels that can...
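The final kernel of such a pipeline, the PageRank computation itself, can be sketched as a power iteration over an edge list. The earlier kernels (generation, sort, matrix construction) are omitted, and this is an illustrative sketch rather than the benchmark's reference implementation:

```python
def pagerank(edges, n, d=0.85, iters=50):
    """PageRank by power iteration over a directed edge list.
    Dangling nodes (no outgoing edges) spread their rank uniformly."""
    out_deg = [0] * n
    for src, _ in edges:
        out_deg[src] += 1
    rank = [1.0 / n] * n
    for _ in range(iters):
        new = [(1.0 - d) / n] * n          # teleportation term
        dangling = sum(r for v, r in enumerate(rank) if out_deg[v] == 0)
        for src, dst in edges:
            new[dst] += d * rank[src] / out_deg[src]
        rank = [x + d * dangling / n for x in new]
    return rank

# Tiny graph: 0 -> 1 -> 2 -> 0 is a cycle, so ranks equalize.
r = pagerank([(0, 1), (1, 2), (2, 0)], n=3)
print([round(x, 3) for x in r])  # [0.333, 0.333, 0.333]
```

In the scalable version, the per-edge loop becomes a sparse matrix-vector multiply, which is exactly the operation whose data movement the benchmark is designed to stress.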
Virtual machines and virtualized hardware have been around for over half a century. The commoditization of the x86 platform and its rapidly growing capabilities have led to recent exponential growth in the use of virtualization in both enterprise and high performance computing (HPC). Startup time of the environment is a key metric when the runtime of any individual task is typically much shorter than the lifetime of a virtualized service in an enterprise context. In this paper, a methodology for accurately measuring startup time on an HPC system is described. The overhead of the three most mature,...
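The measurement idea can be sketched as timing from launch until a readiness probe succeeds. Everything below is a hypothetical stand-in: a plain subprocess plays the role of the virtualized environment, and a file-existence check plays the role of the readiness probe, since the paper's actual hypervisors and instrumentation are not described in this excerpt:

```python
import os
import subprocess
import tempfile
import time

def measure_startup(cmd, probe, timeout=30.0, poll=0.01):
    """Return seconds from process launch until probe() is True.
    `probe` is any zero-argument callable, e.g. a TCP connect."""
    start = time.perf_counter()
    proc = subprocess.Popen(cmd)
    try:
        while time.perf_counter() - start < timeout:
            if probe():
                return time.perf_counter() - start
            time.sleep(poll)
        raise TimeoutError(f"{cmd} not ready within {timeout}s")
    finally:
        proc.terminate()

# Toy 'service' that becomes ready by creating a flag file after 0.2 s.
flag = os.path.join(tempfile.mkdtemp(), "ready")
t = measure_startup(
    ["python3", "-c",
     f"import time; time.sleep(0.2); open({flag!r}, 'w').close(); time.sleep(5)"],
    probe=lambda: os.path.exists(flag),
)
print(f"startup: {t:.2f}s")
```

Polling from outside, rather than trusting the service's own logs, is what makes the measurement comparable across different virtualization stacks.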
Knights Landing (KNL) is the code name for the second-generation Intel Xeon Phi product family. KNL has generated significant interest in the data analysis and machine learning communities because its new many-core architecture targets both of these workloads. The KNL many-core vector processor design enables it to exploit much higher levels of parallelism. At the Lincoln Laboratory Supercomputing Center (LLSC), the majority of users are running data analysis applications such as MATLAB and Octave. More recently, machine learning applications, such as the UC Berkeley Caffe...
A procedure to compute aeroelasticity by directly coupling the Euler equations for fluids with plate finite element equations for structures is presented. The coupled equations are solved using a time integration method. Accuracy is maintained by using moving grids that conform to the aeroelastically deformed shape computed at every time step. The aerodynamic forces are transferred using a simple lumped load approach and also a more accurate virtual surface...