Bradley W. Settlemyer

ORCID: 0000-0002-9299-2654
Research Areas
  • Advanced Data Storage Technologies
  • Distributed and Parallel Computing Systems
  • Cloud Computing and Resource Management
  • Caching and Content Delivery
  • Distributed systems and fault tolerance
  • Parallel Computing and Optimization Techniques
  • Peer-to-Peer Network Technologies
  • Software-Defined Networks and 5G
  • Interconnection Networks and Systems
  • Scientific Computing and Data Management
  • Network Traffic and Congestion Control
  • Advanced Optical Network Technologies
  • Simulation Techniques and Applications
  • Advanced Database Systems and Queries
  • Software System Performance and Reliability
  • Real-time simulation and control systems
  • Cloud Data Security Solutions
  • Experimental Learning in Engineering
  • Data Management and Algorithms
  • Network Security and Intrusion Detection
  • Mobile Agent-Based Network Management
  • Mobile Ad Hoc Networks
  • Particle Detector Development and Performance
  • Embedded Systems Design Techniques
  • Particle Accelerators and Free-Electron Lasers

Nvidia (United Kingdom)
2024

Nvidia (United States)
2022-2024

Los Alamos National Laboratory
2015-2021

Carnegie Mellon University
2021

Centre for High Performance Computing
2017

Oak Ridge National Laboratory
2010-2015

Clemson University
2008-2009

Understanding workload characteristics is critical for optimizing and improving the performance of current systems software and for architecting new storage systems based on observed patterns. In this paper, we characterize the scientific workloads of the world's fastest HPC (High Performance Computing) storage cluster, Spider, at the Oak Ridge Leadership Computing Facility (OLCF). Spider provides an aggregate bandwidth of over 240 GB/s with 10 petabytes of RAID 6 formatted capacity. OLCF's flagship petascale simulation platform,...

10.1109/pdsw.2010.5668066 article EN 2010-11-01

High performance computing fault tolerance depends on scalable parallel file system performance. For more than a decade, scalable bandwidth has been available from the object storage systems that underlie modern parallel file systems, and recently we have seen demonstrations of scalable metadata performance using dynamic partitioning of the namespace over multiple servers. But even these systems require significant numbers of dedicated servers, and some workloads still experience bottlenecks. We envision exascale file systems that do not have any dedicated server machines. Instead, a job...

10.1145/2834976.2834984 article EN 2015-11-11
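The namespace partitioning idea mentioned above can be sketched as hash placement of directory entries. This is a simplified, static variant written only for illustration (it does not perform the incremental splitting that dynamic schemes use), and `server_for_entry` is a hypothetical helper, not part of DeltaFS or any other file system.

```python
import hashlib


def server_for_entry(parent_dir: str, name: str, num_servers: int) -> int:
    """Map a directory entry to one of num_servers metadata servers.

    Hypothetical illustration: hashing (parent directory, file name) spreads
    the entries of one large directory across all servers instead of
    pinning the whole directory to a single server.
    """
    key = f"{parent_dir}/{name}".encode("utf-8")
    digest = hashlib.sha1(key).digest()
    # Interpret the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_servers


if __name__ == "__main__":
    counts = [0] * 4
    for i in range(10_000):
        counts[server_for_entry("/checkpoints", f"rank{i}.dat", 4)] += 1
    print(counts)  # roughly even split across the 4 servers
```

Because placement depends only on the entry's path, any client can compute it locally without asking a central server.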

In this paper we look at the performance characteristics of three tools used to move large data sets over dedicated long distance networking infrastructure. Although studies of wide area networks have been a frequent topic of interest, such analyses have tended to focus on network latency and peak throughput using traffic generators. In this study we instead perform an end-to-end analysis that includes reading data from the source file system and committing it to the remote destination file system. An evaluation of data movement is also ... configurations...

10.1109/msst.2011.5937236 article EN 2011-05-01

Analysis of large-scale simulation output is a core element of scientific inquiry, but analysis queries may experience significant I/O overhead when the data is not structured for efficient retrieval. While in-situ processing allows for improved time-to-insight in many applications, scaling in-situ frameworks to hundreds of thousands of cores can be difficult in practice. DeltaFS in-situ indexing is a new approach for processing massive amounts of data to achieve efficient point and small-range queries. This paper describes the challenges and lessons learned in this...

10.1109/sc.2018.00006 article EN 2018-11-01
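Point and small-range queries become cheap once records are ordered by key. The sketch below is a hypothetical, in-memory stand-in for such an index (the class and method names are assumptions, not the DeltaFS API): a point query is a binary search, and a range query returns one contiguous slice instead of scanning all output.

```python
import bisect


class SortedKeyIndex:
    """Toy in-memory index over (key, record) pairs kept sorted by key.

    Hypothetical illustration only; keys must be mutually comparable.
    """

    def __init__(self, records):
        # Sort once, e.g. after a write phase.
        ordered = sorted(records, key=lambda kv: kv[0])
        self.keys = [k for k, _ in ordered]
        self.values = [v for _, v in ordered]

    def point_query(self, key):
        """Return the record for key, or None if it is absent."""
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        return None

    def range_query(self, lo, hi):
        """Return (key, record) pairs with lo <= key < hi."""
        i = bisect.bisect_left(self.keys, lo)
        j = bisect.bisect_left(self.keys, hi)
        return list(zip(self.keys[i:j], self.values[i:j]))
```

For example, `SortedKeyIndex([(42, "p42"), (7, "p7"), (19, "p19"), (8, "p8")]).range_query(7, 20)` returns the records for keys 7, 8, and 19.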

Ceph is an emerging open-source parallel distributed file and storage system. By design, Ceph leverages unreliable commodity network hardware and provides reliability and fault tolerance via controlled object placement and data replication. This paper presents our block I/O performance and scalability evaluation of Ceph for scientific high-performance computing (HPC) environments. Our work makes two unique contributions. First, our evaluation is performed under a realistic setup for a large-scale capability HPC environment using a commercial...

10.1145/2538542.2538562 article EN 2013-11-15

File transfers over dedicated connections, supported by large parallel file systems, have become increasingly important in high-performance computing and big data workflows. It remains a challenge to achieve peak rates for such transfers due to the complexities of the file I/O, host, and network transport subsystems and, equally importantly, their interactions. We present extensive measurements of disk-to-disk file transfers using Lustre and XFS file systems mounted on multi-core servers over a suite of 10 Gbps emulated connections with 0-366 ms round trip...

10.1109/hpcc-smartcity-dss.2016.0038 article EN 2016-12-01
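A back-of-envelope calculation shows why round-trip times up to 366 ms are demanding at 10 Gbps: to keep the link full, roughly one bandwidth-delay product of data must be in flight, so TCP windows and socket buffers have to be at least that large. The helper below is a generic illustration, not a tool from the paper.

```python
def bandwidth_delay_product_bytes(rate_gbps: float, rtt_ms: float) -> float:
    """Bytes that must be in flight to keep a rate_gbps link busy at rtt_ms."""
    bits_in_flight = rate_gbps * 1e9 * (rtt_ms / 1e3)
    return bits_in_flight / 8


if __name__ == "__main__":
    for rtt in (1, 10, 100, 366):
        mb = bandwidth_delay_product_bytes(10, rtt) / 1e6
        print(f"RTT {rtt:>4} ms -> {mb:8.2f} MB in flight")
```

At 10 Gbps the product is about 1.25 MB for a 1 ms round trip but about 457.5 MB at 366 ms, far beyond typical default socket buffer sizes.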

In recent years, non-volatile memory devices such as SSD drives have emerged as a viable storage solution due to their increasing capacity and decreasing cost. Due to the unique capability requirements in a large-scale HPC (High Performance Computing) environment, a hybrid configuration (SSD and HDD) may represent one of the most available and balanced solutions considering cost and performance. Under this setting, effective data placement as well as data movement with controlled overhead becomes a pressing challenge. In this paper, we...

10.1109/msst.2014.6855552 article EN 2014-06-01

The trend in parallel computing toward clusters running thousands of cooperating processes per application has led to an I/O bottleneck that has only gotten more severe as CPU density has increased. Current file systems provide large amounts of aggregate bandwidth; however, they do not achieve the high degrees of metadata scalability required to manage files distributed across hundreds or thousands of storage nodes. In this paper we examine the use of collective communication between the storage servers to improve metadata operations. In particular,...

10.5555/1413370.1413377 article EN IEEE International Conference on High Performance Computing, Data, and Analytics 2008-11-15

This paper presents our design for an asynchronous object storage system intended for use in scientific and commercial big data workloads. Use cases from the target workload domains are used to motivate the key abstractions used in the application programming interface (API). The architecture of the Scalable Object Store (SOS), a prototype that supports the API's facilities, is presented. SOS serves as a vehicle for future research into scalable and resilient storage. We briefly review ... providing efficient servers capable of ... quality...

10.1145/2538542.2538565 article EN 2013-11-15

The transfer of big data is increasingly supported by dedicated channels in high-performance networks. Transport protocols play a critical role in maximizing the link utilization of such high-speed connections. We propose a Transport Profile Generator (TPG) to characterize and enhance the end-to-end throughput performance of transport protocols. TPG automates the tuning of various transport-related parameters, including socket options and protocol-specific configurations, and supports multiple streams and NIC-to-NIC ... To instantiate...

10.1109/iccnc.2015.7069458 article EN 2016 International Conference on Computing, Networking and Communications (ICNC) 2015-02-01

Modern high performance computing platforms employ burst buffers to overcome the I/O bottleneck that limits the scale and efficiency of large-scale parallel computations. Currently there are two competing burst buffer architectures. One is to treat burst buffers as a dedicated shared resource; the other is to integrate burst buffer hardware into each compute node. In this paper we examine the design tradeoffs associated with local and shared burst buffer architectures through modeling. By seeding our simulation with realistic workloads, we are able to systematically...

10.5555/3108096.3108100 article EN High Performance Computing Symposium 2017-04-23
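A first-order model makes the tradeoff concrete, assuming hypothetical numbers and ignoring contention, capacity limits, and draining to the parallel file system: node-local burst buffer bandwidth grows with the number of nodes a job uses, whereas a shared pool offers a fixed aggregate bandwidth that all writers divide. The function names and parameters are illustrative assumptions, not the simulator used in the paper.

```python
def local_absorb_time(ckpt_gb_per_node: float, per_node_bw_gbs: float) -> float:
    """Seconds to absorb a checkpoint when every node writes to its own device.

    All nodes write in parallel, so the time does not depend on job size.
    """
    return ckpt_gb_per_node / per_node_bw_gbs


def shared_absorb_time(ckpt_gb_per_node: float, nodes: int, shared_bw_gbs: float) -> float:
    """Seconds to absorb a checkpoint into a shared pool with fixed aggregate bandwidth."""
    return ckpt_gb_per_node * nodes / shared_bw_gbs


def crossover_nodes(per_node_bw_gbs: float, shared_bw_gbs: float) -> float:
    """Job size (in nodes) above which node-local buffers absorb faster."""
    return shared_bw_gbs / per_node_bw_gbs


if __name__ == "__main__":
    # Hypothetical parameters: 100 GB per node, 2 GB/s local, 1000 GB/s shared.
    for n in (64, 512, 4096):
        print(n, local_absorb_time(100, 2), shared_absorb_time(100, n, 1000))
    print("crossover:", crossover_nodes(2, 1000))
```

With these assumed numbers the local design absorbs a checkpoint in 50 s at any scale, while the shared pool takes 6.4 s for 64 nodes and 409.6 s for 4,096 nodes, with the crossover at 500 nodes. A shared pool can still win on utilization, since its capacity is not stranded on nodes whose jobs do not need it.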

Key–value (KV) software has proven useful to a wide variety of applications, including analytics, time-series databases, and distributed file systems. To satisfy the requirements of diverse workloads, KV stores have been carefully tailored to best match the performance characteristics of the underlying solid-state block devices. Emerging KV storage devices are a promising technology for both simplifying the KV software stack and improving the performance of persistent storage-based applications. However, while providing fast, predictable put and get...

10.1145/3582013 article EN ACM Transactions on Storage 2023-01-21

Popular software key-value stores such as LevelDB and RocksDB are often tailored for efficient writing. Yet, they tend to also perform well on read operations. This is because while data is initially stored in a format that favors writes, it is later transformed by the DB's background workers into a format that better accommodates reads. Write-optimized key-value stores can still block writes. This happens when those background workers cannot keep up with the foreground insertion workload. This paper advocates a hardware-accelerated key-value store, enabling...

10.1109/cluster52292.2023.00019 article EN 2023-10-31
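The write-blocking behavior can be illustrated with a toy backlog model. Everything here is an assumption chosen for illustration, not how LevelDB or RocksDB are implemented: data arrives at a fixed rate, background workers convert a fixed amount per step into the read-friendly format, and writers are blocked whenever the unconverted backlog reaches a threshold.

```python
def simulate_stalls(insert_rate: float, compact_rate: float,
                    stall_threshold: float, steps: int) -> int:
    """Count time steps in which foreground writes are blocked.

    Toy model with hypothetical parameters: each step, writers add
    insert_rate units of write-optimized data unless the backlog has
    reached stall_threshold; background workers then convert up to
    compact_rate units into the read-optimized format.
    """
    backlog = 0.0
    stalled = 0
    for _ in range(steps):
        if backlog >= stall_threshold:
            stalled += 1  # writers blocked; only background work progresses
        else:
            backlog += insert_rate
        backlog = max(0.0, backlog - compact_rate)
    return stalled
```

When background work keeps pace (10 units in, 12 out per step) no step stalls; when it falls behind (10 in, 5 out, threshold 50) the backlog saturates after 10 steps and writers are then blocked on every other step, 45 of 100 steps in total.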

With the end of Dennard scaling, specializing and distributing compute engines throughout the system is a promising technique to improve application performance. For example, NVIDIA's BlueField Data Processing Unit (DPU) integrates programmable processing elements within the network and offers specialized network capabilities. These capabilities enable communication via offloads onto DPUs and present new opportunities for offloading nonblocking or complex communication patterns such as collective operations. This...

10.23919/isc.2024.10528935 article EN 2024-05-01

Long-haul data transfers require the optimization and balancing of the performance of host and storage systems as well as network transport. An assessment of such transport methods requires the systematic generation of throughput profiles from measurements collected over different system parameters and connection lengths. We describe ... to support wide-area I/O at 10 Gbps, and present memory and disk transfer throughputs over suites of physical and emulated connections of several thousands of miles. The ... are limited by ... infrastructure and incur...

10.1109/iccnc.2012.6167544 article EN 2016 International Conference on Computing, Networking and Communications (ICNC) 2012-01-01

In recent years, NAND flash-based solid state drives (SSDs) have been widely used in datacenters due to their better performance compared with traditional hard disk drives. However, little is known about the reliability characteristics of SSDs in production systems. Existing works study the statistical distributions of SSD failures in the field. However, they do not go deep into SSDs and investigate the unique error types and health dynamics that distinguish SSDs from HDDs. In this paper, we explore SSD-specific SMART (Self-Monitoring,...

10.1109/bigdata.2018.8622643 article EN 2021 IEEE International Conference on Big Data (Big Data) 2018-12-01

We introduce La-pdes, a parameterized benchmark application for measuring parallel and serial discrete event simulation (PDES) performance. Applying a holistic view of PDES system performance, La-pdes tests the performance factors of (i) the (P)DES engine in terms of queue efficiency, synchronization mechanism, and load-balancing schemes; (ii) the available hardware in terms of handling computationally intensive loads, memory size, cache hierarchy, and clock speed; (iii) the interaction with communication middleware (often MPI)...

10.5555/2888619.2888945 article EN Winter Simulation Conference 2015-12-06

The trend in parallel computing toward clusters running thousands of cooperating processes per application has led to an I/O bottleneck that has only gotten more severe as CPU density has increased. Current file systems provide large amounts of aggregate bandwidth; however, they do not achieve the high degrees of metadata scalability required to manage files distributed across hundreds or thousands of storage nodes. In this paper we examine the use of collective communication between the storage servers to improve metadata operations. In particular,...

10.1109/sc.2008.5214724 article EN 2008-11-01

High-performance computing (HPC) storage systems rely on access coordination to ensure that concurrent updates do not produce incoherent results. HPC storage systems typically employ pessimistic distributed locking to provide this functionality in cases where applications cannot perform their own coordination. This approach, however, introduces significant performance overhead and complicates fault handling. In this work we evaluate the viability of optimistic conditional operations as an alternative to distributed locking in HPC storage systems. We...

10.1109/sc.companion.2012.19 article EN 2012-11-01
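Conditional operations of this kind are commonly built on compare-and-swap: a client reads a value together with its version, computes an update, and asks the storage server to apply it only if the version has not changed, retrying otherwise. The sketch below is a generic single-process illustration with hypothetical names, not the paper's implementation; the in-process mutex stands in for the atomic check performed on one server and is not a distributed lock.

```python
import threading


class VersionedStore:
    """Toy key-value store with conditional (compare-and-swap) writes."""

    def __init__(self):
        self._data = {}  # key -> (version, value)
        # Stands in for the server-side atomic check; not a distributed lock.
        self._mutex = threading.Lock()

    def read(self, key):
        """Return (version, value); missing keys are version 0 with value None."""
        with self._mutex:
            return self._data.get(key, (0, None))

    def conditional_write(self, key, expected_version, value):
        """Apply the write only if the version is unchanged; return success."""
        with self._mutex:
            current_version, _ = self._data.get(key, (0, None))
            if current_version != expected_version:
                return False
            self._data[key] = (current_version + 1, value)
            return True


def optimistic_append(store, key, item):
    """Retry read-modify-conditional-write until it succeeds; return attempts."""
    attempts = 0
    while True:
        attempts += 1
        version, value = store.read(key)
        new_value = (value or []) + [item]  # build a new list, never mutate
        if store.conditional_write(key, version, new_value):
            return attempts
```

Four threads each calling `optimistic_append` 50 times on the same key end with all 200 items present, because a write based on a stale version is rejected and retried rather than silently overwriting another client's update.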

Wide-area memory transfers between on-going computations and remote steering, analysis, and visualization sites can be utilized in several High-Performance Computing (HPC) scenarios. Dedicated network connections with high capacity, low loss rates, and low competing traffic are typically provisioned over current HPC infrastructures to support such transfers. To gain insights into such transfers, we collected throughput measurements for different versions of TCP between dedicated multi-core servers over emulated 10 Gbps...

10.1109/hpcc-css-icess.2015.86 article EN 2015-08-01

Recent developments in software-defined infrastructures promise that scientific workflows utilizing supercomputers, instruments, and storage systems will be dynamically composed and orchestrated using software at unprecedented speed and scale in the near future. Testing of the underlying networking software, particularly during initial exploratory stages, remains a challenge due to potential disruptions and the resource allocation and coordination needed over multi-domain physical infrastructure. To overcome these...

10.1145/3217197.3217202 article EN 2018-06-07

Modern high performance computing platforms employ burst buffers to overcome the I/O bottleneck that limits the scale and efficiency of large-scale parallel computations. Currently there are two competing burst buffer architectures. One is to treat burst buffers as a dedicated shared resource; the other is to integrate burst buffer hardware into each compute node. In this paper we examine the design tradeoffs associated with local and shared burst buffer architectures through modeling. By seeding our simulation with realistic workloads, we are able to systematically...

10.22360/springsim.2017.hpc.009 article EN 2017-01-01

In this paper we introduce the Indexed Massive Directory, a new technique for indexing data within DeltaFS. With its design as a scalable, server-less file system for HPC platforms, DeltaFS scales metadata performance with application scale. The Indexed Massive Directory is a novel extension to the DeltaFS data plane, enabling in-situ indexing of massive amounts of data written to a single directory simultaneously, and in an arbitrarily large number of files. We achieve this through a memory-efficient mechanism for reordering data and a log-structured storage layout to pack...

10.1145/3149393.3149398 article EN 2017-11-03
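The combination of append-only writes and a per-key index can be sketched as follows. This is a hypothetical, in-memory illustration that omits the reordering step and all persistence, not the Indexed Massive Directory implementation: records land in one log in arrival order, so writes stay sequential, and a small index maps each key to the offsets of its records so reads avoid scanning the whole log.

```python
import io
from collections import defaultdict


class LogStructuredDir:
    """Toy log-structured store: append-only data log plus per-key index."""

    def __init__(self):
        self.log = io.BytesIO()
        self.index = defaultdict(list)  # key -> [(offset, length), ...]

    def append(self, key: str, payload: bytes) -> None:
        """Append payload at the end of the log and record where it went."""
        offset = self.log.seek(0, io.SEEK_END)
        self.log.write(payload)
        self.index[key].append((offset, len(payload)))

    def read(self, key: str) -> bytes:
        """Concatenate all records for key, in the order they were written."""
        chunks = []
        for offset, length in self.index.get(key, []):
            self.log.seek(offset)
            chunks.append(self.log.read(length))
        return b"".join(chunks)
```

Records for one key may be interleaved with other keys in the log; the index lets a reader gather them with a few seeks.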