NFDI4DS | UHH-SEMS - Publication Details

TrackFM: Far-out Compiler Support for a Far Memory World

OPENALEX - Publications

Brian R. Tauro, Brian Suchy, Simone Campanoni, Peter A. Dinda, Kyle C. Hale

Large memory workloads with favorable locality of reference can benefit by extending the memory hierarchy across machines. Systems that enable such far memory configurations can improve application performance and overall memory utilization in a cluster. There are two current alternatives for software-based far memory: kernel-based and library-based. Kernel-based approaches sacrifice performance to achieve programmer transparency, while library-based approaches sacrifice programmer transparency to achieve performance. We argue for a novel third approach, the compiler-based approach, which...

10.1145/3617232.3624856 article EN cc-by 2024-04-17

A Case for Transforming Parallel Runtimes Into Operating System Kernels

Kyle C. Hale, Peter A. Dinda

The needs of parallel runtime systems and the increasingly sophisticated languages and compilers they support do not line up with the services provided by general-purpose OSes. Furthermore, the semantics available to the runtime are lost at the system-call boundary in such OSes. Finally, because a runtime executes at user-level in such an environment, it cannot leverage hardware features that require kernel-mode privileges; a large portion of the functionality of the machine is lost to it. These limitations warp the design, implementation, functionality, and performance...

10.1145/2749246.2749264 article EN 2015-06-08

Isolating functions at the hardware limit with virtines

Nick Wanninger, Joshua J. Bowden, Kirtankumar Shetty, Ayush Garg, Kyle C. Hale

An important class of applications, including programs that leverage third-party libraries, programs that use user-defined functions in databases, and serverless applications, benefit from isolating the execution of untrusted code at the granularity of individual functions or function invocations. However, existing isolation mechanisms were not designed for this use case; rather, they have been adapted to it. We introduce virtines, a new abstraction designed specifically for function granularity isolation, and describe how we build virtines from the ground up by pushing hardware...

10.1145/3492321.3519553 preprint EN 2022-03-28

Enabling Hybrid Parallel Runtimes Through Kernel and Virtualization Support

Kyle C. Hale, Peter A. Dinda

In our hybrid runtime (HRT) model, a parallel runtime system and the application are together transformed into a specialized OS kernel that operates entirely in kernel mode and can thus implement exactly its desired abstractions on top of fully privileged hardware access. We describe the design and implementation of two new tools that support the HRT model. The first, the Nautilus Aerokernel, is a kernel framework specifically designed to enable HRTs for x64 and Xeon Phi hardware. Aerokernel primitives are specialized for HRT creation and thus can operate much faster, up to two orders...

10.1145/2892242.2892255 article EN 2016-03-25

Segment gating for static energy reduction in Networks-on-Chip

Kyle C. Hale, Boris Grot, Stephen W. Keckler

Chip multiprocessors (CMPs) have emerged as a primary vehicle for overcoming the limitations of uniprocessor scaling, with power constraints now representing a key factor of CMP design. Recent studies have shown that the on-chip interconnection network (NOC) can consume as much as 36% of overall chip power. To date, researchers have employed several techniques to reduce power consumption in the network, including the use of on/off links by means of power gating. However, many of these techniques target dynamic power, and those that consider static power focus...

10.1145/1645213.1645227 article EN 2009-12-12

Shifting GEARS to enable guest-context virtual services

Kyle C. Hale, Xia Lei, Peter A. Dinda

We argue that the implementation of VMM-based virtual services for a guest should extend into the guest itself, even without its cooperation. Placing service components directly into the guest OS or application can reduce implementation complexity and increase performance. In this paper we show that the set of tools in the VMM required to enable a broad range of such guest-context services is fairly small. Further, we outline and evaluate these tools and describe their design and implementation in the context of Guest Examination and Revision Services (GEARS), a new framework within the Palacios VMM. We then describe two...

10.1145/2371536.2371542 article EN 2012-09-18

An Evaluation of Asynchronous Software Events on Modern Hardware

Kyle C. Hale, Peter A. Dinda

Runtimes and applications that rely heavily on asynchronous event notifications suffer when such notifications must traverse several layers of processing in software. Many of these layers necessarily exist in order to support a general-purpose, portable kernel architecture, but they introduce considerable overheads for demanding, high-performance parallel runtimes and applications. Other overheads can arise from a mismatched event programming or system call interface. Whatever the case, the average latency and variance of commonly used software...

10.1109/mascots.2018.00041 article EN 2018-09-01

Task parallel assembly language for uncompromising parallelism

Mike Rainey, Ryan Newton, Kyle C. Hale, Nikos Hardavellas, Simone Campanoni, and 2 more

Achieving parallel performance and scalability involves making compromises between parallel and sequential computation. If not contained, the overheads of parallelism can easily outweigh its benefits, sometimes by orders of magnitude. Today, we expect programmers to implement this compromise by optimizing their code manually. This process is labor intensive, requires deep expertise, and reduces code quality. Recent work on heartbeat scheduling shows a promising approach that manifests the potentially vast amounts...

10.1145/3453483.3460969 article EN 2021-06-18

Multiverse: Easy Conversion of Runtime Systems into OS Kernels via Automatic Hybridization

Kyle C. Hale, Conor Hetland, Peter A. Dinda

The hybrid runtime (HRT) model offers a path towards high performance and efficiency. By integrating the OS kernel, runtime, and application, an HRT allows the runtime developer to leverage the full feature set of the hardware and specialize OS services to the runtime's needs. However, conforming to the HRT model currently requires a port of the runtime to the kernel level, for example to the Nautilus kernel framework, and this requires knowledge of kernel internals. In response, we developed Multiverse, a system that bridges the gap between a built-from-scratch HRT and a legacy runtime system. Multiverse allows unmodified applications...

10.1109/icac.2017.24 preprint EN 2017-07-01

Automatic Hybridization of Runtime Systems

Kyle C. Hale, Conor Hetland, Peter A. Dinda

The hybrid runtime (HRT) model offers a plausible path towards high performance and efficiency. By integrating the OS kernel, parallel runtime, and application, an HRT allows the runtime developer to leverage the full privileged feature set of the hardware and specialize OS services to the runtime's needs. However, conforming to the HRT model currently requires a complete port of the runtime and application to the kernel level, for example to our Nautilus kernel framework, and this requires knowledge of kernel internals. In response, we developed Multiverse, a system that bridges the gap between...

10.1145/2907294.2907309 article EN 2016-05-31

Memory Mapping and Parallelizing Random Forests for Speed and Cache Efficiency

Eduardo Romero-Gainza, Christopher Stewart, Angela Li, Kyle C. Hale, Nathaniel Morris

Memory mapping enhances decision tree implementations by enabling constant-time statistical inference, and is particularly effective when memory mapped tables fit in processor cache. However, memory mapping is more challenging when applied to random forests (ensembles of many trees), as the table sizes can easily outstrip cache capacity. We argue that careful system design for parallel and cache efficiency can make memory mapping effective for random forests. Our preliminary results show that memory-mapped forests can speed up inference latency by a factor of 30×.

10.1145/3458744.3474052 article EN 2021-08-09

ConCORD

Xia Lei, Kyle C. Hale, Peter A. Dinda

We argue that memory content-tracking across the nodes of a parallel machine should be factored into a distinct platform service on top of which application services can be built. ConCORD is a proof-of-concept system that we have developed and evaluated to test this claim. Our core insight is that many application services can be described as a query over memory content. This insight leads to a core concept in ConCORD, the content-aware service command architecture, in which an application service is implemented as a parametrization of a single general query that ConCORD knows how to execute well. ConCORD dynamically adapts the execution of the query to the amount...

10.1145/2600212.2600214 article EN 2014-06-20

Paths to OpenMP in the kernel

Jiacheng Ma, Wenyi Wang, Aaron Nelson, Michael Cuevas, Brian Homerding, and 5 more

OpenMP implementations make increasing demands on the kernel. We take the next step and consider bringing OpenMP into the kernel. Our vision is that the entire OpenMP application, run-time system, and a kernel framework is interwoven to become the kernel, allowing the OpenMP implementation to take full advantage of the hardware in a custom manner. We compare and contrast three approaches to achieving this goal. The first, runtime in kernel (RTK), ports the OpenMP runtime to the kernel, allowing any kernel code to use OpenMP pragmas. The second, process in kernel (PIK), adds a specialized process abstraction for running user-level OpenMP code within the kernel. The third, custom compilation for kernel (CCK),...

10.1145/3458817.3476183 article EN 2021-10-21

VMM emulation of Intel hardware transactional memory

Maciej Swiech, Kyle C. Hale, Peter A. Dinda

We describe the design, implementation, and evaluation of emulated hardware transactional memory, specifically the Intel Haswell Restricted Transactional Memory (RTM) architectural extensions for x86/64, within a virtual machine monitor (VMM). Our system allows users to investigate RTM on hardware that does not provide it, debug their RTM-based software, and stress test it on diverse configurations, including potential future configurations that might support arbitrary length transactions. Initial performance...

10.1145/2612262.2612265 article EN 2014-06-10

A Look at Communication-Intensive Performance in Julia

Amal Rizvi, Kyle C. Hale

The Julia programming language continues to gain popularity both for its potential for programmer productivity and for its impressive performance on scientific code. It thus holds potential for large-scale HPC, but we have not yet seen this fully realized. While Julia certainly has the machinery to run at scale, and while others have done so for embarrassingly parallel workloads, we have yet to see an analysis of Julia's performance on communication-intensive codes that are common in the HPC domain. In this paper we investigate Julia's performance in this light, first with a suite of microbenchmarks within...

10.48550/arxiv.2109.14072 preprint EN cc-by-nc-nd arXiv (Cornell University) 2021-01-01

Hearing loss impacts aviator performance and cognitive workload during simulated flight

Heath G. Jones, Paula Henry, Jennifer Noetzel, Kyle C. Hale, Kichol Lee, and 1 more

Hearing loss can render an aviator more susceptible to the adverse effects of degraded communication signals and consequently lead to increased allocation of mental resources to hear (referred to as listening effort). Army aviation hearing standards, which are primarily based on pure tone and speech recognition test scores in quiet environments, do not necessarily predict the functional impact of hearing loss. The Army has recently adopted a new Military Operational Hearing Test (MOHT) to assess [...]. The current study aimed to validate the MOHT,...

10.1121/10.0022692 article EN The Journal of the Acoustical Society of America 2023-10-01

Bolt

Fernando Ezquer, Christopher Stewart, Angela Li, Kyle C. Hale, Nathaniel Morris

Random forests use ensembles of decision trees to boost accuracy for machine learning tasks. However, large ensembles slow down inference on platforms that process each tree in an ensemble individually. We present Bolt, a platform that restructures whole random forests, not just individual trees, to speed up inference. Conceptually, Bolt maps every path to a lookup table which, if cache were large enough, would allow inference with one memory access. When the table size exceeds cache capacity, Bolt employs a novel combination of lossless compression,...

10.1145/3528535.3531519 article EN 2022-05-31

Towards a Practical Ecosystem of Specialized OS Kernels

Conghao Liu, Kyle C. Hale

Specialized operating systems have enjoyed a recent revival driven both by a pressing need to rethink the system software stack in several domains and by the convenience and flexibility that on-demand infrastructure and virtual execution environments offer. Several barriers exist which curtail the widespread adoption of such highly specialized systems, but perhaps the most consequential of them is that these systems are simply difficult to use. In this paper we discuss the challenges faced by specialized OSes, both for HPC and more broadly, and argue that what is needed...

10.1145/3322789.3328742 article EN 2019-06-17

Prospects for Functional Address Translation

Conor Hetland, Georgios Tziantzioulis, Brian Suchy, Kyle C. Hale, Nikos Hardavellas, and 1 more

Address translation fundamentally embodies a function that maps from virtual to physical addresses. In current systems, the function is encoded by the kernel in an in-memory radix tree structure (the page table hierarchy) which is then interpreted by hardware (the pagewalker, pagewalk-caches, and TLBs). We consider implementing the function itself as reconfigurable hardware. Does this make any sense? To study this question, we collected numerous in-situ Linux page tables for a wide range of workloads, including those from HPC, to serve as example...

10.1109/mascots.2019.00047 article EN 2019-09-25

Enabling Hybrid Parallel Runtimes Through Kernel and Virtualization Support

Kyle C. Hale, Peter A. Dinda

In our hybrid runtime (HRT) model, a parallel runtime system and the application are together transformed into a specialized OS kernel that operates entirely in kernel mode and can thus implement exactly its desired abstractions on top of fully privileged hardware access. We describe the design and implementation of two new tools that support the HRT model. The first, the Nautilus Aerokernel, is a kernel framework specifically designed to enable HRTs for x64 and Xeon Phi hardware. Aerokernel primitives are specialized for HRT creation and thus can operate much faster, up to two orders...

10.1145/3007611.2892255 article EN ACM SIGPLAN Notices 2016-03-25

Playing Fetch with CAT

Qitian Zeng, Kyle C. Hale, Boris Glavic

Software prefetching and hardware-based cache allocation techniques (CAT) have been successfully applied in main-memory database engines to fetch data into cache before it is needed and to partition a shared last-level cache (LLC) to prevent concurrent tasks from evicting each other's data. We investigate the interaction of these techniques and demonstrate that while a single strategy is sufficient, a combination of both is only effective if the partitioning adapts based on the types of tasks currently sharing an LLC. We present a simple, yet effective, scheme...

10.1145/3465998.3466016 article EN 2021-06-18

Enabling Extremely Fine-grained Parallelism via Scalable Concurrent Queues on Modern Many-core Architectures

Poornima Nookala, Peter A. Dinda, Kyle C. Hale, Kyle Chard, Ioan Raicu

Enabling efficient fine-grained task parallelism is a significant challenge for hardware platforms with increasingly many cores. Existing techniques do not scale to hundreds of threads due to the high cost of synchronization in concurrent data structures. To overcome these limitations we present XQueue, a novel lock-less concurrent queuing system with relaxed ordering semantics that is geared towards realizing scalability up to hundreds of threads. We demonstrate XQueue using microbenchmarks and show that XQueue can deliver operations with latencies...

10.1109/mascots53633.2021.9614292 article EN 2021-11-03

Modeling Speedup in Multi-OS Environments

Brian R. Tauro, Conghao Liu, Kyle C. Hale

For workloads that place strenuous demands on system software, novel operating system designs like unikernels, library OSes, and hybrid runtimes offer a promising path forward. However, while these systems can outperform general-purpose OSes, they have limited ability to support legacy applications. Multi-OS environments, where the application's execution is split between a compute plane and a data plane operating system, address this challenge, but reasoning about the performance of applications that run in such an environment is currently...

10.1109/mascots.2019.00044 article EN 2019-09-25