- Parallel Computing and Optimization Techniques
- Distributed Systems and Fault Tolerance
- Cloud Computing and Resource Management
- Distributed and Parallel Computing Systems
- Advanced Data Storage Technologies
- Interconnection Networks and Systems
- Embedded Systems Design Techniques
- Caching and Content Delivery
- Optimization and Search Problems
- Intelligent Tutoring Systems and Adaptive Learning
- Advanced Neural Network Applications
- Network Packet Processing and Optimization
- Adversarial Robustness in Machine Learning
- Stochastic Gradient Optimization Techniques
- Computability, Logic, AI Algorithms
- Privacy-Preserving Technologies in Data
- Real-Time Systems Scheduling
- Graph Theory and Algorithms
- Explainable Artificial Intelligence (XAI)
University of Toronto
2011-2024
Massachusetts Institute of Technology
2020
Massachusetts Institute of Technology
2016-2018
We present Swarm, a novel architecture that exploits ordered irregular parallelism, which is abundant but hard to mine with current software and hardware techniques. In this architecture, programs consist of short tasks with programmer-specified timestamps. Swarm executes tasks speculatively and out of order, and efficiently speculates thousands of tasks ahead of the earliest active task to uncover parallelism. Swarm builds on prior TLS and HTM schemes, and contributes several new techniques that allow it to scale to large core counts and speculation...
The authors present Swarm, a parallel architecture that exploits ordered parallelism, which is abundant but hard to mine with current software and hardware techniques. Swarm programs consist of short tasks, as small as tens of instructions each, with programmer-specified order constraints. Swarm executes tasks speculatively and out of order, and efficiently speculates thousands of tasks ahead of the earliest active task to uncover enough parallelism. Several techniques allow Swarm to scale to large core counts and speculation windows. The authors evaluate Swarm on graph...
Multicores are now ubiquitous, but programmers still write sequential code. Speculative parallelization is an enticing approach to parallelize code while retaining the ease of sequential programming, making parallelism pervasive. However, prior speculative parallelizing compilers and architectures achieved limited speedups due to the high costs of recovering from misspeculation and hardware scalability bottlenecks. We present T4, a compiler that successfully leverages recent architectural features for speculative execution, which enable new...
Multicore systems must exploit locality to scale, scheduling tasks to minimize data movement. While locality-aware parallelism is well studied in non-speculative systems, it has received little attention in speculative systems (e.g., HTM or TLS), which hinders their scalability. We present spatial hints, a technique that leverages program knowledge to reveal and exploit locality in speculative parallel programs. A hint is an abstract integer, given when a task is created, that denotes the data the task is likely to access. We show it is easy to modify programs to convey locality through hints...
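As an illustration of the spatial-hints idea (a sketch, not the paper's hardware policy), a hint can steer each new task to the per-core queue whose core likely caches the hinted data:

```python
# Each task carries an abstract integer hint denoting the data it will
# likely access. Tasks with the same hint are routed to the same
# per-core queue, so accesses to that data stay in one core's cache.
# The modulo mapping below is a stand-in for the real routing policy.
NUM_CORES = 4

def route(hint, num_cores=NUM_CORES):
    return hint % num_cores  # simple hash of the hint to a core

queues = [[] for _ in range(NUM_CORES)]

def create_task(fn_name, hint):
    queues[route(hint)].append((fn_name, hint))
```

Two tasks created with the same hint land in the same queue, while tasks with different hints can spread across cores.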
Most systems that support speculative parallelization, like hardware transactional memory (HTM), do not support nested parallelism. This sacrifices substantial parallelism and precludes composing parallel algorithms. And the few HTMs that do support nested parallelism focus on parallelizing at the coarsest (shallowest) levels, incurring large overheads that squander most of their potential.
Multicore systems should support both speculative and non-speculative parallelism. Speculative parallelism is easy to use and is crucial to scale many challenging applications, while non-speculative parallelism is more efficient and allows parallel irrevocable actions (e.g., I/O). Unfortunately, prior techniques are far from this goal. Hardware transactional memory (HTM) supports speculative (transactional) and non-speculative (non-transactional) work, but lacks coordination mechanisms between the two and is limited to unordered parallelism. Prior work has extended HTMs to avoid the limitations of...
A Bloom filter is a probabilistic bit-array-based set representation that has recently been applied to address-set disambiguation in systems that ease the burden of parallel programming. However, many of these systems intersect bit-arrays to approximate set intersection and decide disjointness. This contrasts with the conventional, well-studied approach of making individual membership queries into the filter. In this paper we present much-needed models for this unconventional application of testing disjointness using Bloom filters...
Online services in modern datacenters use Remote Procedure Calls (RPCs) to communicate between different software layers. Although RPCs execute just a few small functions, inefficient RPC handling can cause delays that propagate across the system and degrade end-to-end performance. Prior work has reduced RPC processing time to less than 1 $\mu$s, which now shifts the bottleneck to the scheduling of RPCs. Existing schedulers suffer from high overheads, an inability to effectively utilize high-core-count CPUs, or do not...
This work studies the interplay between multithreaded cores and speculative parallelism (e.g., transactional memory or thread-level speculation). These techniques are often used together, yet they have been developed independently. This disconnect causes major performance pathologies: increasing the number of threads per core adds conflicts and wasted work, and puts more pressure on execution resources. These pathologies squander the benefits of multithreading. We present speculation-aware multithreading (SAM), a simple policy...
As reconfigurable computing hardware, and in particular FPGA-based systems-on-chip, comprises an increasing number of processor and accelerator cores, supporting sharing and synchronization in a way that is scalable and easy to program becomes a challenge. Transactional Memory (TM) is a potential solution to this problem, and an FPGA-based system provides the opportunity to support TM in hardware (HTM). Although there are many proposed approaches to HTM for ASICs, these do not necessarily map well to FPGAs. In this work we demonstrate that while signature-based...
Most systems that support speculative parallelization, like hardware transactional memory (HTM), do not support nested parallelism. This sacrifices substantial parallelism and precludes composing parallel algorithms. And the few HTMs that do support nested parallelism focus on parallelizing at the coarsest (shallowest) levels, incurring large overheads that squander most of their potential. We present FRACTAL, a new execution model that supports unordered and timestamp-ordered nested parallelism. FRACTAL lets programmers seamlessly compose speculative parallel algorithms, and lets the architecture...
Many algorithms schedule their work, or tasks, according to a priority order for correctness or faster convergence. While schedulers commonly implement task enqueue and dequeueMin operations, some algorithms need an update operation that alters the scheduling metadata of a task. Prior software and hardware systems that support schedules with updates compromise on parallelism, work-efficiency, or both, leading to missed performance opportunities. Moreover, incorrectly navigating these compromises violates correctness in those algorithms that are not resilient...
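The enqueue/dequeueMin/update interface described above can be sketched in software with a standard lazy-deletion heap idiom (this is an illustrative sequential sketch, not any of the systems the paper studies):

```python
import heapq

class UpdatableScheduler:
    """Priority task queue with enqueue, dequeueMin, and update.

    update() changes a task's scheduling metadata (its priority) by
    pushing a fresh heap entry; the stale entry is skipped lazily when
    popped. This trades heap space for cheap updates.
    """
    def __init__(self):
        self._heap = []     # (priority, task) entries, possibly stale
        self._live = {}     # task -> current priority

    def enqueue(self, task, prio):
        self._live[task] = prio
        heapq.heappush(self._heap, (prio, task))

    def update(self, task, prio):
        self._live[task] = prio
        heapq.heappush(self._heap, (prio, task))  # old entry goes stale

    def dequeue_min(self):
        while self._heap:
            prio, task = heapq.heappop(self._heap)
            if self._live.get(task) == prio:  # skip stale entries
                del self._live[task]
                return task
        return None
```

For example, a Dijkstra-style relaxation would call `update` to lower a node's tentative distance, and `dequeue_min` would then return that node ahead of tasks it previously trailed.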
The paper proposes and optimizes a partial recovery training system, CPR, for recommendation models. CPR relaxes the consistency requirement by enabling non-failed nodes to proceed without loading checkpoints when a node fails during training, reducing failure-related overheads. This is the first work, to the extent of our knowledge, to perform a data-driven, in-depth analysis of applying partial recovery to recommendation models, and it identifies a trade-off between accuracy and performance. Motivated by this analysis, we present a system that can reduce recovery time and maintain the desired...
The economics of Moore's Law are stumbling, so vendors of many-core architectures are transitioning from single-die monolithic designs to multi-chiplet disintegrated systems within a package. Disintegration lowers cost for the same number of cores but bottlenecks the interconnect. Ideally, disintegration should increase performance per dollar: the cost savings should outweigh the slowdown. Although industry has reported cost savings, the performance penalty is not well studied.
Rust aims to combine safety and performance, and claims to provide fearless concurrency. We present a case study to evaluate the extent to which Rust makes parallel programming easier, by porting programs from the C++-based PBBS benchmark suite to Rust. We find that Rust with Rayon provides fearlessness for regular parallelism but not for irregular parallelism. We introduce Rusty-PBBS: a Rust-based port of PBBS covering both...
No abstract available.