Mark C. Jeffrey

ORCID: 0000-0003-4816-0356
Research Areas
  • Parallel Computing and Optimization Techniques
  • Distributed Systems and Fault Tolerance
  • Cloud Computing and Resource Management
  • Distributed and Parallel Computing Systems
  • Advanced Data Storage Technologies
  • Interconnection Networks and Systems
  • Embedded Systems Design Techniques
  • Caching and Content Delivery
  • Optimization and Search Problems
  • Intelligent Tutoring Systems and Adaptive Learning
  • Advanced Neural Network Applications
  • Network Packet Processing and Optimization
  • Adversarial Robustness in Machine Learning
  • Stochastic Gradient Optimization Techniques
  • Computability, Logic, AI Algorithms
  • Privacy-Preserving Technologies in Data
  • Real-Time Systems Scheduling
  • Graph Theory and Algorithms
  • Explainable Artificial Intelligence (XAI)

University of Toronto
2011-2024

Massachusetts Institute of Technology
2020

Massachusetts Institute of Technology
2016-2018

We present Swarm, a novel architecture that exploits ordered irregular parallelism, which is abundant but hard to mine with current software and hardware techniques. In this architecture, programs consist of short tasks with programmer-specified timestamps. Swarm executes tasks speculatively and out of order, and efficiently speculates thousands of tasks ahead of the earliest active task to uncover parallelism. Swarm builds on prior TLS and HTM schemes, and contributes several new techniques that allow it to scale to large core counts and speculation...

10.1145/2830772.2830777 article EN 2015-12-05
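
As context for the programming model, here is a minimal sketch of the timestamped-task style described above, using Dijkstra-style shortest paths. The `swarm::enqueue`/`swarm::run` interface is a sequential stand-in invented for this sketch, not the paper's actual API; real hardware would run these tasks speculatively in parallel while committing them in timestamp order.

```cpp
#include <cstdint>
#include <cstdio>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// Sequential reference model of a timestamp-ordered task queue. It only
// captures the ordering semantics: tasks run in timestamp order.
namespace swarm {
using Task = std::pair<uint64_t, std::function<void(uint64_t)>>;
struct Later {
    bool operator()(const Task& a, const Task& b) const { return a.first > b.first; }
};
std::priority_queue<Task, std::vector<Task>, Later> tasks;

void enqueue(uint64_t ts, std::function<void(uint64_t)> fn) {
    tasks.push({ts, std::move(fn)});
}
void run() {
    while (!tasks.empty()) {
        auto [ts, fn] = tasks.top();
        tasks.pop();
        fn(ts);
    }
}
}  // namespace swarm

struct Edge { uint32_t dst, weight; };
std::vector<std::vector<Edge>> adj = {{{1, 4}, {2, 1}}, {{3, 1}}, {{1, 2}, {3, 5}}, {}};
std::vector<uint64_t> dist(4, std::numeric_limits<uint64_t>::max());

// One short task per vertex visit; the timestamp is the tentative distance,
// so timestamp order is exactly Dijkstra's priority order.
void visit(uint64_t d, uint32_t v) {
    if (d >= dist[v]) return;  // stale task: an earlier task already won
    dist[v] = d;
    for (const Edge& e : adj[v])
        swarm::enqueue(d + e.weight, [dst = e.dst](uint64_t ts) { visit(ts, dst); });
}

int main() {
    swarm::enqueue(0, [](uint64_t ts) { visit(ts, 0); });
    swarm::run();
    for (size_t v = 0; v < dist.size(); ++v)
        std::printf("dist[%zu] = %llu\n", v, (unsigned long long)dist[v]);
}
```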

The authors present Swarm, a parallel architecture that exploits ordered parallelism, which is abundant but hard to mine with current software and hardware techniques. Swarm programs consist of short tasks, as small as tens of instructions each, with programmer-specified order constraints. Swarm executes tasks speculatively and out of order, and efficiently speculates thousands of tasks ahead of the earliest active task to uncover enough parallelism. Several techniques allow Swarm to scale to large core counts and speculation windows. The authors evaluate Swarm on graph...

10.1109/mm.2016.12 article EN IEEE Micro 2016-03-18

Multicores are now ubiquitous, but programmers still write sequential code. Speculative parallelization is an enticing approach to parallelize code while retaining the ease of sequential programming, making parallelism pervasive. However, prior speculative parallelizing compilers and architectures achieved limited speedups due to the high costs of recovering from misspeculation and hardware scalability bottlenecks. We present T4, a compiler that successfully leverages recent hardware features for speculative execution, which enable new...

10.1109/isca45697.2020.00024 article EN 2020-05-01
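
To make the idea of compiling sequential code into tiny ordered tasks concrete, here is a toy illustration of the general transformation shape, not T4's actual output: each iteration of a sequential loop becomes a task ordered by its iteration index. The `spawn` helper and the sequential commit loop are invented stand-ins for what the hardware would do speculatively in parallel.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <functional>
#include <map>
#include <vector>

// Toy ordered task queue: timestamp -> task. Real hardware would execute
// these tasks speculatively in parallel and commit them in timestamp order,
// aborting and re-running any task that observed stale data.
std::multimap<uint64_t, std::function<void()>> taskq;

void spawn(uint64_t ts, std::function<void()> fn) { taskq.insert({ts, std::move(fn)}); }

int main() {
    std::vector<uint32_t> a = {3, 1, 4, 1, 5};
    uint64_t sum = 0;
    // Sequential source: for (i) sum += a[i];
    // Compiled form: one tiny ordered task per iteration.
    for (size_t i = 0; i < a.size(); ++i)
        spawn(i, [&, i] { sum += a[i]; });
    // Commit tasks in timestamp order (the guarantee ordered speculation gives).
    for (auto& [ts, fn] : taskq) fn();
    std::printf("sum = %llu\n", (unsigned long long)sum);
}
```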

Multicore systems must exploit locality to scale, scheduling tasks to minimize data movement. While locality-aware parallelism is well studied in non-speculative systems, it has received little attention in speculative systems (e.g., HTM or TLS), which hinders their scalability. We present spatial hints, a technique that leverages program knowledge to reveal and exploit locality in speculative parallel programs. A hint is an abstract integer, given when a task is created, that denotes the data the task is likely to access. We show it is easy to modify programs to convey locality through hints...

10.1109/micro.2016.7783708 article EN 2016-10-01
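
A minimal model of how such hints might steer scheduling, assuming a hypothetical `enqueue_hinted` API: tasks carrying the same hint land in the same queue, so likely-conflicting tasks serialize and the data they touch stays in one place.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Illustrative hint-based task mapping: the hint names the data a task will
// likely access, and tasks with the same hint go to the same tile's queue.
struct Scheduler {
    std::vector<std::vector<std::function<void()>>> queues;  // one per tile
    explicit Scheduler(size_t tiles) : queues(tiles) {}

    void enqueue_hinted(uint64_t hint, std::function<void()> task) {
        queues[hint % queues.size()].push_back(std::move(task));  // hint -> tile
    }
};

std::vector<uint64_t> dist;  // per-vertex data

void relax(Scheduler& s, uint32_t v, uint64_t d) {
    // The hint is the vertex id: the data this task is likely to access.
    s.enqueue_hinted(v, [v, d] {
        if (d < dist[v]) dist[v] = d;
    });
}

int main() {
    dist.assign(8, 100);
    Scheduler s(4);
    relax(s, 3, 7);
    relax(s, 3, 9);  // same hint -> same queue, so these never run concurrently
    for (auto& q : s.queues)
        for (auto& t : q) t();
}
```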

Most systems that support speculative parallelization, like hardware transactional memory (HTM), do not support nested parallelism. This sacrifices substantial parallelism and precludes composing parallel algorithms. And the few HTMs that support nested parallelism focus on parallelizing at the coarsest (shallowest) levels, incurring large overheads that squander most of their potential.

10.1145/3079856.3080218 article EN 2017-06-24

Multicore systems should support both speculative and non-speculative parallelism. Speculative parallelism is easy to use and is crucial to scale many challenging applications, while non-speculative parallelism is more efficient and allows parallel irrevocable actions (e.g., I/O). Unfortunately, prior techniques are far from this goal. Hardware transactional memory (HTM) supports speculative (transactional) and non-speculative (non-transactional) work, but lacks coordination mechanisms between the two and is limited to unordered parallelism. Prior work has extended HTMs to avoid the limitations of...

10.1109/micro.2018.00026 article EN 2018-10-01
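
A toy model of the speculative/non-speculative split, with invented names: a speculative task defers irrevocable actions such as I/O until it is known to commit, because speculative work may be aborted and re-run.

```cpp
#include <cstdio>
#include <functional>
#include <vector>

// Speculative tasks may abort and re-execute, so irrevocable actions are
// buffered and released only at commit. This is a sketch of the idea, not
// the paper's mechanism.
struct Task {
    bool speculative;
    std::function<void()> body;
    std::vector<std::function<void()>> deferred;  // irrevocable actions
};

void run(Task& t) {
    t.body();
    // In real hardware a speculative task could abort here; only a
    // committing task gets to perform its irrevocable actions.
    for (auto& action : t.deferred) action();
}

int main() {
    Task t{/*speculative=*/true, {}, {}};
    t.body = [&t] {
        int result = 6 * 7;              // revocable compute: safe to redo
        t.deferred.push_back([result] {  // irrevocable: print only on commit
            std::printf("result = %d\n", result);
        });
    };
    run(t);
}
```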

A Bloom filter is a probabilistic bit-array-based set representation that has recently been applied to address-set disambiguation in systems that ease the burden of parallel programming. However, many of these systems intersect the bit-arrays to approximate set intersection and decide disjointness. This is in contrast with the conventional and well-studied approach of making individual membership queries into the filter. In this paper we present much-needed models for this unconventional application of testing disjointness using Bloom filters...

10.1145/1989493.1989551 article EN 2011-06-04
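
A first-order version of this kind of model, under the simplifying assumption that filter bits are independent:

```latex
% Two Bloom filters A and B: m bits, k hash functions, n_A and n_B inserted
% keys. A given bit of A is set with probability p_A (similarly p_B):
\[
  p_A = 1 - \left(1 - \tfrac{1}{m}\right)^{k n_A}, \qquad
  p_B = 1 - \left(1 - \tfrac{1}{m}\right)^{k n_B}.
\]
% Bit-array intersection declares "not disjoint" iff some bit position is
% set in both filters. Treating bits as independent, two truly disjoint
% sets are falsely declared to intersect with probability
\[
  P_{\text{false intersect}} \approx 1 - \left(1 - p_A\, p_B\right)^{m}.
\]
```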

Online services in modern datacenters use Remote Procedure Calls (RPCs) to communicate between different software layers. Despite RPCs using just a few small functions, inefficient RPC handling can cause delays that propagate across the system and degrade end-to-end performance. Prior work has reduced RPC processing time to less than 1 μs, which now shifts the bottleneck to the scheduling of RPCs. Existing schedulers suffer from either high overheads, an inability to effectively utilize high-core-count CPUs, or do not...

10.1109/micro56248.2022.00039 article EN 2022-10-01
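
For intuition, a toy single-shared-queue dispatch model (all names invented): centralized first-come-first-served dispatch balances load well, but at sub-microsecond RPC service times the synchronization on the shared queue itself becomes the scaling bottleneck this line of work targets.

```cpp
#include <cstdio>
#include <deque>
#include <functional>

// One shared queue; each free worker takes the oldest RPC. A real system
// would need synchronization on this queue, and that contention is exactly
// the overhead at issue when RPCs finish in under a microsecond.
using Rpc = std::function<void()>;

int main() {
    std::deque<Rpc> shared_queue;
    for (int i = 0; i < 8; ++i)
        shared_queue.push_back([i] { std::printf("rpc %d handled\n", i); });

    const int workers = 4;
    // Round-robin stand-in for "any idle worker pops the next RPC".
    for (int w = 0; !shared_queue.empty(); w = (w + 1) % workers) {
        Rpc rpc = std::move(shared_queue.front());
        shared_queue.pop_front();
        std::printf("worker %d: ", w);
        rpc();
    }
}
```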

This work studies the interplay between multithreaded cores and speculative parallelism (e.g., transactional memory or thread-level speculation). These techniques are often used together, yet they have been developed independently. This disconnect causes major performance pathologies: increasing the number of threads per core adds conflicts and wasted work, and puts more pressure on speculative execution resources. These pathologies squander the benefits of multithreading. We present speculation-aware multithreading (SAM), a simple policy...

10.1109/pact.2017.37 article EN 2017-09-01
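
A sketch of what a speculation-aware issue policy could look like, assuming tasks carry timestamps where smaller means less speculative; this is an illustrative model, not SAM's exact heuristic.

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Among the hardware threads on an SMT core, prioritize the one running
// the earliest-timestamp (least speculative) task, since its work is the
// least likely to be aborted and wasted.
struct HwThread {
    uint64_t task_timestamp;  // timestamp of the task this thread runs
    bool ready;               // has an instruction ready to issue
};

int pick_thread_to_issue(const std::vector<HwThread>& threads) {
    int best = -1;
    for (int t = 0; t < (int)threads.size(); ++t) {
        if (!threads[t].ready) continue;
        if (best < 0 || threads[t].task_timestamp < threads[best].task_timestamp)
            best = t;  // earliest-timestamp ready thread wins the issue slot
    }
    return best;  // -1 if no thread is ready this cycle
}

int main() {
    std::vector<HwThread> ts = {{42, true}, {17, true}, {5, false}};
    std::printf("issue thread %d\n", pick_thread_to_issue(ts));  // thread 1
}
```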

As reconfigurable computing hardware, and in particular FPGA-based systems-on-chip, comprise an increasing number of processor and accelerator cores, supporting sharing and synchronization in a way that is scalable and easy to program becomes a challenge. Transactional Memory (TM) is a potential solution to this problem, and an FPGA-based system provides the opportunity to support TM in hardware (HTM). Although there are many proposed approaches to HTM for ASICs, these do not necessarily map well to FPGAs. In this work we demonstrate that while signature-based...

10.1145/2000832.2000833 article EN ACM Transactions on Reconfigurable Technology and Systems 2011-08-01
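
A small example of signature-based conflict detection in the style such HTMs use (the hash functions and sizes here are invented): each transaction's address set is summarized in a fixed-size bit vector, and intersecting signatures gives a conservative conflict test with possible false positives but no false negatives.

```cpp
#include <bitset>
#include <cstdint>
#include <cstdio>

constexpr size_t kSigBits = 256;
using Signature = std::bitset<kSigBits>;

// Toy i-th hash of an address into the signature's bit space.
size_t hash_addr(uint64_t addr, int i) {
    return (addr * (2 * i + 0x9e3779b9ull) >> 4) % kSigBits;
}

void sig_insert(Signature& sig, uint64_t addr) {
    for (int i = 0; i < 2; ++i) sig.set(hash_addr(addr, i));  // k = 2 hashes
}

// Transactions conflict if one's write signature intersects the other's
// read or write signature: a cheap bitwise test, superset-accurate.
bool conflicts(const Signature& write_a, const Signature& read_b,
               const Signature& write_b) {
    return (write_a & (read_b | write_b)).any();
}

int main() {
    Signature wa, rb, wb;
    sig_insert(wa, 0x1000);  // txn A writes 0x1000
    sig_insert(rb, 0x1000);  // txn B reads 0x1000 -> true conflict
    std::printf("conflict: %d\n", conflicts(wa, rb, wb));
}
```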

Most systems that support speculative parallelization, like hardware transactional memory (HTM), do not support nested parallelism. This sacrifices substantial parallelism and precludes composing parallel algorithms. And the few HTMs that support nested parallelism focus on parallelizing at the coarsest (shallowest) levels, incurring large overheads that squander most of their potential. We present FRACTAL, a new execution model that supports unordered and timestamp-ordered nested parallelism. FRACTAL lets programmers seamlessly compose speculative parallel algorithms, and the architecture...

10.1145/3140659.3080218 article EN ACM SIGARCH Computer Architecture News 2017-06-24
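
One way to picture nested ordered domains, as a sketch rather than the paper's mechanism: a task's position in the global order is the path of timestamps from the root domain down, compared lexicographically, so the tasks of a subdomain order themselves within their parent's slot.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <vector>

// A task "zooms in" by opening a subdomain and appending a timestamp
// component; lexicographic comparison then respects the nesting.
using OrderPath = std::vector<uint64_t>;  // e.g. {outer_ts, inner_ts, ...}

int main() {
    std::vector<OrderPath> tasks = {
        {2},     // task in the root domain at time 2
        {1, 9},  // inside the subdomain of the root-time-1 task, local time 9
        {1, 3},  // same subdomain, local time 3
        {1},     // the root-time-1 task itself
    };
    // Global order: {1} < {1,3} < {1,9} < {2}
    std::sort(tasks.begin(), tasks.end());
    for (const auto& p : tasks) {
        for (uint64_t t : p) std::printf("%llu ", (unsigned long long)t);
        std::printf("\n");
    }
}
```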

Many algorithms schedule their work, or tasks, according to a priority order for correctness or faster convergence. While schedulers commonly implement task enqueue and dequeueMin operations, some algorithms need an update operation that alters the scheduling metadata of a task. Prior software and hardware systems that support scheduling with updates compromise on either parallelism or work-efficiency, or both, leading to missed performance opportunities. Moreover, incorrectly navigating these compromises violates correctness in those algorithms that are not resilient...

10.1145/3470496.3527387 article EN 2022-05-31
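
The trade-off is visible even in plain software: lacking a native update, Dijkstra's algorithm below "updates" a priority by re-inserting a duplicate entry and lazily skipping stale ones, paying extra queue traffic and wasted pops in exchange for a simpler queue.

```cpp
#include <cstdint>
#include <cstdio>
#include <functional>
#include <queue>
#include <vector>

int main() {
    const uint64_t INF = ~0ull;
    // adj[v] = list of (neighbor, edge weight)
    std::vector<std::vector<std::pair<int, uint64_t>>> adj = {
        {{1, 10}, {2, 3}}, {{3, 1}}, {{1, 2}}, {}};
    std::vector<uint64_t> dist(4, INF);
    using Entry = std::pair<uint64_t, int>;  // (priority, vertex)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<>> pq;

    dist[0] = 0;
    pq.push({0, 0});
    while (!pq.empty()) {
        auto [d, v] = pq.top();
        pq.pop();
        if (d != dist[v]) continue;  // stale duplicate: skipped, wasted work
        for (auto [u, w] : adj[v]) {
            if (d + w < dist[u]) {
                dist[u] = d + w;
                pq.push({d + w, u});  // "update" by re-insertion
            }
        }
    }
    for (int v = 0; v < 4; ++v)
        std::printf("dist[%d] = %llu\n", v, (unsigned long long)dist[v]);
}
```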

The paper proposes and optimizes a partial recovery training system, CPR, for recommendation models. CPR relaxes the consistency requirement by enabling non-failed nodes to proceed without loading checkpoints when a node fails during training, reducing failure-related overheads. It is, to the extent of our knowledge, the first work to perform a data-driven, in-depth analysis of applying partial recovery to recommendation models, and it identifies a trade-off between accuracy and performance. Motivated by the analysis, we present a system that can reduce recovery time and maintain the desired...

10.48550/arxiv.2011.02999 preprint EN other-oa arXiv (Cornell University) 2020-01-01

The economics of Moore's Law are stumbling, so vendors of many-core architectures are transitioning from single-die monolithic designs to multi-chiplet disintegrated systems within a package. Disintegration lowers cost for the same number of cores but bottlenecks the interconnect. Ideally, disintegration should increase performance per dollar: the cost savings should outweigh the slowdown. Although industry has reported cost savings, the performance penalty is not well studied.

10.1145/3610396.3618090 article EN 2023-10-12
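
The break-even condition is one line of algebra. With fractional cost saving c and fractional slowdown s (symbols introduced here for illustration, not taken from the paper):

```latex
% A monolithic part delivers performance P at cost C. Suppose the chiplet
% version costs (1 - c)C and delivers (1 - s)P. Performance per dollar
% improves iff
\[
  \frac{(1 - s)P}{(1 - c)C} > \frac{P}{C}
  \iff 1 - s > 1 - c
  \iff s < c,
\]
% i.e., disintegration pays off exactly when the slowdown is smaller than
% the cost saving: a 20% cost saving tolerates up to a 20% slowdown before
% performance per dollar degrades.
```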

Rust aims to combine safety and performance and claims to provide fearless concurrency. We present a case study to evaluate the extent to which Rust makes parallel programming easy by porting programs from the C++-based PBBS benchmark suite to Rust. We find that Rust with Rayon provides fearlessness for regular parallelism but not for irregular parallelism. We introduce Rusty-PBBS, a Rust-based port of PBBS covering both...

10.1145/3558481.3591313 article EN 2023-05-31