- Parallel Computing and Optimization Techniques
- Interconnection Networks and Systems
- Advanced Data Storage Technologies
- Distributed and Parallel Computing Systems
- Low-power high-performance VLSI design
- Cloud Computing and Resource Management
- Embedded Systems Design Techniques
- Radiation Effects in Electronics
- Distributed systems and fault tolerance
- Advanced Memory and Neural Computing
- Real-Time Systems Scheduling
- Graph Theory and Algorithms
- Software System Performance and Reliability
- Ferroelectric and Negative Capacitance Devices
- Semiconductor materials and devices
- Caching and Content Delivery
- Computer Graphics and Visualization Techniques
- Software-Defined Networks and 5G
- Advancements in Semiconductor Devices and Circuit Design
- Embedded Systems and FPGA Applications
- Eosinophilic Disorders and Syndromes
- Sarcoma Diagnosis and Treatment
- VLSI and FPGA Design Techniques
- Real-time simulation and control systems
- Complexity and Algorithms in Graphs
Intel (United Kingdom)
2017-2023
Intel (United States)
2017-2021
Hewlett-Packard (United States)
2019
Ghent University
2008-2017
Ghent University Hospital
2006-2017
Assessing the performance of multiprogram workloads running on multithreaded hardware is difficult because it involves a balance between single-program performance and overall system performance. This article argues for developing multiprogram performance metrics in a top-down fashion, starting from system-level objectives. The authors propose two metrics: average normalized turnaround time, a user-oriented metric, and system throughput, a system-oriented metric.
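As a hedged sketch, the two metrics named above are commonly defined in terms of each program's execution time when run alone versus when co-run on the shared hardware; the function and variable names below are illustrative, not taken from the article:

```python
def antt(alone_times, shared_times):
    """Average normalized turnaround time (lower is better):
    the mean per-program slowdown under multiprogram execution."""
    return sum(s / a for a, s in zip(alone_times, shared_times)) / len(alone_times)

def stp(alone_times, shared_times):
    """System throughput (higher is better, max = number of programs):
    the sum of per-program progress rates relative to isolated execution."""
    return sum(a / s for a, s in zip(alone_times, shared_times))

# Two programs, each exactly twice as slow when co-run:
# ANTT = 2.0 (each job takes 2x longer), STP = 1.0 (no throughput gain).
print(antt([100, 200], [200, 400]))  # 2.0
print(stp([100, 200], [200, 400]))   # 1.0
```

The user-oriented metric averages slowdowns (how long does my job take?), while the system-oriented metric sums progress rates (how much work does the machine complete?).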
Large core counts and complex cache hierarchies are increasing the burden placed on commonly used simulation and modeling techniques. Although analytical models provide fast results, they do not apply to complex, many-core shared-memory systems. In contrast, detailed cycle-level simulation can be accurate but also tends to be slow, which limits the number of configurations that can be evaluated. A middle ground is needed that provides fast simulation of complex many-core processors while still providing accurate results. In this article, we explore, analyze, and compare...
A mechanistic model for out-of-order superscalar processors is developed and then applied to the study of microarchitecture resource scaling. The model divides execution time into intervals separated by disruptive miss events such as branch mispredictions and cache misses. Each type of miss event results in characterizable performance behavior within its interval. By considering an interval's type and length (measured in instructions), its duration in cycles can be predicted. Overall execution time is then determined by aggregating over all intervals. The model provides several advantages...
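A minimal numeric sketch of interval-style accounting: absent miss events the core sustains its dispatch width, and each miss event adds a characteristic penalty. The penalty values below are hypothetical, not from the article:

```python
def interval_model_cycles(n_insts, width, events):
    """First-order interval model: the base term assumes the core
    sustains its dispatch width; `events` is a list of
    (event_count, penalty_in_cycles) pairs added on top."""
    base = n_insts / width
    return base + sum(count * penalty for count, penalty in events)

# Hypothetical numbers: 1M instructions on a 4-wide core, with
# 5000 branch mispredictions (15-cycle penalty) and
# 2000 last-level cache misses (200-cycle penalty).
cycles = interval_model_cycles(1_000_000, 4, [(5000, 15), (2000, 200)])
print(cycles)               # 725000.0
print(1_000_000 / cycles)   # effective IPC, roughly 1.38 here
```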
A common way of representing processor performance is to use Cycles per Instruction (CPI) `stacks', which break performance into a baseline CPI plus a number of individual miss-event CPI components. CPI stacks can be very helpful in gaining insight into the behavior of an application on a given microprocessor; consequently, they are widely used by software developers and computer architects. However, computing CPI stacks on superscalar out-of-order processors is challenging because of the various overlaps among execution and miss events (cache misses, TLB...
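A toy illustration of how a CPI stack is assembled once non-overlapped stall cycles have been attributed to each miss-event type (on out-of-order cores, that attribution is the hard part the paper addresses); names and numbers here are hypothetical:

```python
def cpi_stack(total_cycles, n_insts, miss_cycles):
    """Break total CPI into a base component plus one component per
    miss-event type; miss_cycles maps event name -> stall cycles
    attributed (non-overlapped) to that event."""
    stack = {name: c / n_insts for name, c in miss_cycles.items()}
    stack["base"] = total_cycles / n_insts - sum(stack.values())
    return stack

stack = cpi_stack(1_500_000, 1_000_000,
                  {"branch": 100_000, "L2": 400_000})
# Components sum to the total CPI of 1.5:
# roughly {'branch': 0.1, 'L2': 0.4, 'base': 1.0}
print(stack)
```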
Limit studies on Dynamic Voltage and Frequency Scaling (DVFS) provide apparently contradictory conclusions. On the one hand, early limit studies report that DVFS is effective at large timescales (on the order of million(s) of cycles) with large scaling overheads (on the order of tens of microseconds), and they conclude that there is no need for fine-grained DVFS at small timescales. Recent work on the other hand—motivated by the surge in on-chip voltage regulator research—explores the potential of fine-grained DVFS and reports substantial energy savings at timescales of hundreds of cycles (while assuming...
Detailed architectural simulators suffer from a long development cycle and extremely long evaluation times. This longstanding problem is further exacerbated in the multi-core processor era. Existing solutions address the simulation problem by either sampling the simulated instruction stream or by mapping the simulation models onto FPGAs; these approaches achieve substantial speedups while simulating performance in a cycle-accurate manner. This paper proposes interval simulation, which takes a completely different approach: it raises the level of abstraction...
This paper presents a fundamental law for parallel performance: it shows that parallel performance is not only limited by sequential code (as suggested by Amdahl's law) but is also fundamentally limited by synchronization through critical sections. Extending Amdahl's software model to include critical sections, we derive the surprising result that the impact of critical sections on parallel performance can be modeled as a completely sequential part and a completely parallel part. The sequential part is determined by the probability of entering a critical section and the contention probability (i.e., multiple threads wanting to enter the same critical section). This result reveals at least...
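A hedged sketch of such a model, under the simplifying assumption that the contended fraction of critical-section work behaves as sequential code while the rest scales with the thread count; the parameter names are illustrative, not the paper's notation:

```python
def speedup(n, f_seq, f_cs, p_ctn):
    """Amdahl-style speedup on n threads, extended with critical
    sections: of the parallelizable work, a fraction f_cs is inside
    critical sections, and the contended share of it (f_cs * p_ctn)
    is treated as sequential; everything else scales with n."""
    f_par = 1.0 - f_seq
    serial = f_seq + f_par * f_cs * p_ctn        # effectively sequential
    parallel = f_par * (1.0 - f_cs * p_ctn)      # scales with n
    return 1.0 / (serial + parallel / n)

# With no critical sections this reduces to classic Amdahl's law:
print(speedup(16, 0.05, 0.0, 0.0))   # about 9.14 with 5% sequential code
print(speedup(16, 0.05, 0.2, 0.5))   # noticeably lower: contention serializes
```

The point of the law is visible in the second call: even a modest contention probability turns part of the parallel work into an additional sequential term.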
Analyzing multi-threaded programs is quite challenging, but it is necessary to obtain good multicore performance while saving energy. Due to synchronization, certain threads make others wait, because they hold a lock or have yet to reach a barrier. We call these critical threads, i.e., threads whose performance is determinative of program performance as a whole. Identifying critical threads can reveal numerous optimization opportunities, both for the software developer and for the hardware.
Optimizing processors for (a) specific application(s) can substantially improve energy-efficiency. With the end of Dennard scaling, and the corresponding reduction in energy-efficiency gains from technology scaling, such approaches may become increasingly important. However, designing application-specific processors requires fast design space exploration tools to optimize for the targeted application(s). Analytical models can be a good fit as they provide fast performance and power estimates and insight into the interaction between an...
This paper proposes a cycle accounting architecture for Simultaneous Multithreading (SMT) processors that estimates the execution times each of the threads would have had if executed alone, while they are running simultaneously on the SMT processor. This is done by attributing each cycle to either a base, miss event, or waiting component during multi-threaded execution. Single-threaded alone execution time is then estimated as the sum of the base and miss event components; the waiting component represents the cycle count lost due to SMT execution. The architecture incurs reasonable hardware cost (around 1KB of storage) and estimates single-threaded...
Dynamic voltage and frequency scaling (DVFS) is a well-known and effective technique for reducing power consumption in modern microprocessors. An important concern, though, is to estimate its profitability in terms of performance and energy. Current DVFS profitability estimation approaches, however, lack accuracy or incur runtime performance and/or energy overhead. This paper proposes a hardware counter architecture for online DVFS profitability estimation on superscalar out-of-order processors. The counter architecture teases apart the fraction of execution time that is susceptible to clock frequency scaling versus...
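The split that such a counter architecture aims to measure can be illustrated with a simple first-order timing model: the clock-susceptible part of execution time stretches with the clock period, while the memory-bound part does not. Numbers below are hypothetical:

```python
def time_at_freq(t_scale, t_nonscale, f_base, f_new):
    """Estimate execution time at a new clock frequency:
    t_scale stretches with the clock period, t_nonscale
    (memory-bound time) stays constant."""
    return t_scale * (f_base / f_new) + t_nonscale

# Hypothetical workload: 6 ms of clock-scalable time plus 4 ms of
# memory-bound time at 2 GHz. Halving the frequency doubles only
# the scalable part, so runtime grows to 16 ms rather than 20 ms.
print(time_at_freq(6.0, 4.0, 2.0e9, 1.0e9))  # 16.0
```

This is why accurately separating the two components matters: it determines whether a frequency drop costs little performance (memory-bound) or a lot (compute-bound).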
Symbiotic job scheduling boosts simultaneous multithreading (SMT) processor performance by co-scheduling jobs that have `compatible' demands on the processor's shared resources. Existing approaches, however, require a sampling phase, evaluate only a limited number of possible co-schedules, use heuristics to gauge symbiosis, are rigid in their optimization target, and do not preserve system-level priorities/shares. This paper proposes probabilistic job symbiosis modeling, which predicts whether jobs will...
Multi-threaded workloads typically show sublinear speedup on multi-core hardware, i.e., the achieved speedup is not proportional to the number of cores and threads. Sublinear scaling may have multiple causes, such as poorly scalable synchronization leading to spinning and/or yielding, and interference in shared resources such as the last-level cache (LLC) as well as the main memory subsystem. It is vital for programmers and processor designers to understand these bottlenecks in existing and emerging workloads in order to optimize application performance and design future...
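Sublinear scaling can be quantified with a simple efficiency calculation, comparing achieved speedup against ideal linear scaling (the numbers below are illustrative):

```python
def scaling_efficiency(t1, tn, n):
    """Return (speedup, efficiency) for a run on n threads:
    speedup relative to the single-threaded time t1, and the
    fraction of ideal linear scaling actually achieved."""
    speedup = t1 / tn
    return speedup, speedup / n

# Hypothetical: 100 s single-threaded, 20 s on 8 threads.
s, eff = scaling_efficiency(100.0, 20.0, 8)
print(s, eff)  # 5.0 0.625  -> sublinear: only 62.5% of ideal
```

Efficiency well below 1.0 is the symptom; attributing the gap to synchronization versus shared-resource interference is the analysis problem the abstract describes.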
Understanding and analyzing multi-threaded program performance and scalability is far from trivial, which severely complicates parallel software development and optimization. In this paper, we present bottle graphs, a powerful analysis tool that visualizes multi-threaded performance in regards to both per-thread parallelism and execution time. Each thread is represented as a box, with its height equal to the thread's share of total execution time, its width equal to its parallelism, and its area equal to its total running time. The boxes of all threads are stacked upon each other, leading to a stack...
Previous work on efficient customized processor design primarily focused on in-order architectures. However, with the recent introduction of out-of-order processors for high-end high-performance embedded applications, researchers and designers need to address how to automate the design process for these processors. Because of the parallel execution of independent instructions in out-of-order processors, design methodologies which subdivide the search space into individual components are unlikely to be effective in terms of accuracy for designing out-of-order processors. In this paper we propose and evaluate...
Despite years of study, branch mispredictions remain a significant performance impediment in pipelined superscalar processors. In general, the misprediction penalty can be substantially larger than the frontend pipeline length (which is often equated with the penalty). We identify and quantify five contributors to the penalty: (i) the frontend pipeline length, (ii) the number of instructions since the last miss event (branch misprediction, I-cache miss, long D-cache miss) - this is related to the burstiness of miss events, (iii) the inherent ILP of the program,...
Analytical processor performance modeling has received increased interest over the past few years. There are basically two approaches to constructing an analytical model: mechanistic modeling and empirical modeling. Mechanistic modeling builds up a model starting from a basic understanding of the underlying system - a white-box approach - whereas empirical modeling constructs a model through statistical inference and machine learning from training data, e.g., using regression or neural networks - a black-box approach. While an empirical model is typically easier to construct, it...
Weighted speedup is nowadays the most commonly used multiprogram workload performance metric. It is a weighted-IPC metric, i.e., the IPC of each program is first weighted with its isolated IPC. Recently, Michaud questioned the validity of weighted-IPC metrics by arguing that they are inconsistent and favor unfairness [4]. Instead, he advocates using the arithmetic or harmonic mean of the raw IPC values of the programs in the workload. We show that weighted-IPC metrics are not inconsistent, and that they are fair by giving equal importance to each program. We argue that, in contrast to raw-IPC metrics, they have...
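The fairness argument can be illustrated numerically: a weighted-IPC metric normalizes each program's multiprogram IPC by that program's own isolated IPC, so every program carries equal weight, whereas a raw-IPC mean is dominated by inherently high-IPC programs. Toy numbers below:

```python
from statistics import mean

def weighted_speedup(alone_ipc, shared_ipc):
    """Sum of per-program normalized IPCs: each program's multiprogram
    IPC is divided by its isolated IPC, giving equal importance to
    every program regardless of its inherent IPC."""
    return sum(s / a for a, s in zip(alone_ipc, shared_ipc))

alone  = [4.0, 0.5]    # a high-IPC program and a low-IPC program
shared = [2.0, 0.25]   # both slowed down by exactly 2x when co-run

print(weighted_speedup(alone, shared))  # 1.0: both halved, equal weight
print(mean(shared))                     # 1.125: dominated by high-IPC job
```

Under the weighted metric, slowing the low-IPC program down further would hurt the score as much as slowing the high-IPC program; under the raw arithmetic mean it would barely register.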
Optimizing processors for specific application(s) can substantially improve energy-efficiency. With the end of Dennard scaling, and the corresponding reduction in energy-efficiency gains from technology scaling, such approaches may become increasingly important. However, designing application-specific processors requires fast design space exploration tools to optimize for the targeted application(s). Analytical models can be a good fit as they provide performance estimations and insight into the interaction between an application's...
In modern processors, prefetching is an essential component for hiding long-latency memory accesses. However, prefetching too aggressively can easily degrade performance by evicting useful data from the cache, or by saturating precious memory bandwidth. Tuning the prefetcher's activity is thus an important problem. Existing techniques tend to focus on detecting negative symptoms of aggressive prefetching, such as unused prefetches being evicted or bandwidth saturation, and throttle the prefetcher in response.
Soft error reliability has become a first-order design criterion for modern microprocessors. Architectural Vulnerability Factor (AVF) modeling is often used to capture the probability that a radiation-induced fault in a hardware structure will manifest as an error at the program output. AVF estimation requires detailed microarchitectural simulations, which are time-consuming and typically present only aggregate metrics. Moreover, it requires a large number of simulations to derive insight into the impact of microarchitectural events on AVF. In this work we...
While multicore processors improve overall chip throughput and hardware utilization, resource sharing among the cores leads to unpredictable performance for the individual threads running on the processor. Unpredictable per-thread performance becomes a problem when considered in the context of scheduling: system software assumes that all threads make equal progress; however, this is not what the hardware provides. This may lead to problems at the system level, such as missed deadlines, reduced quality-of-service, non-satisfied service-level...