André Seznec

ORCID: 0000-0002-3058-6503
Research Areas
  • Parallel Computing and Optimization Techniques
  • Advanced Data Storage Technologies
  • Interconnection Networks and Systems
  • Embedded Systems Design Techniques
  • Distributed and Parallel Computing Systems
  • Low-power high-performance VLSI design
  • Algorithms and Data Compression
  • Cloud Computing and Resource Management
  • Advanced Memory and Neural Computing
  • Ferroelectric and Negative Capacitance Devices
  • Network Packet Processing and Optimization
  • Radiation Effects in Electronics
  • Distributed systems and fault tolerance
  • Numerical Methods and Algorithms
  • Real-Time Systems Scheduling
  • Chaos-based Image/Signal Encryption
  • Natural Language Processing Techniques
  • Cryptographic Implementations and Security
  • Caching and Content Delivery
  • Neural Networks and Applications
  • Semiconductor materials and devices
  • VLSI and Analog Circuit Testing
  • Computability, Logic, AI Algorithms
  • Energy Efficient Wireless Sensor Networks
  • Manufacturing Process and Optimization

Inria Rennes - Bretagne Atlantique Research Centre
2009-2024

Intel (United States)
2024

Institut de Recherche en Informatique et Systèmes Aléatoires
2011-2022

Université de Rennes
1992-2022

Centre National de la Recherche Scientifique
2020-2022

Institut national de recherche en informatique et en automatique
2011-2020

Central Compilation & Translation Bureau
2016-2020

Ghent University Hospital
2012

Pennsylvania State University
2012

Universitat Politècnica de Catalunya
2003

A case for two-way skewed-associative caches. Author: André Seznec. ISCA '93: Proceedings of the 20th Annual International Symposium on Computer Architecture, June 1993, pages 169–178. https://doi.org/10.1145/165123.165152. Published: 01 May 1993.

10.1145/165123.165152 article EN 1993-01-01

As modern microprocessors employ deeper pipelines and issue multiple instructions per cycle, they are becoming increasingly dependent on accurate branch prediction. Because hardware resources for branch-predictor tables are invariably limited, it is not possible to hold all relevant branch history for all active branches at the same time, especially for large workloads consisting of multiple processes and operating-system code. The problem that results, commonly referred to as aliasing in the branch-predictor tables, is in many ways similar to the misses that occur...

10.1145/264107.264211 article EN 1997-05-01

This paper presents the Alpha EV8 conditional branch predictor. The Alpha EV8 microprocessor project, canceled in June 2001 in a late phase of development, envisioned an aggressive 8-wide issue out-of-order superscalar microarchitecture featuring a very deep pipeline and simultaneous multithreading. Performance of such a processor is highly dependent on the accuracy of its branch predictor, and consequently a very large silicon area was devoted to branch prediction on EV8. The Alpha EV8 branch predictor relies on global history and features a total of 352 Kbits. The focus of this paper is on the different trade-offs...

10.1145/545214.545249 article EN ACM SIGARCH Computer Architecture News 2002-05-01

It has been observed that some applications manipulate large amounts of null data. Moreover, these zero data often exhibit high spatial locality. On some applications, more than 20% of the data accesses concern null data blocks. Representing a null block in the cache on a standard cache line appears to be a waste of resources.

10.1145/1542275.1542288 article EN 2009-06-08

The TAGE predictor is often considered the state of the art among conditional branch predictors proposed by academia. In this paper, we first present directions to reduce the hardware implementation cost of TAGE. Second, we show how to further reduce the misprediction rate of TAGE by augmenting it with small side predictors.

10.1145/2155620.2155635 preprint EN 2011-12-03

Phase Change Memory (PCM) may become a viable alternative for the design of main memory systems in the next few years. However, PCM suffers from limited write endurance. Therefore, future adoption of PCM as a main memory technology will depend on the availability of practical solutions for wear leveling that avoid uneven usage, especially in the presence of potentially malicious users. First-generation wear leveling algorithms were designed for typical workloads and have significantly reduced lifetime under malicious access patterns that try to write to the same line continuously....

10.1109/hpca.2011.5749753 article EN 2011-02-01

In this paper, we present an approach to estimate the performance upper bound of GPU applications based on algorithm analysis and assembly code level benchmarking. As an example, we analyze the potential peak performance of SGEMM (Single-precision General Matrix Multiply) on Fermi (GF110) and Kepler (GK104) GPUs. We try to answer the question of how much optimization space is left for SGEMM and why. According to our analysis, the nature of the Fermi (Kepler) instruction set and the limited issue throughput of the schedulers are the main limiting factors for approaching the theoretical peak performance....

10.1109/cgo.2013.6494986 preprint EN 2013-02-01

This work demonstrates that a set of commercial and scale-out applications exhibit significant use of superpages and thus suffer from the fixed and small superpage TLB structures of some modern core designs. Other processors better cope with superpages at the expense of using power-hungry and slow fully-associative TLBs. We consider alternate designs that allow all pages to freely share a single, power-efficient and fast set-associative TLB. We propose a prediction-guided multi-grain TLB design that uses a superpage prediction mechanism to avoid multiple lookups in...

10.1109/hpca.2015.7056034 preprint EN 2015-02-01

The performance of out-of-order processors increases with the instruction window size. In conventional processors, the effective instruction window cannot be larger than the issue buffer. Determining which instructions from the issue buffer can be launched to the execution units is a time-critical operation whose complexity increases with the issue buffer size. We propose to relieve the issue stage by reordering instructions before they enter the issue buffer. This study introduces the general principle of data flow prescheduling. Then we describe a possible implementation. Our preliminary results show that data-flow...

10.1109/hpca.2001.903249 article EN 2002-11-13

A basic rule in computer architecture is that a processor cannot execute an application faster than it fetches its instructions. This paper presents a novel cost-effective mechanism called the two-block ahead branch predictor. Information from the current instruction block is not used for predicting the address of the next instruction block, but rather for predicting the block following the next instruction block. This approach overcomes the instruction fetch bottleneck exhibited by wide-dispatch "brainiac" processors by enabling them to efficiently predict the addresses of two instruction blocks...

10.1145/237090.237169 article EN 1996-09-01

High performance multi-core processors are becoming an industry reality. Although multi-cores are suited for multithreaded and multi-programmed workloads, many applications are still mono-thread, and multi-core performance with a single thread workload is an important issue. Furthermore, recent studies suggest that performance, power and temperature considerations of future multi-cores may necessitate activity-migration between cores. Motivated by the above, this paper investigates the implications of migration on a multi-core. Specifically, the study...

10.1145/1105734.1105745 article EN ACM SIGARCH Computer Architecture News 2005-11-01

In this paper, we introduce and analyze the Optimized GEometric History Length (O-GEHL) branch Predictor that efficiently exploits very long global histories in the 100-200 bits range. The GEHL predictor features several predictor tables T(i) (e.g. 8) indexed through independent functions of the global branch history and branch address. The set of used global history lengths forms a geometric series, i.e., L(j) = \alpha^{j-1} L(1). This allows the GEHL predictor to efficiently capture correlation on recent branch outcomes as well as on very old branches. As on perceptron predictors, the prediction is...

10.1145/1080695.1070003 article EN ACM SIGARCH Computer Architecture News 2005-05-01
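
The geometric series of history lengths described in the abstract above, L(j) = \alpha^{j-1} L(1), can be illustrated with a minimal Python sketch. The function name and the parameter values (L(1) = 2, \alpha = 2, seven lengths) are assumptions chosen for illustration only; they are not taken from the paper, and rounding to the nearest integer is likewise an assumption.

```python
def geometric_history_lengths(first_length, alpha, num_lengths):
    """Return [L(1), ..., L(n)] with L(j) = alpha**(j - 1) * L(1).

    Lengths are rounded to the nearest integer (an assumption of this
    sketch), since a history length counts whole branch outcomes.
    """
    return [round(first_length * alpha ** (j - 1))
            for j in range(1, num_lengths + 1)]


# Illustrative values only (hypothetical, not from the paper).
lengths = geometric_history_lengths(first_length=2, alpha=2.0, num_lengths=7)
print(lengths)  # [2, 4, 8, 16, 32, 64, 128]
```

With a geometric progression, a handful of tables spans both short histories (recent outcomes) and histories over a hundred bits long, which is the property the abstract highlights.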

Cache compression seeks the benefits of a larger cache with the area and power of a smaller cache. Ideally, a compressed cache increases effective capacity by tightly compacting compressed blocks, has low tag and metadata overheads, and allows fast lookups. Previous compressed cache designs, however, fail to achieve all these goals. In this paper, we propose the Skewed Compressed Cache (SCC), a new hardware compressed cache that lowers overheads and increases performance. SCC tracks super blocks to reduce tag overhead, compacts blocks into a variable number of sub-blocks to reduce internal fragmentation, but...

10.1109/micro.2014.41 article EN 2014-12-01

Multi-threaded processors execute multiple threads concurrently in order to increase overall throughput. It is well documented that multi-threading affects per-thread performance but, more importantly, some threads are affected more than others. This is especially troublesome for multi-programmed workloads. Fairness metrics measure whether all threads are affected equally. However, defining equal treatment is not straightforward. Several fairness metrics for multi-threaded processors have been utilized in the literature, although there does not seem to be a...

10.1109/l-ca.2011.1 article EN IEEE Computer Architecture Letters 2011-01-01

Dedicating more silicon area to single thread performance will necessarily be considered as worthwhile in future (potentially heterogeneous) multicores. In particular, Value Prediction (VP) was proposed in the mid 90's to enhance the performance of high-end uniprocessors by breaking true data dependencies. In this paper, we reconsider the concept of Value Prediction in the contemporary context and show its potential as a direction to improve current single thread performance. First, building on top of research carried out during the previous decade on confidence...

10.1109/hpca.2014.6835952 preprint EN 2014-02-01

Poor data layout in memory may generate weak data locality and poor performance. Code transformations such as loop blocking or interchanging and array padding have addressed this issue for scientific applications. However, many generalist applications do not use arrays, but dynamically allocated heterogeneous structures. In this paper, we explore two techniques for such structures: field reorganization and instance interleaving. The application of these techniques may be guided by program profiling. This allows significant cache...

10.5555/522344.825680 article EN International Conference on Parallel Architectures and Compilation Techniques 1998-10-12

Tarantula is an aggressive floating point machine targeted at technical, scientific and bioinformatics workloads, originally planned as a follow-on candidate to the EV8 processor [6, 5]. Tarantula adds to the EV8 core a vector unit capable of 32 double-precision flops per cycle. The vector unit fetches data directly from a 16 MByte second level cache with a peak bandwidth of sixty-four 64-bit values per cycle. The whole chip is backed by a memory controller capable of delivering over 64 GBytes/s of raw bandwidth. Tarantula extends the Alpha ISA with new vector instructions that operate on...

10.1145/545214.545247 article EN ACM SIGARCH Computer Architecture News 2002-05-01

Sectored caches have been used for many years in order to reconcile low tag array size and small or medium block size. In a sectored cache, a single address tag is associated with a sector consisting of several cache lines, while validity, dirty and coherency tags are associated with each of the inner cache lines. Maintaining a low tag array size is a major issue in many cache designs (e.g. L2 caches). Using a sectored cache is a design trade-off between a low tag array size, which is possible with a large line size, and low memory traffic, which requires a small line size. This technique has been used in many cache designs, including small on-chip microprocessor caches and large external second level...

10.1145/191995.192072 article EN International Symposium on Computer Architecture 1994-04-01
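
The sector organization described in the abstract above (one address tag per sector, with per-line state bits) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the design from the paper: the cache is direct-mapped, the names `Sector` and `SectoredCache` and the constants `LINES_PER_SECTOR` and `LINE_SIZE` are hypothetical, and coherency tags and write-back of dirty lines are omitted.

```python
from dataclasses import dataclass, field
from typing import List, Optional

LINES_PER_SECTOR = 4   # assumed value, for illustration only
LINE_SIZE = 64         # assumed line size in bytes


@dataclass
class Sector:
    # One address tag is shared by every line of the sector.
    tag: Optional[int] = None
    # Validity and dirty bits are kept per inner cache line.
    valid: List[bool] = field(default_factory=lambda: [False] * LINES_PER_SECTOR)
    dirty: List[bool] = field(default_factory=lambda: [False] * LINES_PER_SECTOR)


class SectoredCache:
    """Direct-mapped sectored cache model (a simplifying assumption)."""

    def __init__(self, num_sectors: int):
        self.sectors = [Sector() for _ in range(num_sectors)]

    def _split(self, address: int):
        line_number = address // LINE_SIZE
        line_in_sector = line_number % LINES_PER_SECTOR
        sector_number = line_number // LINES_PER_SECTOR
        index = sector_number % len(self.sectors)
        tag = sector_number // len(self.sectors)
        return tag, index, line_in_sector

    def access(self, address: int, write: bool = False) -> bool:
        """Return True on a hit; on a miss, fetch only the requested line."""
        tag, index, line = self._split(address)
        sector = self.sectors[index]
        if sector.tag != tag:
            # Sector miss: the single tag is replaced, so every line of the
            # old sector is invalidated (dirty write-back is not modeled).
            sector.tag = tag
            sector.valid = [False] * LINES_PER_SECTOR
            sector.dirty = [False] * LINES_PER_SECTOR
        hit = sector.valid[line]
        sector.valid[line] = True
        if write:
            sector.dirty[line] = True
        return hit


cache = SectoredCache(num_sectors=2)
print(cache.access(0))    # False: cold miss
print(cache.access(0))    # True: same line, now valid
print(cache.access(64))   # False: tag matches, but this line is not yet valid
print(cache.access(512))  # False: different tag maps to the same sector
print(cache.access(0))    # False: the earlier sector was replaced
```

The third access shows the trade-off the abstract mentions: the tag array stays small because four lines share one tag, while memory traffic stays at the granularity of a single small line.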

This paper presents the Alpha EV8 conditional branch predictor. The Alpha EV8 microprocessor project, canceled in June 2001 in a late phase of development, envisioned an aggressive 8-wide issue out-of-order superscalar microarchitecture featuring a very deep pipeline and simultaneous multithreading. Performance of such a processor is highly dependent on the accuracy of its branch predictor, and consequently a very large silicon area was devoted to branch prediction on EV8. The Alpha EV8 branch predictor relies on global history and features a total of 352 Kbits. The focus of this paper is on the different...

10.1109/isca.2002.1003587 article EN 2003-06-25

Phase change memory (PCM) technology appears as more scalable than DRAM technology. As PCM exhibits access time slightly longer but in the same range as DRAMs, several recent studies have proposed to use PCMs for designing main memory systems. Unfortunately, PCM technology suffers from a limited write endurance; typically each memory cell can only be written a large but still limited number of times (10^7 to 10...

10.1109/l-ca.2010.2 article EN IEEE Computer Architecture Letters 2010-01-01

Sectored caches have been used for many years in order to reconcile low tag array size and small or medium block size. In a sectored cache, a single address tag is associated with a sector consisting of several cache lines, while validity, dirty and coherency tags are associated with each of the inner cache lines. Usually a cache line location is statically linked to one and only one address tag word location. In the decoupled sectored cache introduced in this paper, this monolithic association is broken; the address tag location is dynamically chosen at fetch time among several possible locations. The tag volume is in the same range as...

10.1109/isca.1994.288133 article EN 2002-12-17

In this paper, we introduce and analyze the Optimized GEometric History Length (O-GEHL) branch Predictor that efficiently exploits very long global histories in the 100-200 bits range. The GEHL predictor features several predictor tables T(i) (e.g. 8) indexed through independent functions of the global branch history and branch address. The set of used global history lengths forms a geometric series, i.e., L(j) = \alpha^{j-1} L(1). This allows the GEHL predictor to efficiently capture correlation on recent branch outcomes as well as on very old branches. As on perceptron predictors, the prediction is...

10.1109/isca.2005.13 article EN 2005-07-28