Jesús Alastruey-Benedé

ORCID: 0000-0003-4164-5078
Research Areas
  • Parallel Computing and Optimization Techniques
  • Interconnection Networks and Systems
  • Advanced Data Storage Technologies
  • Low-power high-performance VLSI design
  • Genomics and Phylogenetic Studies
  • Algorithms and Data Compression
  • Distributed and Parallel Computing Systems
  • Embedded Systems Design Techniques
  • Cloud Computing and Resource Management
  • Distributed systems and fault tolerance
  • Advanced Memory and Neural Computing
  • Enzyme Structure and Function
  • Protein Structure and Dynamics
  • Radiation Effects in Electronics
  • Caching and Content Delivery
  • CCD and CMOS Imaging Sensors
  • Evolutionary Algorithms and Applications
  • Chromosomal and Genetic Variations
  • Quantum Mechanics and Applications
  • Advanced Optimization Algorithms Research
  • Innovations in Educational Methods
  • RNA and protein synthesis mechanisms
  • DNA and Biological Computing
  • Educational Technology in Learning
  • Real-Time Systems Scheduling

Universidad de Zaragoza
2014-2025

Instituto de Investigación Sanitaria Aragón
2020

Hispanics in Philanthropy
2015

Data prefetching is a technique that plays a crucial role in modern high-performance processors by hiding long-latency memory accesses. Several state-of-the-art hardware prefetchers exploit the concept of deltas, defined as the difference between the cache line addresses of two demand accesses. Existing delta prefetchers, such as best offset (BOP) and multi-lookahead (MLOP), train and predict future accesses based on global deltas. We observed that the use of global deltas results in missed opportunities to anticipate memory accesses. In this paper,...

10.1109/micro56248.2022.00072 article EN 2022-10-01
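To make the notion of a delta concrete, the sketch below computes global deltas between the cache line addresses of consecutive demand accesses and picks a single global offset in the spirit of BOP. It is heavily simplified and assumes 64-byte lines. The real best-offset prefetcher tests candidate offsets in learning rounds against a table of recent requests, and nothing here reproduces the mechanism proposed in the paper. The candidate range and window size are hypothetical parameters.

```python
from collections import Counter, deque

LINE_SIZE = 64  # assumed cache line size in bytes


def line_addr(byte_addr: int) -> int:
    """Cache line address: the byte address with the block-offset bits dropped."""
    return byte_addr // LINE_SIZE


def deltas(byte_addrs):
    """Global deltas between the line addresses of consecutive demand accesses."""
    lines = [line_addr(a) for a in byte_addrs]
    return [b - a for a, b in zip(lines, lines[1:])]


def best_offset(byte_addrs, candidates=range(1, 9), window=16):
    """Simplified best-offset learning (hypothetical parameters).

    Offset D scores a point each time a demand access to line X finds line
    X - D among the last `window` accessed lines, i.e. each time prefetching
    (X - D) + D would have covered X. The highest-scoring offset wins; ties
    go to the smallest candidate.
    """
    recent = deque(maxlen=window)
    scores = Counter()
    for a in byte_addrs:
        x = line_addr(a)
        for d in candidates:
            if x - d in recent:
                scores[d] += 1
        recent.append(x)
    return max(candidates, key=lambda d: scores[d])


stream = [i * 128 for i in range(20)]  # stride of two 64-byte lines
print(deltas(stream)[:4], best_offset(stream))
```

With a stride of two lines, every global delta is 2 and offset 2 scores highest. When several independent streams are interleaved, the global delta sequence becomes irregular even though each stream is regular, which is one way a purely global view can miss opportunities.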

RISC-V is an emergent architecture that is gaining strength in low-power IoT applications. The stabilization of the architectural extensions and the start of commercialization of RISC-V-based SoCs, like the Kendryte K210, raise the question of whether this open standard will facilitate the development of applications for specific markets or not. In this paper, we evaluate the development environments, toolchain, and debugging processes related to the Sipeed MAIX Go board, as well as the standalone SDK and the MicroPython port for the K210. The training pipeline for the built-in convolutional...

10.1109/dcis51330.2020.9268645 article EN 2020-11-18

Arm usage has substantially grown in the High-Performance Computing (HPC) community. The Japanese supercomputer Fugaku, powered by Arm-based A64FX processors, held the top position on the Top500 list between June 2020 and 2022, and currently sits in fourth position. The recently released 7th generation of Amazon EC2 instances for compute-intensive workloads (C7g) is also powered by Arm-based Graviton3 processors. Projects like the European Mont-Blanc and the U.S. DOE/NNSA Astra are further examples of the irruption of Arm in HPC. In parallel, over the last...

10.1016/j.future.2024.03.050 article EN cc-by-nc Future Generation Computer Systems 2024-04-02

Scaling the supply voltage to values near the threshold voltage allows a dramatic decrease in the power consumption of processors; however, the lower the voltage, the higher the sensitivity to process variation and, hence, the lower the reliability. Large SRAM structures, like the last-level cache (LLC), are extremely vulnerable to process variation because they are aggressively sized to satisfy high density requirements. In this paper, we propose Concertina, an LLC designed to enable reliable operation at low voltages with conventional SRAM cells. Based on...

10.1109/tc.2015.2479585 article EN IEEE Transactions on Computers 2015-09-18

SPEC CPU is one of the most common benchmark suites used in computer architecture research. CPU2017 has recently been released to replace CPU2006. In this paper, we present a detailed evaluation of the memory hierarchy performance for both the CPU2006 and the single-threaded CPU2017 benchmarks. The experiments were executed on an Intel Xeon Skylake-SP, which is the first processor to implement a mostly non-inclusive last-level cache (LLC). We present a classification of the benchmarks according to their memory pressure and analyze the impact of different LLC...

10.1371/journal.pone.0220135 article EN cc-by PLoS ONE 2019-08-01
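Memory pressure is commonly quantified as misses per kilo-instruction (MPKI), i.e. MPKI = 1000 × misses / instructions. The sketch below computes LLC MPKI from raw counter values and buckets benchmarks into pressure classes. The thresholds, benchmark names and counter values are hypothetical placeholders, not the classification or measurements reported in the paper.

```python
def mpki(misses: int, instructions: int) -> float:
    """Misses per kilo-instruction."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return misses * 1000.0 / instructions


def pressure_class(llc_mpki: float, low: float = 1.0, high: float = 10.0) -> str:
    """Bucket a benchmark by LLC MPKI (hypothetical thresholds)."""
    if llc_mpki < low:
        return "low"
    if llc_mpki < high:
        return "medium"
    return "high"


def classify(counters):
    """counters: {name: (llc_misses, instructions)} -> {name: (mpki, class)}."""
    result = {}
    for name, (misses, instructions) in counters.items():
        m = mpki(misses, instructions)
        result[name] = (m, pressure_class(m))
    return result


# Hypothetical hardware-counter readings for three placeholder benchmarks.
example = {
    "bench_a": (200_000, 1_000_000_000),     # 0.2 MPKI
    "bench_b": (5_000_000, 1_000_000_000),   # 5 MPKI
    "bench_c": (40_000_000, 1_000_000_000),  # 40 MPKI
}
for name, (m, cls) in classify(example).items():
    print(f"{name}: {m:.1f} MPKI -> {cls}")
```

Normalizing by instructions rather than by time keeps the metric comparable across benchmarks with very different run lengths.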

Sequence alignment pipelines for human genomes are an emerging workload that will dominate the precision medicine field. BWA-MEM2 is a tool widely used in the scientific community to perform read mapping studies. In this paper, we port BWA-MEM2 to the AArch64 architecture using the ARMv8-A specification, and compare the resulting version against an Intel Skylake system in terms of both performance and energy-to-solution. The porting effort entails numerous code modifications, since BWA-MEM2 implements certain kernels using x86_64-specific intrinsics,...

10.1109/tcbb.2023.3264514 article EN IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023-04-05

The FM-index is a compact data structure suitable for fast matching of short reads to large reference genomes. The matching algorithm using this index exhibits irregular memory access patterns that cause frequent cache misses, resulting in a memory-bound problem. This paper analyzes different FM-index versions presented in the literature, focusing on those computing aspects related to memory access. As a result of the analysis, we propose a new FM-index organization that minimizes the demand for memory bandwidth, allowing a great improvement in performance on processors...

10.1109/tcbb.2018.2884701 article EN IEEE/ACM Transactions on Computational Biology and Bioinformatics 2018-12-07
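For readers unfamiliar with the structure, the sketch below builds a toy FM-index (BWT plus the C and Occ tables) and runs the textbook backward search that counts and locates exact matches. It stores the full Occ table uncompressed and builds the suffix array naively, so it only suits small inputs; it is not the memory organization proposed in the paper. It does show where the irregular accesses come from: each pattern symbol triggers Occ lookups at positions that depend on the previous step's result and can jump anywhere in a genome-sized table.

```python
def build_fm_index(text: str):
    """Return (bwt, suffix_array, C, occ) for a '$'-terminated text."""
    assert text.endswith("$") and text.count("$") == 1
    # Naive suffix array: O(n^2 log n), acceptable for toy inputs only.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT[k] is the symbol preceding suffix sa[k] (text[-1] == '$' wraps around).
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(bwt))
    # C[c]: number of symbols in the text that are lexicographically smaller than c.
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += bwt.count(c)
    # occ[c][k]: occurrences of c in bwt[:k] (uncompressed, one entry per position).
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for k, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][k + 1] = occ[c][k] + (ch == c)
    return bwt, sa, C, occ


def backward_search(pattern: str, C, occ, n: int):
    """Return the half-open suffix-array interval [lo, hi) of exact matches."""
    lo, hi = 0, n
    for c in reversed(pattern):  # the FM-index consumes the pattern right to left
        if c not in C:
            return 0, 0
        lo = C[c] + occ[c][lo]  # two Occ lookups per symbol: on large genomes
        hi = C[c] + occ[c][hi]  # these are the scattered, cache-unfriendly accesses
        if lo == hi:
            break
    return lo, hi


text = "ACGTACGTTACG$"
bwt, sa, C, occ = build_fm_index(text)
lo, hi = backward_search("ACG", C, occ, len(bwt))
print("count:", hi - lo, "positions:", sorted(sa[lo:hi]))
```

Counting needs only C and Occ; locating additionally needs the suffix array (or a sampled version of it), which is another memory-hungry component in practice.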

Computer-assisted sperm analysis (CASA) systems can reduce errors occurring in manual analysis. However, commercial CASA systems are frequently not applicable at the forefront of challenging research endeavors. The development of open-source software may offer important solutions for researchers working in related areas. Here, we present an example of this, with three new modules for OpenCASA (hosted on GitHub). The first is the Chemotactic Sperm Accumulation Module, a powerful tool for studying chemotactic behavior, analyzing...

10.3390/biology9080207 article EN cc-by Biology 2020-08-05

The performance impact of the Physical Register File (PRF) size on Simultaneous Multithreading processors has not been extensively studied, in spite of it being a critical shared resource. In this paper, we analyze the effect of the PRF size for a broad set of resource allocation policies (Icount, Stall, Flush, Flush++, Static, DCRA and Hill-climbing) and evaluate them under two metrics: instructions per second (IPS) for throughput and the harmonic mean of weighted IPCs (Hmean-wIPC) for fairness. We have found that the resource allocation policy should be considered...

10.1109/sbac-pad.2008.17 article EN 2008-10-01
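The two metrics can be written down directly. Assuming the usual definition of the weighted IPC of thread i as its IPC under SMT divided by its IPC when running alone, the fairness metric is the harmonic mean Hmean-wIPC = N / sum_i(IPC_i,alone / IPC_i,SMT), while throughput in instructions per second is the sum of per-thread IPCs times the clock frequency. The sketch below assumes those definitions; the IPC values are illustrative, not results from the paper.

```python
def weighted_ipcs(ipc_smt, ipc_alone):
    """Per-thread IPC under SMT normalized to the same thread running alone."""
    return [s / a for s, a in zip(ipc_smt, ipc_alone)]


def hmean_wipc(ipc_smt, ipc_alone) -> float:
    """Harmonic mean of weighted IPCs: heavily penalizes starving any thread."""
    w = weighted_ipcs(ipc_smt, ipc_alone)
    return len(w) / sum(1.0 / x for x in w)


def throughput_ips(ipc_smt, freq_hz: float) -> float:
    """Aggregate instructions per second over all hardware threads."""
    return sum(ipc_smt) * freq_hz


alone = [2.0, 1.0]     # illustrative standalone IPCs of two threads
balanced = [1.0, 0.5]  # both threads at 50% of their standalone speed
skewed = [1.5, 0.25]   # thread 0 at 75%, thread 1 starved at 25%
print(hmean_wipc(balanced, alone), hmean_wipc(skewed, alone))
print(throughput_ips(balanced, 1e9), throughput_ips(skewed, 1e9))
```

In this example the skewed allocation retires more instructions (IPC sum 1.75 vs 1.5) yet scores lower on Hmean-wIPC (0.375 vs 0.5), which is why a policy can look good on one metric and poor on the other.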

The FM-index is a data structure used in genomics for the exact search of input sequences over large reference genomes. Algorithms based on the FM-index show an irregular memory access pattern, resulting in a memory-bound problem. We analyze a recent implementation and highlight existing throughput-memory trade-offs, showing that memory requirements limit k-steps. We propose COFI, a COmpressed FM-Index for K-steps. COFI enables a 15-step FM-index using less than 16 GB for the human genome of 3 giga base pairs. An algorithm for this new layout is evaluated on both...

10.1109/tcbb.2020.3000253 article EN IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020-06-05

In molecular dynamics simulations, we can often increase the time step by imposing constraints on bond lengths and angles. This allows us to extend the length of the time interval and therefore the range of physical phenomena that we can afford to simulate. We examine existing algorithms and software for solving nonlinear constraint equations in parallel and explain why it is necessary to advance the state-of-the-art. We present ILVES-PC, a new algorithm for imposing constraints on proteins accurately and efficiently. It solves the same system of differential algebraic equations as...

10.1016/j.cpc.2023.108742 article EN cc-by-nc-nd Computer Physics Communications 2023-03-29
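To make "solving nonlinear constraint equations" concrete, the sketch below implements SHAKE, a classic existing iterative method for bond-length constraints, shown only as a point of reference; it is not ILVES-PC. It sweeps the constraints one at a time (Gauss-Seidel style), so corrections to atoms shared by several bonds are inherently ordered. The tolerance, masses and geometry are illustrative assumptions, and the matching velocity correction (as in RATTLE) is omitted.

```python
def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def shake(r_old, r_new, masses, constraints, tol=1e-10, max_iter=1000):
    """Classic SHAKE: iteratively correct positions so that |r_i - r_j| = d_ij.

    r_old: positions at the previous step (assumed to satisfy the constraints)
    r_new: unconstrained positions after the integration step
    constraints: list of (i, j, d_ij) bond-length constraints
    Corrections act along the old bond vector, weighted by inverse mass, so
    the center of mass is left unchanged.
    """
    r = [list(p) for p in r_new]
    for _ in range(max_iter):
        converged = True
        for i, j, d in constraints:
            diff = [r[i][k] - r[j][k] for k in range(3)]
            s = _dot(diff, diff) - d * d  # violation of |r_ij|^2 = d^2
            if abs(s) > tol:
                converged = False
                b = [r_old[i][k] - r_old[j][k] for k in range(3)]
                inv_mu = 1.0 / masses[i] + 1.0 / masses[j]
                g = s / (2.0 * _dot(diff, b) * inv_mu)  # linearized multiplier
                for k in range(3):
                    r[i][k] -= g * b[k] / masses[i]
                    r[j][k] += g * b[k] / masses[j]
        if converged:
            return r
    raise RuntimeError("SHAKE did not converge")


# Illustrative bent triatomic (heavy central atom, two unit-length bonds).
r_old = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
r_new = [(1.05, 0.02, 0.0), (0.0, 0.01, -0.01), (0.03, 0.97, 0.02)]
masses = [1.0, 16.0, 1.0]
bonds = [(0, 1, 1.0), (1, 2, 1.0)]
r = shake(r_old, r_new, masses, bonds)
```

Each sweep only linearizes the constraints, which is why the method must iterate until every bond satisfies the tolerance.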

This paper proposes and evaluates a new microarchitecture for out-of-order processors that supports speculative renaming. We call speculative renaming the omission of physical register allocation along with the early release of physical registers. These policies may cause a register operand not to be kept in the physical register file (PRF). Thus, we add a low-ported auxiliary register file (XRF), located outside the processor core, that keeps the values absent from the PRF and supplies them at a higher latency. To support operands being located in either the PRF or the XRF, we use virtual registers and consider ... directed by...

10.1109/ipdps.2007.370237 article EN 2007-01-01

Power density has become the limiting factor in technology scaling, as the power budget restricts the amount of hardware that can be active at the same time. Reducing the supply voltage to ultra-low ranges close to the threshold region promises great energy savings. However, the potential savings are limited by the correct operation of SRAM cells, which is not guaranteed below Vddmin, the minimum voltage at which cache structures operate reliably. Understanding the effects of operating below Vddmin requires complex modelling, so we introduce an updated...

10.1109/sbac-pad.2014.12 article EN 2014-10-01
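A simple first-order way to reason about operation below Vddmin is to assume that SRAM cells fail independently with probability p_fail at a given voltage; a cache line of b cells is then fault-free with probability (1 − p_fail)^b. The sketch below evaluates that simple model. It is an illustrative assumption, not the updated model introduced in the paper, and the p_fail values are placeholders rather than measured data.

```python
def p_line_fault_free(p_fail: float, bits: int = 512) -> float:
    """Probability that none of the `bits` cells of a line fails (independent failures)."""
    return (1.0 - p_fail) ** bits


def expected_usable_lines(p_fail: float, num_lines: int, bits: int = 512) -> float:
    """Expected number of lines with no faulty cell; 512 bits = a 64-byte data block."""
    return num_lines * p_line_fault_free(p_fail, bits)


# Placeholder per-cell failure probabilities (illustrative only).
for p in (1e-2, 1e-3, 1e-4, 1e-6):
    frac = p_line_fault_free(p)
    print(f"p_fail={p:g}: {100 * frac:.1f}% of 64-byte lines fault-free")
```

With p_fail = 1e-3, only about 60% of 64-byte lines are fault-free under this model, so even a modest per-cell failure rate would disable a large share of the cache if faulty lines were simply switched off. Real cell failures are not independent of layout and operating conditions, which is one reason more detailed modelling is needed.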

The management of shared resources in multicore processors is an open problem due to the continuous evolution of these systems. The trend toward increasing the number of cores and organizing them in clusters sets out new challenges not considered in previous works. In this paper, we characterize the use of cache and memory bandwidth of an AMD Rome processor executing multiprogrammed workloads, and propose several mechanisms that control these resources to improve system performance and fairness. Our mechanisms require no hardware or operating system modifications....

10.1007/s11227-023-05070-0 article EN cc-by The Journal of Supercomputing 2023-02-04

Arm® usage has substantially grown in the High-Performance Computing (HPC) community. The Japanese supercomputer Fugaku, powered by Arm®-based A64FX processors, held the top position on the Top500 list between June 2020 and 2022, and currently sits in second position. The recently released 7th generation of Amazon EC2 instances for compute-intensive workloads (C7g) is also powered by Arm-based Graviton3 processors. Projects like the European Mont-Blanc and the U.S. DOE/NNSA Astra are further examples of the irruption of Arm® in HPC. In parallel, over the last...

10.2139/ssrn.4632220 preprint EN 2023-01-01

The late release policy of conventional renaming keeps many registers in the register file assigned, in spite of their containing values that will never be read in the future. In this work, we study the potential of a novel scheme that speculatively releases a physical register as soon as it has been read by the predicted last instruction that references its value. An auxiliary register file, placed outside the critical paths of the processor pipeline, holds the values of early released registers just in case they are unexpectedly referenced by some instruction. In addition to demonstrating the feasibility of last-use...

10.1145/1128022.1128061 article EN 2006-05-03
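The release policy can be illustrated with a toy model without timing. In the sketch below, each produced value is identified by a hypothetical tag and carries a predicted number of reads. Once the predicted last reader has read it, the value leaves the main register file and moves to an auxiliary file, which still serves any unexpected later read on a slower path. The class and its names are illustrative assumptions, not the paper's microarchitecture; the model ignores timing, register reallocation and reclamation of auxiliary entries.

```python
class EarlyReleaseRegFile:
    """Toy model of speculative early register release (hypothetical design)."""

    def __init__(self):
        self.prf = {}         # tag -> value: fast, in-core physical register file
        self.xrf = {}         # tag -> value: auxiliary file off the critical path
        self.reads_left = {}  # tag -> reads remaining until the predicted last use

    def write(self, tag, value, predicted_reads: int):
        self.prf[tag] = value
        self.reads_left[tag] = predicted_reads
        if predicted_reads == 0:  # predicted to be dead on arrival
            self._release(tag)

    def _release(self, tag):
        # Free the PRF entry early, but keep the value in case the prediction is wrong.
        self.xrf[tag] = self.prf.pop(tag)

    def read(self, tag):
        """Return (value, source), where source is 'PRF' or 'XRF'."""
        if tag in self.prf:
            value = self.prf[tag]
            self.reads_left[tag] -= 1
            if self.reads_left[tag] == 0:  # the predicted last reader has read it
                self._release(tag)
            return value, "PRF"
        return self.xrf[tag], "XRF"  # read after early release: slower path


rf = EarlyReleaseRegFile()
rf.write("p7", 42, predicted_reads=1)
print(rf.read("p7"), len(rf.prf))  # served by the PRF, entry then released
print(rf.read("p7"))               # unexpected extra read served by the XRF
```

The benefit comes from PRF entries being freed at the predicted last use instead of when a later redefinition commits; a misprediction costs only a slower read, not incorrect execution.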

Do the demands of new software outpace developments in hardware? Experiments with the behavior of SPEC CPU on on-chip caches, and data collected from a wide range of processors over time, address this question and illuminate trends in hardware evolution.

10.1109/mm.2006.80 article EN IEEE Micro 2006-07-01

This paper makes the case for a single-ISA heterogeneous computing platform, AISC, where each compute engine (be it a core or an accelerator) supports a different subset of the very same ISA. An ISA subset may not be functionally complete, but the union of the (per-engine) subsets renders a functionally complete, platform-wide single ISA. Tailoring the microarchitecture of each engine to its subset can easily reduce hardware complexity. At the same time, energy efficiency can improve by exploiting algorithmic noise tolerance: mapping code sequences that tolerate (any potential inaccuracy...

10.48550/arxiv.1803.06955 preprint EN other-oa arXiv (Cornell University) 2018-01-01
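The completeness condition can be stated as a set check: each engine implements a subset of the ISA, and the platform is functionally complete when the union of those subsets equals the full ISA. The sketch below encodes that check plus a naive query for which engines can run a given code sequence. The instruction names, engine names and subsets are hypothetical, and the query is an assumed illustration rather than the mapping policy described in the paper.

```python
FULL_ISA = {"add", "sub", "mul", "div", "load", "store", "fmadd"}  # hypothetical ISA

# Hypothetical per-engine subsets: neither is functionally complete on its own.
ENGINES = {
    "core0": {"add", "sub", "div", "load", "store"},
    "acc0": {"add", "mul", "fmadd", "load", "store"},
}


def platform_complete(engines, full_isa) -> bool:
    """True when the union of all per-engine subsets covers the full ISA."""
    return set().union(*engines.values()) == set(full_isa)


def capable_engines(code_sequence, engines):
    """Engines whose subset contains every instruction of the sequence."""
    needed = set(code_sequence)
    return [name for name, subset in engines.items() if needed <= subset]


print(platform_complete(ENGINES, FULL_ISA))
print(capable_engines(["mul", "fmadd", "load"], ENGINES))
print(capable_engines(["div", "mul"], ENGINES))
```

In this toy model a sequence like `["div", "mul"]` fits no single engine even though the platform as a whole is complete, so such code would have to be split across engines or migrated between them.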

For students of any Computer Engineering program, attaining an integrated vision of the different abstraction levels is paramount to fully understand and exploit a computer system, especially when tough topics such as parallelism, concurrency, consistency, or atomicity are involved at the hardware-software frontiers. However, the structure of typical computer engineering programs leads to the creation of self-contained courses, where a single abstraction level is studied and the overall picture is lost.

10.1145/3338698.3338886 article EN 2019-06-22