- Interconnection Networks and Systems
- Parallel Computing and Optimization Techniques
- Advanced Data Storage Technologies
- Supercapacitor Materials and Fabrication
- Cloud Computing and Resource Management
- Embedded Systems Design Techniques
- Distributed systems and fault tolerance
- Distributed and Parallel Computing Systems
- Low-power high-performance VLSI design
- Advanced Optical Network Technologies
- Advanced Memory and Neural Computing
- Graphene research and applications
- Advanced Malware Detection Techniques
- Software System Performance and Reliability
- Advancements in Battery Materials
- Optical Network Technologies
- Photonic and Optical Devices
- Software-Defined Networks and 5G
- Network Traffic and Congestion Control
- VLSI and FPGA Design Techniques
- E-Learning and Knowledge Management
- Software Testing and Debugging Techniques
- Green IT and Sustainability
- Software Engineering Research
- IoT and Edge/Fog Computing
Universitat Politècnica de València
2015-2024
Universitat Politècnica de Catalunya
2023-2024
Laboratoires Spiral (France)
2015-2017
Saarland University
2017
Université de Lille
2015-2017
University of Castilla-La Mancha
2015
Laboratoire d'Informatique Fondamentale de Lille
2015
National University of San Juan
2009-2010
Universitat de València
2010
Clusters of PCs have become very popular to build high performance computers. These machines use commodity PCs linked by a high-speed interconnect. Routing is one of the most important design issues of interconnection networks. Adaptive routing usually better balances network traffic, thus allowing the network to obtain a higher throughput. However, adaptive routing introduces out-of-order packet delivery, which is unacceptable for some applications. Concerning topology, commercially available interconnects are based on fat-tree....
To meet the demand for more powerful high-performance shared-memory servers, multiprocessor systems must incorporate efficient and scalable cache coherence protocols, such as those based on directory caches. However, the limited size of directory caches in increasingly larger systems may cause frequent evictions of directory entries and, consequently, invalidations of cached blocks, which severely degrades system performance.
Massively parallel computing systems are being built with thousands of nodes. The interconnection network plays a key role for the performance of such systems. However, the high number of components significantly increases the probability of failure. Additionally, failures in the interconnection network may isolate a large fraction of the machine. It is therefore critical to provide an efficient fault-tolerant mechanism to keep the system running, even in the presence of faults. This paper presents a new fault-tolerant routing methodology that does not degrade performance in the absence of faults...
While the number of mobile apps published by app stores keeps on increasing, the quality of these apps varies widely. Unfortunately, for many apps, end-users continue experiencing bugs and crashes once the apps are installed on their device. Crashes are annoying for end-users, but they definitely are for app developers, who need to reproduce the crashes as fast as possible before finding the root cause of the reported issues. Given the heterogeneity in hardware, platform releases, and types of users, the reproduction step currently is one of the major challenges for developers. This...
Achieving system fairness is a major design concern in current multicore processors. Unfairness arises due to contention in the shared resources of the system, such as the LLC and main memory. To address this problem, many research works have proposed novel cache partitioning policies aimed at addressing system fairness without harming performance. Unfortunately, existing proposals targeting fairness require extra hardware, which makes them impractical in commercial processors. Recent Intel Xeon processors feature Cache Allocation...
The popularity of smartphones is leading to an ever growing number of mobile apps that are published in official app stores. However, users might experience bugs and crashes for some of these apps. In this paper, we perform an empirical study of the Google Play Store to automatically mine such error-suspicious apps. We use the knowledge inferred from this analysis to build a recommender system of buggy app checkers. More specifically, we analyze the permissions and user reviews of 46,644 apps to identify potential correlations between error-sensitive...
This paper proposes a built-in self-test/self-diagnosis procedure at start-up of an on-chip network (NoC). Concurrent BIST operations are carried out after reset at each switch, thus resulting in a scalable test application time with network size. The key principle consists of exploiting the inherent structural redundancy of the NoC architecture in a cooperative way, thus detecting faults in the test pattern generators too. At-speed testing of stuck-at faults can be performed in less than 1200 cycles regardless of their size, at a hardware overhead of 11%.
Virtual channels are an appealing flow control technique for on-chip interconnection networks (NoCs), in that they can potentially avoid deadlock and improve link utilization and network throughput. However, their use in the resource constrained multi-processor system-on-chip (MPSoC) domain is still controversial, due to their significant overhead in terms of area, power and cycle time degradation. This paper proposes a simple yet efficient approach to VC implementation, which results in more area- and power-saving...
In this paper we present a methodology to design fault-tolerant routing algorithms for regular direct interconnection networks. It supports fully adaptive routing, does not degrade performance in the absence of faults, and supports a reasonably large number of faults without significantly degrading performance. The methodology is mainly based on the selection of an intermediate node (if needed) for each source-destination pair. Packets are adaptively routed to the intermediate node and, at this node, without being ejected, they are adaptively forwarded to their destinations. In order...
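The intermediate-node idea described above can be illustrated with a small sketch. This is not the selection rule from the paper: it assumes a 2D mesh without wraparound links, minimal fully adaptive routing on each leg, and a deliberately conservative condition (the bounding box of each leg must contain no faulty node, so every minimal path on that leg is safe). The names `box_is_fault_free` and `pick_intermediate` are hypothetical.

```python
from itertools import product


def manhattan(a, b):
    """Hop count of a minimal path between two mesh nodes."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def box_is_fault_free(a, b, faults):
    """True if no faulty node lies in the rectangle spanned by a and b.

    Assumption: with minimal adaptive routing a packet may visit any node
    in this rectangle, so the whole rectangle must be fault-free.
    """
    lo_x, hi_x = sorted((a[0], b[0]))
    lo_y, hi_y = sorted((a[1], b[1]))
    return not any(lo_x <= fx <= hi_x and lo_y <= fy <= hi_y
                   for fx, fy in faults)


def pick_intermediate(src, dst, faults, width, height):
    """Return None if no intermediate node is needed, else the node that
    gives the shortest two-leg route with both legs fault-free."""
    if box_is_fault_free(src, dst, faults):
        return None  # fault-free case: route directly, no extra cost
    best = None
    for node in product(range(width), range(height)):
        if node in faults or node in (src, dst):
            continue
        if (box_is_fault_free(src, node, faults)
                and box_is_fault_free(node, dst, faults)):
            hops = manhattan(src, node) + manhattan(node, dst)
            if best is None or hops < best[0]:
                best = (hops, node)
    if best is None:
        raise LookupError("no single intermediate node avoids the faults")
    return best[1]
```

For example, `pick_intermediate((0, 0), (2, 2), {(1, 1)}, 4, 4)` returns `(0, 2)`, which keeps the route minimal. Under this simplified condition some fault patterns (such as a fault on the straight line between two nodes in the same row) have no valid intermediate node, which is why the sketch raises `LookupError` rather than guessing.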
Most past evaluations of fat-trees for on-chip interconnection networks rely on oversimplifying or even unrealistic architecture and traffic pattern assumptions, and very few layout analyses are available to relieve practical feasibility concerns in nanoscale technologies. This work aims at providing an in-depth assessment of the physical synthesis efficiency of fat-trees and at extrapolating silicon-aware performance figures to back-annotate the system-level performance analysis. A 2D mesh is used as a reference architecture for comparison, and a 65 nm...
A key aspect in the design of efficient multiprocessor systems is the cache coherence protocol. Although directory-based protocols constitute the most scalable approach, the limited size of the directory caches together with the growing size of systems may cause frequent evictions and, consequently, the invalidation of cached blocks, which jeopardizes system performance. Directory caches keep track of every memory block stored in processor caches in order to provide coherent access to the shared memory. However, a significant fraction of the cached memory blocks do not require...
The increasing popularity of cloud computing has forced cloud providers to build economies of scale to meet the growing demand. Nowadays, data-centers include thousands of physical machines, each hosting many virtual machines (VMs), which share the main system resources, causing interference that can significantly impact performance. Frequently, these virtual machines run latency-critical workloads, whose performance is determined by tail latency, which is very sensitive to the co-running workloads. To prevent QoS violations, cloud providers adopt...
The fat-tree is one of the most widely-used topologies by interconnection network manufacturers. Recently, a deterministic routing algorithm that optimally balances the network traffic in fat-trees was proposed. It can not only achieve almost the same performance as adaptive routing, but also outperforms it for some traffic patterns. Nevertheless, fat-trees require a high number of switches with a non-negligible wiring complexity. In this paper, we propose replacing the fat-tree by a unidirectional multistage interconnection network referred to as Reduced...
This paper demonstrates that disk-level I/O requests are self-similar in nature. We show evidence (both visual and mathematical) that disk accesses are consistent with self-similarity. For this analysis, we have used two sets of disk activity traces collected from various systems over different periods of time. In addition to studying the aggregated workload that is directed to the storage system, we perform a structural modeling of the workload in order to understand the underlying causes that produce the observed self-similarity. This modeling shows that the self-similar behavior can be explained by combining...
Networks-on-chip (NoCs) address the challenge to provide scalable communication bandwidth to tiled architectures in a power-efficient fashion. The 2-D mesh is currently the most popular regular topology used for on-chip networks in tile-based architectures, because it perfectly matches the 2-D silicon surface and is easy to implement. However, a number of limitations have been proved in the open literature, especially for long distance traffic. Two relevant variants of 2-D meshes are explored in this paper: high-dimensional and concentrated...
Most of the data referenced by sequential and parallel applications running in current chip multiprocessors are referenced by only one thread and can be considered as private data. A lot of recent proposals leverage this observation to improve many aspects of chip multiprocessors, such as reducing the coherence overhead or the access latency to distributed caches. The effectiveness of those proposals depends to a large extent on the amount of detected private data. However, the mechanisms proposed so far do not consider thread migration or the private use of data within different application phases. As...
Given the increasing competition in mobile-app ecosystems, improving the user experience has become a major goal for app vendors. App Store 2.0 will exploit crowdsourced information about apps, devices, and users to increase the overall quality of the delivered mobile apps. App Store 2.0 generates different kinds of actionable feedback from the crowd information. This feedback helps developers deal with potential errors that could affect their apps before publication or even when the apps are in the users' hands. The App Store 2.0 vision has been transformed into...
Most of the data referenced by sequential and parallel applications running in current chip multiprocessors are referenced by a single thread, i.e., private. Recent proposals leverage this observation to improve many aspects of chip multiprocessors, such as reducing the coherence overhead or the access latency to distributed caches. The effectiveness of those proposals depends to a large extent on the amount of detected private data. However, the mechanisms proposed so far consider neither thread migration nor the private use of data within different application...
The reputation of a mobile app vendor is crucial to survive amongst the ever increasing competition. However, this reputation largely depends on the quality of the apps, both functional and non-functional. One major non-functional requirement of mobile apps is to guarantee smooth UI interactions, since choppy scrolling or navigation, caused by performance problems on a mobile device's limited hardware resources, is highly annoying for end-users. The main research challenge of automatically identifying UI performance problems on mobile devices is that the performance of an app varies depending on its...
Recently, the notion of self-similarity has been applied to wide-area and local-area network traffic. This paper demonstrates that disk-level I/O requests are self-similar in nature. We show evidence, both visual and mathematical, that I/O accesses are consistent with self-similarity. Moreover, we show that this property is mainly due to writes. For our experiments, we use two sets of traces that collect the disk activity from two systems over a period of two months. Such behavior has serious implications for the performance evaluation of storage subsystem...
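One common way to give mathematical evidence of self-similarity is the variance-time (aggregated variance) estimate of the Hurst parameter H. The abstract does not say which estimator the authors used, so the following is only a minimal sketch under that assumption; the function name `aggregated_variance_hurst` and the choice of block sizes are hypothetical. For a self-similar series, the variance of block means decays as m^(2H-2) with block size m, so H is recovered from the slope of a log-log fit.

```python
import math
import random


def aggregated_variance_hurst(series, block_sizes):
    """Estimate the Hurst parameter with the variance-time method.

    For each block size m, the series is split into non-overlapping blocks,
    each block is replaced by its mean, and the sample variance of those
    means is recorded. The least-squares slope of log(variance) against
    log(m) is 2H - 2, hence H = 1 + slope / 2.
    """
    points = []
    for m in block_sizes:
        n_blocks = len(series) // m
        if n_blocks < 2:
            continue  # not enough blocks to compute a variance
        means = [sum(series[i * m:(i + 1) * m]) / m for i in range(n_blocks)]
        mu = sum(means) / n_blocks
        var = sum((x - mu) ** 2 for x in means) / (n_blocks - 1)
        if var > 0:
            points.append((math.log(m), math.log(var)))
    if len(points) < 2:
        raise ValueError("need at least two usable block sizes")
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return 1 + (sxy / sxx) / 2


if __name__ == "__main__":
    # Synthetic uncorrelated noise stands in for a request-count trace;
    # it is not self-similar, so the estimate should be close to 0.5.
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(20000)]
    print(aggregated_variance_hurst(noise, [1, 2, 4, 8, 16, 32, 64]))
```

An estimate clearly above 0.5 (and below 1) on a real trace, for example a series of I/O requests counted per time interval, would be consistent with the long-range dependence that self-similar traffic exhibits. The visual counterpart is to plot the same log-log points and inspect the slope.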
Massively parallel computing systems have been or are being built with thousands of nodes. In such systems, high-performance interconnection networks are crucial to achieve the maximum performance. Routing is one of the most important design issues of interconnection networks. Routing strategies can be mainly classified as source and distributed routing. Source routing has been used in some networks because routers are very simple. On the other hand, distributed routing allows more flexibility, but the routers are more complex. Distributed routing can be implemented by a fixed hardware specific to a routing function...