Daniele De Sensi

ORCID: 0000-0002-7244-639X
Research Areas
  • Parallel Computing and Optimization Techniques
  • Interconnection Networks and Systems
  • Distributed and Parallel Computing Systems
  • Cloud Computing and Resource Management
  • Advanced Data Storage Technologies
  • Software-Defined Networks and 5G
  • Embedded Systems Design Techniques
  • Distributed systems and fault tolerance
  • Complex Network Analysis Techniques
  • Real-Time Systems Scheduling
  • Caching and Content Delivery
  • Advanced Memory and Neural Computing
  • Petri Nets in System Modeling
  • Scientific Computing and Data Management
  • 3D IC and TSV technologies
  • Software System Performance and Reliability
  • Graph Theory and Algorithms
  • Data Visualization and Analytics
  • Network Security and Intrusion Detection
  • Thin-Film Transistor Technologies
  • Security and Verification in Computing
  • Advanced Graph Neural Networks
  • Low-power high-performance VLSI design
  • Advanced Database Systems and Queries
  • Data Stream Mining Techniques

Sapienza University of Rome
2023-2024

ETH Zurich
2019-2024

Zürcher Fachhochschule
2022

University of Pisa
2012-2021

Virgo
2018

Laboratoire d'Informatique de Paris-Nord
2016

The interconnect is one of the most critical components in large-scale computing systems, and its impact on the performance of applications is going to increase with the system size. In this paper, we will describe SLINGSHOT, an interconnection network for large-scale computing systems. SLINGSHOT is based on high-radix switches, which allow building exascale and hyper-scale datacenter networks with at most three switch-to-switch hops. Moreover, SLINGSHOT provides efficient adaptive routing and congestion control algorithms, and highly tunable traffic classes....

10.1109/sc41405.2020.00039 preprint EN 2020-11-01

In current computing systems, many applications require guarantees that their maximum power consumption will not exceed the available power budget. On the other hand, for some applications it could be possible to decrease performance, while still maintaining an acceptable level, in order to reduce power consumption. To provide such guarantees, a possible solution consists in changing the number of cores assigned to the application, their clock frequency, and the placement of application threads over the cores. However, performance and power consumption have different trends depending...

10.1145/3004054 article EN ACM Transactions on Architecture and Code Optimization 2016-12-02
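The power-capping abstract above describes choosing the number of cores and the clock frequency so that power stays within a budget while performance remains acceptable. As a minimal sketch, assuming simple analytical models that are not the ones used in this paper (an Amdahl's-law speedup scaled linearly with frequency, and power modelled as a static term plus a per-core term growing with the cube of the frequency), the selection can be written as an exhaustive search over configurations. The functions `predicted_throughput`, `predicted_power` and `best_config`, and every coefficient in them, are hypothetical; thread placement is not modelled.

```python
from itertools import product
from typing import Optional, Sequence, Tuple


def predicted_throughput(cores: int, freq_ghz: float, serial_fraction: float = 0.1) -> float:
    # Hypothetical model: Amdahl's-law speedup, scaled linearly with clock frequency.
    speedup = 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)
    return speedup * freq_ghz


def predicted_power(cores: int, freq_ghz: float,
                    static_w: float = 20.0, per_core_coeff: float = 1.5) -> float:
    # Hypothetical model: static power plus dynamic power per active core,
    # growing roughly with f^3 when voltage is scaled together with frequency.
    return static_w + cores * per_core_coeff * freq_ghz ** 3


def best_config(budget_w: float, core_counts: Sequence[int],
                freqs_ghz: Sequence[float]) -> Optional[Tuple[int, float, float]]:
    """Return (cores, frequency, predicted throughput) of the fastest configuration
    whose predicted power fits the budget, or None if none fits."""
    best = None
    for cores, freq in product(core_counts, freqs_ghz):
        if predicted_power(cores, freq) <= budget_w:
            perf = predicted_throughput(cores, freq)
            if best is None or perf > best[2]:
                best = (cores, freq, perf)
    return best
```

Because evaluating a model is cheap, enumerating every (cores, frequency) pair is affordable here, whereas measuring each pair on real hardware would not be. In practice the model coefficients would have to be fitted to measurements of the target application.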

This paper studies k-plexes, a well-known pseudo-clique model for network communities. In a k-plex, each node can miss at most k-1 links. Our goal is to detect large communities in today's real-world graphs, which can have hundreds of millions of edges. While many have tried, this task has been elusive so far due to its computationally challenging nature: k-plexes and other pseudo-cliques are harder to find and more numerous than cliques, which are already a hard problem. We present D2K, the first algorithm able to find large k-plexes of very large graphs in just a few minutes....

10.1145/3219819.3220093 article EN 2018-07-19
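The D2K abstract above defines a k-plex as a node set in which each node misses at most k-1 links. Equivalently, every member must be adjacent to at least |S| - k other members of S, so a 1-plex is exactly a clique. The following is a minimal membership check under that definition, assuming the graph is given as a hypothetical adjacency dictionary; it only verifies a candidate set and is not the D2K search algorithm.

```python
from typing import Dict, Set


def is_k_plex(adj: Dict[int, Set[int]], nodes: Set[int], k: int) -> bool:
    """Return True if `nodes` induces a k-plex in the graph `adj`.

    Each member may miss at most k - 1 links to the other members, i.e. it must
    have at least len(nodes) - k neighbours inside the set (itself excluded).
    """
    size = len(nodes)
    for v in nodes:
        neighbours_inside = len(adj.get(v, set()) & (nodes - {v}))
        if neighbours_inside < size - k:
            return False
    return True
```

For example, a 4-cycle is a 2-plex (each node misses exactly one link) but not a 1-plex, because it is not a clique.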

System noise can negatively impact the performance of HPC systems, and the interconnection network is one of the main factors contributing to this problem. To mitigate this effect, adaptive routing sends packets on non-minimal paths if they are less congested. However, while this may mitigate the interference caused by congestion, it also generates more traffic, since packets traverse additional hops, in turn causing congestion for other applications and for the application itself. In this paper, we first describe how to estimate network noise. By following these...

10.1145/3295500.3356196 preprint EN 2019-11-07
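The trade-off in the network-noise abstract above can be illustrated with a common UGAL-style heuristic. This heuristic is a stand-in and is not the application-aware routing proposed in the paper. It estimates the delay of a path as queue occupancy times hop count and takes the longer non-minimal path only when that estimate is lower. The `PathEstimate` type, the use of first-hop queue depth as the congestion signal, and the `bias` parameter are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class PathEstimate:
    hops: int         # switch-to-switch hops along the candidate path
    queue_depth: int  # occupancy of the first output queue (assumed congestion signal)


def choose_path(minimal: PathEstimate, non_minimal: PathEstimate, bias: int = 0) -> str:
    """UGAL-style decision: estimated delay ~ queue_depth * hops.

    A positive `bias` makes non-minimal routing less attractive, limiting the
    extra traffic that non-minimal packets inject into the network.
    """
    cost_minimal = minimal.queue_depth * minimal.hops
    cost_non_minimal = non_minimal.queue_depth * non_minimal.hops + bias
    return "non-minimal" if cost_non_minimal < cost_minimal else "minimal"
```

Every packet sent non-minimally in this sketch occupies links on more hops (five instead of three in the test values), which is the extra load that the abstract notes can create congestion for other applications.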

Current architectures provide many control knobs for reducing the power consumption of applications, such as reducing the number of used cores or scaling down their frequency. However, choosing the right values for these knobs in order to satisfy requirements on performance and/or power consumption is a complex task, and trying all possible combinations is an unfeasible solution, since it would require too much time. For this reason, there is a need for techniques that allow an accurate estimation of the performance and power consumption of an application when a specific configuration is used....

10.1109/pdp.2016.41 article EN 2016-02-01

The allreduce operation is one of the most commonly used communication routines in distributed applications. To improve its bandwidth and to reduce network traffic, this operation can be accelerated by offloading it to network switches, which aggregate the data received from the hosts and send the aggregated result back to them. However, existing solutions provide limited customization opportunities and might deliver suboptimal performance when dealing with custom operators and data types, with sparse data, or when reproducibility of the aggregation is a concern. To deal...

10.1145/3458817.3476178 preprint EN 2021-10-21
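The allreduce abstract above describes switches that aggregate the data received from the hosts and send the aggregated result back. The toy model below, which is not the paper's switch design, sketches a single aggregation buffer that accepts one contribution per host with an arbitrary binary operator. When all contributions have arrived it reduces them in a fixed host-id order, so that a floating-point result does not depend on packet arrival order (one way to address the reproducibility concern). The class name, its fields, and the overwrite-on-retransmission policy are assumptions.

```python
import operator
from typing import Callable, Dict, List, Optional, Sequence


class AggregationSlot:
    """Toy model of one in-switch aggregation buffer for one block of an allreduce."""

    def __init__(self, num_hosts: int, op: Callable[[float, float], float]) -> None:
        self.num_hosts = num_hosts
        self.op = op  # custom reduction operator, e.g. operator.add or max
        self.contributions: Dict[int, List[float]] = {}

    def receive(self, host: int, data: Sequence[float]) -> Optional[List[float]]:
        # A retransmitted packet overwrites the earlier copy instead of being counted twice.
        self.contributions[host] = list(data)
        if len(self.contributions) < self.num_hosts:
            return None  # still waiting for the other hosts
        # Reduce in host-id order: the result is independent of arrival order.
        ordered = [self.contributions[h] for h in sorted(self.contributions)]
        result = list(ordered[0])
        for vec in ordered[1:]:
            result = [self.op(a, b) for a, b in zip(result, vec)]
        return result  # a real switch would multicast this back to all hosts


slot = AggregationSlot(num_hosts=3, op=operator.add)
slot.receive(2, [1.0, 2.0])
slot.receive(0, [3.0, 4.0])
print(slot.receive(1, [5.0, 6.0]))  # [9.0, 12.0]
```

Fixing the reduction order costs buffering, because nothing can be combined until all contributions are present. Summing in arrival order would need less memory but could give different floating-point results from run to run.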

High-level parallel programming is an active research topic aimed at promoting parallel programming methodologies that provide the programmer with high-level abstractions to develop complex parallel software with reduced time to solution. Pattern-based parallel programming is based on a set of composable and customizable parallel patterns used as basic building blocks in parallel applications. In recent years, considerable effort has been made to empower this programming model with features able to overcome the shortcomings of early approaches concerning flexibility and performance. In this article, we...

10.1145/3132710 article EN ACM Transactions on Architecture and Code Optimization 2017-10-24

Managing low-level architectural features for controlling performance and power consumption is a growing demand in the parallel computing community. Such features include, but are not limited to: energy profiling, platform topology analysis, CPU core disabling, and frequency scaling. However, these low-level mechanisms are usually managed by specific tools, without any interaction between each other, thus hampering their usability. More importantly, most existing tools can only be used through a command-line interface...

10.1016/j.softx.2017.06.005 article EN cc-by SoftwareX 2017-01-01

Today's stream processing systems handle high-volume data streams in an efficient manner. To achieve this goal, they are designed to scale out on large clusters of commodity machines. However, despite the use of distributed architectures, they lack support for co-processors like graphics processing units (GPUs) that are ready to accelerate data-parallel tasks. The main reason for this lack of integration is that GPU processing and the streaming paradigm have different processing models, with GPUs needing a bulk of data present at once, while the streaming paradigm advocates a tuple-at-a-time...

10.1109/access.2019.2910312 article EN cc-by-nc-nd IEEE Access 2019-01-01
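The GPU stream-processing abstract above contrasts the bulk data that GPUs need with the tuple-at-a-time streaming model. A common way to bridge the two is micro-batching: buffer incoming tuples into fixed-size batches and hand each batch to a data-parallel kernel. The generator below is a minimal sketch of that idea and is not the paper's integration. `micro_batches` and `batch_size` are hypothetical names, and the per-batch "kernel" in the usage line is a plain Python stand-in for GPU work.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def micro_batches(stream: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group a tuple-at-a-time stream into lists of `batch_size` items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batch: List[T] = []
    for item in stream:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch  # a full batch is ready for a data-parallel kernel
            batch = []
    if batch:
        yield batch  # flush the partial tail when the stream ends


# Stand-in "kernel": one aggregate per batch.
print([sum(b) for b in micro_batches(range(7), 3)])  # [3, 12, 6]
```

Larger batches amortize kernel-launch and transfer overheads, but each tuple waits longer before it is processed. Real systems usually also flush on a timeout to bound that latency, which this sketch omits.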

This work proposes a methodology to find performance and energy trade-offs for parallel applications running on Heterogeneous Multi-Processing systems with a single instruction-set architecture. These systems offer flexibility in the form of different core types and voltage and frequency pairings, defining a vast design space to explore. Therefore, for a given application, choosing the configuration that optimizes performance and energy consumption is not straightforward. Our method proposes novel analytical models for performance and power consumption whose parameters can be fitted...

10.3390/en13092409 article EN cc-by Energies 2020-05-11

Cloud computing represents an appealing opportunity for cost-effective deployment of HPC workloads on the best-fitting hardware. However, although cloud and on-premise HPC systems offer similar computational resources, their network architecture and performance may differ significantly. For example, these systems use fundamentally different network transport and routing protocols, which may introduce network noise that can eventually limit application scaling. This work analyzes the network performance, scalability, and cost of running HPC workloads on cloud systems....

10.1145/3570609 article EN Proceedings of the ACM on Measurement and Analysis of Computing Systems 2022-12-01

The notion of k-truss was introduced a decade ago in social network analysis and security for community detection, as a form of cohesive subgraph less stringent than a clique (a set of pairwise linked nodes) and more selective than a k-core (an induced subgraph with minimum degree k). A k-truss is an inclusion-maximal subgraph H in which each edge belongs to at least k - 2 triangles inside H. The truss decomposition establishes, for each edge e, the maximum k such that e belongs to a k-truss. Analogously to the largest k-core, the strongest truss (max-truss) corresponds to the k-truss having the maximum k. Even...

10.1109/access.2020.3011667 article EN cc-by IEEE Access 2020-01-01
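The k-truss definition above (each edge in at least k - 2 triangles) leads directly to the classic sequential peeling approach for truss decomposition. Compute each edge's support (the number of triangles containing it). Then, for k = 2, 3, ..., repeatedly remove edges whose support is at most k - 2, assign them trussness k, and decrement the support of the other two edges of every triangle they closed. This is a textbook in-memory sketch for small graphs, not the algorithm of this paper. It shows why triangle (wedge) handling dominates the cost on massive graphs.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[int, int]


def truss_decomposition(edges: Iterable[Edge]) -> Dict[Edge, int]:
    """Return the trussness of every edge, keyed by (min, max) endpoint pairs."""
    adj: Dict[int, Set[int]] = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    def key(a: int, b: int) -> Edge:
        return (a, b) if a < b else (b, a)

    # support[e] = number of triangles containing e in the current graph
    support = {(u, v): len(adj[u] & adj[v]) for u in adj for v in adj[u] if u < v}
    remaining = set(support)
    trussness: Dict[Edge, int] = {}
    k = 2
    while remaining:
        # Edges in fewer than k - 1 triangles cannot be in the (k+1)-truss: trussness k.
        stack = [e for e in remaining if support[e] <= k - 2]
        while stack:
            e = stack.pop()
            if e not in remaining:
                continue  # already peeled (an edge can be pushed more than once)
            remaining.discard(e)
            trussness[e] = k
            u, v = e
            # Removing e destroys every triangle (u, v, w) still present.
            for w in adj[u] & adj[v]:
                for f in (key(u, w), key(v, w)):
                    support[f] -= 1
                    if support[f] <= k - 2:
                        stack.append(f)
            adj[u].discard(v)
            adj[v].discard(u)
        k += 1
    return trussness
```

The trussness of the whole graph is then `max(truss_decomposition(edges).values())`, the maximum over its edges.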

Determining the right amount of resources needed for a given computation is a critical problem. In many cases, computing systems are configured to use an amount of resources able to manage high load peaks, even though this may cause energy waste when the resources are not fully utilised. To avoid this problem, adaptive approaches are used to dynamically increase or decrease computational resources depending on the real needs. A different approach based on Dynamic Voltage and Frequency Scaling (DVFS) is emerging as a possible alternative solution to reduce the energy consumption of idle CPUs by...

10.1109/pdp.2015.92 article EN 2015-03-01

High-level parallel programming is a de-facto standard approach to develop parallel software with reduced development time. High-level abstractions are provided by existing frameworks as pragma-based annotations in the source code, or through pre-built parallel patterns that recur frequently in parallel algorithms and can be easily instantiated by the programmer to add structure to the development of parallel software. In this paper we focus on the second approach and propose P3ARSEC, a benchmark suite for parallel pattern-based frameworks consisting of a representative subset of PARSEC applications....

10.1145/3019612.3019745 article EN 2017-04-03

This paper presents a security analysis of the InfiniBand architecture, a prevalent RDMA standard, and of NVMe-over-Fabrics (NVMe-oF), a prominent protocol for industrial disaggregated storage that exploits RDMA protocols to achieve low-latency and high-bandwidth access to remote solid-state devices. Our work, NeVerMore, discovers new vulnerabilities in RDMA protocols and unveils several attack vectors on RDMA-enabled applications and on the NVMe-oF protocol, showing that current security mechanisms do not address the vulnerabilities posed by the use of RDMA. In particular, we...

10.1145/3548606.3560568 article EN Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security 2022-11-07

Numerous microarchitectural optimizations unlocked tremendous processing power for deep neural networks that in turn fueled the AI revolution. With the exhaustion of such optimizations, the growth of modern AI is now gated by the performance of training systems, especially their data movement. Instead of focusing on single accelerators, we investigate the data-movement characteristics of large-scale training at full system scale. Based on our workload analysis, we design HammingMesh, a novel network topology that provides high bandwidth at low...

10.1109/sc41404.2022.00016 article EN 2022-11-01

The dataflow programming model has been extensively used as an effective solution to implement efficient parallel programming frameworks. However, the amount of resources allocated to the runtime support is usually fixed once by the programmer or by the runtime, and kept static during the entire execution. While there are cases where such a choice may be appropriate, other scenarios may require dynamically changing the parallelism degree during the application's execution. In this paper we propose an algorithm for multicore shared-memory platforms that selects...

10.1142/s0129626417400047 article EN Parallel Processing Letters 2017-03-01

Continuous streaming computations are usually composed of different modules, exchanging data through shared message queues. The selection of the algorithm used to access such queues (i.e., the concurrency control) is a critical aspect for both performance and power consumption. In this paper, we describe the design of an automatic concurrency control algorithm for implementing power-efficient communications on shared-memory multicores. The algorithm automatically switches between nonblocking and blocking concurrency protocols, getting the best from the two worlds, i.e.,...

10.1002/cpe.4652 article EN Concurrency and Computation Practice and Experience 2018-08-14
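The concurrency-control abstract above contrasts nonblocking and blocking protocols for shared message queues. Busy-polling gives low latency but keeps a core running, while sleeping on a condition variable saves power at the cost of wake-up latency. The queue below is a minimal sketch that combines the two in a fixed, non-adaptive way (spin for a bounded number of attempts, then block). It is not the paper's automatic switching algorithm. The class name and `spin_limit` are hypothetical, and the spin phase relies on `collections.deque` appends and pops being thread-safe.

```python
import collections
import threading
from typing import Any, Deque


class SpinThenBlockQueue:
    """Consumers first busy-poll (nonblocking), then sleep on a condition (blocking)."""

    def __init__(self, spin_limit: int = 1000) -> None:
        self._items: Deque[Any] = collections.deque()
        self._cond = threading.Condition()
        self.spin_limit = spin_limit  # hypothetical tuning knob

    def put(self, item: Any) -> None:
        with self._cond:
            self._items.append(item)
            self._cond.notify()  # wake one blocked consumer, if any

    def get(self) -> Any:
        # Nonblocking phase: low latency, but the core stays busy (power cost).
        for _ in range(self.spin_limit):
            try:
                return self._items.popleft()
            except IndexError:
                pass
        # Blocking phase: sleep until a producer notifies; recheck under the lock.
        with self._cond:
            while not self._items:
                self._cond.wait()
            return self._items.popleft()
```

An adaptive scheme would tune the point where spinning stops, for example by observing arrival rates, rather than keeping `spin_limit` fixed as this sketch does.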

A k-truss is a subgraph where every edge belongs to at least k-2 triangles in the subgraph. The truss decomposition assigns to each edge the maximum k for which the edge belongs to a k-truss, and the trussness of a graph is the maximum among its edges. Algorithms for discovering k-trusses provide useful insight in graph analytics (such as community detection). Even though they take polynomial time, on massive networks they suffer from handling a potentially cubic number of wedges: they either need a long time to recompute triangles several times, have high memory usage, or rely on a large number of cores...

10.1109/hpec.2018.8547735 article EN 2018-09-01