NFDI4DS | UHH-SEMS - Publication Details

Daniele De Sensi

ORCID: 0000-0002-7244-639X


About

Contact & Profiles

A5056277459

Research Areas

Parallel Computing and Optimization Techniques
Interconnection Networks and Systems
Distributed and Parallel Computing Systems
Cloud Computing and Resource Management
Advanced Data Storage Technologies
Software-Defined Networks and 5G
Embedded Systems Design Techniques
Distributed systems and fault tolerance
Complex Network Analysis Techniques
Real-Time Systems Scheduling
Caching and Content Delivery
Advanced Memory and Neural Computing
Petri Nets in System Modeling
Scientific Computing and Data Management
3D IC and TSV technologies
Software System Performance and Reliability
Graph Theory and Algorithms
Data Visualization and Analytics
Network Security and Intrusion Detection
Thin-Film Transistor Technologies
Security and Verification in Computing
Advanced Graph Neural Networks
Low-power high-performance VLSI design
Advanced Database Systems and Queries
Data Stream Mining Techniques

Sapienza University of Rome
2023-2024

ETH Zurich
2019-2024

Zürcher Fachhochschule
2022

University of Pisa
2012-2021

Virgo
2018

Laboratoire d'Informatique de Paris-Nord
2016

An In-Depth Analysis of the Slingshot Interconnect

OPENALEX - Publications

Daniele De Sensi, Salvatore Di Girolamo, Kim H. McMahon, Duncan Roweth, Torsten Hoefler

The interconnect is one of the most critical components in large-scale computing systems, and its impact on the performance of applications is going to increase with system size. In this paper, we describe SLINGSHOT, an interconnection network for large-scale computing systems. SLINGSHOT is based on high-radix switches, which allow building exascale and hyper-scale datacenter networks with at most three switch-to-switch hops. Moreover, SLINGSHOT provides efficient adaptive routing and congestion control algorithms, and highly tunable traffic classes....

10.1109/sc41405.2020.00039 preprint EN 2020-11-01

A Reconfiguration Algorithm for Power-Aware Parallel Applications

Daniele De Sensi, Massimo Torquati, Marco Danelutto

In current computing systems, many applications require guarantees on their maximum power consumption so as not to exceed the available power budget. On the other hand, for some applications, it could be possible to decrease their performance, yet maintain an acceptable level, in order to reduce their power consumption. To provide such guarantees, a possible solution consists in changing the number of cores assigned to the application, their clock frequency, and the placement of application threads over the cores. However, power consumption and performance have different trends depending...

10.1145/3004054 article EN ACM Transactions on Architecture and Code Optimization 2016-12-02

D2K

Alessio Conte, Tiziano De Matteis, Daniele De Sensi, Roberto P Grossi, Andrea Marino, and 1 more

This paper studies k-plexes, a well-known pseudo-clique model for network communities. In a k-plex, each node can miss at most k-1 links. Our goal is to detect large communities in today's real-world graphs, which can have hundreds of millions of edges. While many have tried, this task has been elusive so far due to its computationally challenging nature: k-plexes and other pseudo-cliques are harder to find and more numerous than cliques, a hard problem. We present D2K, the first algorithm able to find large k-plexes of very large graphs in just a few minutes....

10.1145/3219819.3220093 article EN 2018-07-19

Mitigating network noise on Dragonfly networks through application-aware routing

Daniele De Sensi, Salvatore Di Girolamo, Torsten Hoefler

System noise can negatively impact the performance of HPC systems, and the interconnection network is one of the main factors contributing to this problem. To mitigate this effect, adaptive routing sends packets on non-minimal paths if they are less congested. However, while this may mitigate interference caused by congestion, it also generates more traffic since packets traverse additional hops, causing in turn congestion on other applications and on the application itself. In this paper, we first describe how to estimate network noise. By following these...

10.1145/3295500.3356196 preprint EN 2019-11-07

Predicting Performance and Power Consumption of Parallel Applications

Daniele De Sensi

Current architectures provide many control knobs for the reduction of the power consumption of applications, like reducing the number of used cores or scaling down their frequency. However, choosing the right values for these knobs in order to satisfy requirements on performance and/or power consumption is a complex task, and trying all the possible combinations is an unfeasible solution since it would require too much time. For these reasons, there is a need for techniques that allow an accurate estimation of the performance and power consumption of an application when a specific configuration is used....

10.1109/pdp.2016.41 article EN 2016-02-01

Flare

Daniele De Sensi, Salvatore Di Girolamo, Saleh Ashkboos, Shigang Li, Torsten Hoefler

The allreduce operation is one of the most commonly used communication routines in distributed applications. To improve its bandwidth and to reduce network traffic, this operation can be accelerated by offloading it to network switches, which aggregate the data received from the hosts and send them back the aggregated result. However, existing solutions provide limited customization opportunities and might provide suboptimal performance when dealing with custom operators and data types, with sparse data, or when reproducibility of the aggregation is a concern. To deal...

10.1145/3458817.3476178 preprint EN 2021-10-21

Bringing Parallel Patterns Out of the Corner

Daniele De Sensi, Tiziano De Matteis, Massimo Torquati, Gabriele Mencagli, Marco Danelutto

High-level parallel programming is an active research topic aimed at promoting parallel programming methodologies that provide the programmer with high-level abstractions to develop complex parallel software with reduced time to solution. Pattern-based parallel programming is based on a set of composable and customizable parallel patterns used as basic building blocks in parallel applications. In recent years, considerable effort has been made in empowering this programming model with features able to overcome the shortcomings of early approaches concerning flexibility and performance. In this article, we...

10.1145/3132710 article EN ACM Transactions on Architecture and Code Optimization 2017-10-24

Simplifying self-adaptive and power-aware computing with Nornir

Daniele De Sensi, Tiziano De Matteis, Marco Danelutto

10.1016/j.future.2018.05.012 article EN Future Generation Computer Systems 2018-05-15

Mammut: High-level management of system knobs and sensors

Daniele De Sensi, Massimo Torquati, Marco Danelutto

Managing low-level architectural features for controlling performance and power consumption is a growing demand in the parallel computing community. Such features include, but are not limited to: energy profiling, platform topology analysis, CPU core disabling, and frequency scaling. However, these low-level mechanisms are usually managed by specific tools, without any interaction between each other, thus hampering their usability. More importantly, most existing tools can only be used through a command line interface...

10.1016/j.softx.2017.06.005 article EN cc-by SoftwareX 2017-01-01

GASSER: An Auto-Tunable System for General Sliding-Window Streaming Operators on GPUs

Tiziano De Matteis, Gabriele Mencagli, Daniele De Sensi, Massimo Torquati, Marco Danelutto

Today's stream processing systems handle high-volume data streams in an efficient manner. To achieve this goal, they are designed to scale out on large clusters of commodity machines. However, despite the use of distributed architectures, they lack support for co-processors like graphical processing units (GPUs) ready to accelerate data-parallel tasks. The main reason for this lack of integration is that GPU processing and the streaming paradigm have different processing models, with GPUs needing a bulk of data present at once while the streaming paradigm advocates a tuple-at-a-time...

10.1109/access.2019.2910312 article EN cc-by-nc-nd IEEE Access 2019-01-01

Performance and Energy Trade-Offs for Parallel Applications on Heterogeneous Multi-Processing Systems

A. M. Coutinho Demetrios, Daniele De Sensi, Arthur F. Lorenzon, Kyriakos Georgiou, Jose Nunez‐Yanez, and 2 more

This work proposes a methodology to find performance and energy trade-offs for parallel applications running on Heterogeneous Multi-Processing systems with a single instruction-set architecture. These systems offer flexibility in the form of different core types and voltage and frequency pairings, defining a vast design space to explore. Therefore, for a given application, choosing a configuration that optimizes the performance and energy consumption is not straightforward. Our method proposes novel analytical models for performance and power consumption whose parameters can be fitted...

10.3390/en13092409 article EN cc-by Energies 2020-05-11

Noise in the Clouds

Daniele De Sensi, Tiziano De Matteis, Konstantin Taranov, Salvatore Di Girolamo, Tobias Rahn, and 1 more

Cloud computing represents an appealing opportunity for cost-effective deployment of HPC workloads on the best-fitting hardware. However, although cloud and on-premise HPC systems offer similar computational resources, their network architecture and performance may differ significantly. For example, these systems use fundamentally different network transport and routing protocols, which may introduce network noise that can eventually limit application scaling. This work analyzes the network performance, scalability, and cost of running HPC workloads on cloud systems....

10.1145/3570609 article EN Proceedings of the ACM on Measurement and Analysis of Computing Systems 2022-12-01

Simplifying and implementing service level objectives for stream parallelism

Dalvan Griebler, Adriano Vogel, Daniele De Sensi, Marco Danelutto, Luiz Gustavo Fernandes

10.1007/s11227-019-02914-6 article EN The Journal of Supercomputing 2019-06-05

Truly Scalable K-Truss and Max-Truss Algorithms for Community Detection in Graphs

Alessio Conte, Daniele De Sensi, Roberto Grossi, Andrea Marino, Luca Versari

The notion of k-truss was introduced a decade ago in social network analysis and security for community detection, as a form of cohesive subgraph less stringent than a clique (a set of pairwise linked nodes) and more selective than a k-core (an induced subgraph with minimum degree k). A k-truss is an inclusion-maximal subgraph H in which each edge belongs to at least k - 2 triangles inside H. The truss decomposition establishes, for each edge e, the maximum k for which e belongs to a k-truss. Analogously to the largest k-core, the strongest community for k-truss is the max-truss, which corresponds to the k-truss having the maximum k. Even...

10.1109/access.2020.3011667 article EN cc-by IEEE Access 2020-01-01

Exploring GPU-to-GPU Communication: Insights into Supercomputer Interconnects

Daniele De Sensi, Lorenzo Pichetti, Flavio Vella, Tiziano De Matteis, Zebin Ren, and 9 more

10.1109/sc41406.2024.00039 article EN 2024-11-17

Energy Driven Adaptivity in Stream Parallel Computations

Marco Danelutto, Daniele De Sensi, Massimo Torquati

Determining the right amount of resources needed for a given computation is a critical problem. In many cases, computing systems are configured to use an amount of resources to manage high load peaks, even though this causes energy waste when the resources are not fully utilised. To avoid this problem, adaptive approaches are used to dynamically increase/decrease computational resources depending on the real needs. A different approach based on Dynamic Voltage and Frequency Scaling (DVFS) is emerging as a possible alternative solution to reduce the energy consumption of idle CPUs by...

10.1109/pdp.2015.92 article EN 2015-03-01

P3ARSEC

Marco Danelutto, Tiziano De Matteis, Daniele De Sensi, Gabriele Mencagli, Massimo Torquati

High-level parallel programming is a de-facto standard approach to develop parallel software with reduced time to development. High-level abstractions are provided by existing frameworks as pragma-based annotations in the source code, or through pre-built parallel patterns that recur frequently in parallel algorithms and that can be easily instantiated by the programmer to add a structure to the development of parallel software. In this paper we focus on the second approach and we propose P3ARSEC, a benchmark suite for pattern-based parallel programming consisting of a representative subset of PARSEC applications....

10.1145/3019612.3019745 article EN 2017-04-03

NeVerMore

Konstantin Taranov, Benjamin Rothenberger, Daniele De Sensi, Adrian Perrig, Torsten Hoefler

This paper presents a security analysis of the InfiniBand architecture, a prevalent RDMA standard, and NVMe-over-Fabrics (NVMe-oF), a prominent protocol for industrial disaggregated storage that exploits RDMA protocols to achieve low-latency and high-bandwidth access to remote solid-state devices. Our work, NeVerMore, discovers new vulnerabilities in RDMA protocols and unveils several attack vectors on RDMA-enabled applications and the NVMe-oF protocol, showing that current security mechanisms do not address the security issues posed by the use of RDMA. In particular, we...

10.1145/3548606.3560568 article EN Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security 2022-11-07

HammingMesh: A Network Topology for Large-Scale Deep Learning

Torsten Hoefler, Tommaso Bonato, Daniele De Sensi, Salvatore Di Girolamo, Shigang Li, and 5 more

Numerous microarchitectural optimizations unlocked tremendous processing power for deep neural networks that in turn fueled the AI revolution. With the exhaustion of such optimizations, the growth of modern AI is now gated by the performance of training systems, especially their data movement. Instead of focusing on single accelerators, we investigate data-movement characteristics of large-scale training at full system scale. Based on our workload analysis, we design HammingMesh, a novel network topology that provides high bandwidth at low...

10.1109/sc41404.2022.00016 article EN 2022-11-01

A Power-Aware, Self-Adaptive Macro Data Flow Framework

Marco Danelutto, Daniele De Sensi, Massimo Torquati

The dataflow programming model has been extensively used as an effective solution to implement efficient parallel frameworks. However, the amount of resources allocated to the runtime support is usually fixed once by the programmer or the runtime, and kept static during the entire execution. While there are cases where such a static choice may be appropriate, other scenarios may require dynamically changing the parallelism degree during the application execution. In this paper we propose an algorithm for multicore shared memory platforms that selects...

10.1142/s0129626417400047 article EN Parallel Processing Letters 2017-03-01

Power‐aware pipelining with automatic concurrency control

Massimo Torquati, Daniele De Sensi, Gabriele Mencagli, Marco Aldinucci, Marco Danelutto

Continuous streaming computations are usually composed of different modules, exchanging data through shared message queues. The selection of the algorithm used to access such queues (i.e., the concurrency control) is a critical aspect both for performance and power consumption. In this paper, we describe the design of automatic concurrency control for implementing power‐efficient communications on shared‐memory multicores. The algorithm automatically switches between nonblocking and blocking concurrency protocols, getting the best from the two worlds, i.e.,...

10.1002/cpe.4652 article EN Concurrency and Computation Practice and Experience 2018-08-14

Discovering k-Trusses in Large-Scale Networks

Alessio Conte, Daniele De Sensi, Roberto Grossi, Andrea Marino, Luca Versari

A k-truss is a subgraph where every edge belongs to at least k-2 triangles in the subgraph. The truss decomposition assigns to each edge the maximum k for which the edge belongs to a k-truss, and the trussness of a graph is the maximum among its edges. Discovery algorithms for k-trusses provide useful insight for graph analytics (such as community detection). Even though they take polynomial time, on massive networks they suffer from handling a potentially cubic number of wedges: they either need a long time to recompute triangles several times, have high memory usage, or rely on a large number of cores...

10.1109/hpec.2018.8547735 article EN 2018-09-01
