Salvatore Di Girolamo

ORCID: 0000-0003-2197-8860
Research Areas
  • Interconnection Networks and Systems
  • Software-Defined Networks and 5G
  • Parallel Computing and Optimization Techniques
  • Advanced Data Storage Technologies
  • Cloud Computing and Resource Management
  • Photonic and Optical Devices
  • Photorefractive and Nonlinear Optics
  • Advanced Optical Network Technologies
  • Advanced Fiber Laser Technologies
  • Caching and Content Delivery
  • Advanced Fiber Optic Sensors
  • Advanced Memory and Neural Computing
  • Distributed and Parallel Computing Systems
  • Complex Network Analysis Techniques
  • Semiconductor Lasers and Optical Devices
  • Software System Performance and Reliability
  • Graph Theory and Algorithms
  • Network Packet Processing and Optimization
  • Peer-to-Peer Network Technologies
  • Network Traffic and Congestion Control
  • Nuclear Physics and Applications
  • IoT and Edge/Fog Computing
  • Scientific Computing and Data Management
  • SARS-CoV-2 and COVID-19 Research
  • Advanced Graph Neural Networks

ETH Zurich
2015-2024

Tamedia (Switzerland)
2024

Zürcher Fachhochschule
2022

Technical University of Darmstadt
2022

University of Illinois Urbana-Champaign
2022

Indian Institute of Technology Kanpur
2022

Università della Svizzera italiana
2021

Board of the Swiss Federal Institutes of Technology
2017

University of Pisa
2015-2016

University of Eastern Finland
2006-2013

Currently, major efforts are underway toward refining the horizontal resolution (or grid spacing) of climate models to about 1 km, using both global and regional climate models (GCMs and RCMs). Several groups have succeeded in conducting kilometer-scale multiweek GCM simulations and decadelong continental-scale RCM simulations. There is well-founded hope that this increase in resolution represents a quantum jump in climate modeling, as it enables replacing the parameterization of moist convection by an explicit treatment. It is expected that this will improve...

10.1175/bams-d-18-0167.1 article EN Bulletin of the American Meteorological Society 2019-10-25

Neutralizing antibodies that target the receptor-binding domain (RBD) of the SARS-CoV-2 spike protein are among the most promising approaches against COVID-19 [1,2]. A bispecific IgG1-like molecule (CoV-X2) has been developed on the basis of C121 and C135, two antibodies derived from donors who had recovered from COVID-19 [3]. Here we show that CoV-X2 simultaneously binds independent sites on the RBD and, unlike its parental antibodies, prevents detectable binding to the cellular receptor of the virus, angiotensin-converting enzyme 2 (ACE2)....

10.1038/s41586-021-03461-y article EN other-oa Nature 2021-03-25

The interconnect is one of the most critical components in large-scale computing systems, and its impact on the performance of applications is going to increase with system size. In this paper, we describe SLINGSHOT, an interconnection network for large-scale computing systems. SLINGSHOT is based on high-radix switches, which allow building exascale and hyper-scale datacenter networks with at most three switch-to-switch hops. Moreover, SLINGSHOT provides efficient adaptive routing and congestion control algorithms, and highly tunable traffic classes....

10.1109/sc41405.2020.00039 preprint EN 2020-11-01

Simple graph algorithms such as PageRank have been the target of numerous hardware accelerators. Yet, there also exist much more complex graph mining algorithms for problems such as clustering or maximal clique listing. These algorithms are memory-bound and thus could be accelerated by hardware techniques such as Processing-in-Memory (PIM). However, they also come with non-straightforward parallelism and complicated memory access patterns. In this work, we address this problem with a simple yet surprisingly powerful observation: operations on sets of vertices,...

10.1145/3466752.3480133 article EN 2021-10-17

Optimizing communication performance is imperative for large-scale computing because communication overheads limit the strong scalability of parallel applications. Today's network cards contain rather powerful processors optimized for data movement. However, these devices are limited to fixed functions, such as remote direct memory access. We develop sPIN, a portable programming model to offload simple packet processing functions to the network card. To demonstrate the potential of the model, we design a cycle-accurate simulation...

10.1145/3126908.3126970 article EN 2017-11-08

The recent line of research into topology design focuses on lowering network diameter. Many low-diameter topologies such as Slim Fly or Jellyfish that substantially reduce cost, power consumption, and latency have been proposed. A key challenge in realizing the benefits of these topologies is routing. On one hand, these networks provide shorter path lengths than established topologies such as Clos or torus, leading to performance improvements. On the other hand, the number of shortest paths between each pair of endpoints is much smaller than in Clos, but there is a...

10.1109/tpds.2020.3035761 article EN IEEE Transactions on Parallel and Distributed Systems 2020-11-04
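
The path-diversity contrast in the abstract above can be illustrated with a small sketch: a breadth-first search that counts the distinct shortest paths between two endpoints, run on a toy leaf-spine (two-level Clos) network and on the Petersen graph. The Petersen graph is used here only as a stand-in diameter-2 graph; it is not a Slim Fly or Jellyfish instance, and the helper names (`count_shortest_paths`, `leaf_spine`, `petersen`) are hypothetical.

```python
from collections import deque
from typing import Dict, Hashable, List, Optional, Tuple

Graph = Dict[Hashable, List[Hashable]]

def count_shortest_paths(adj: Graph, src, dst) -> Tuple[Optional[int], int]:
    """BFS from src; returns (hop distance to dst, number of distinct shortest paths)."""
    dist = {src: 0}
    paths = {src: 1}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:             # first visit: v is one hop farther than u
                dist[v] = dist[u] + 1
                paths[v] = paths[u]
                queue.append(v)
            elif dist[v] == dist[u] + 1:  # another shortest-path predecessor of v
                paths[v] += paths[u]
    return dist.get(dst), paths.get(dst, 0)

def leaf_spine(num_leaves: int, num_spines: int) -> Graph:
    """Two-level Clos (leaf-spine): every leaf switch connects to every spine switch."""
    adj: Graph = {f"L{i}": [f"S{j}" for j in range(num_spines)] for i in range(num_leaves)}
    for j in range(num_spines):
        adj[f"S{j}"] = [f"L{i}" for i in range(num_leaves)]
    return adj

def petersen() -> Graph:
    """Petersen graph: 10 nodes, degree 3, diameter 2 (a toy low-diameter graph)."""
    adj: Graph = {i: [] for i in range(10)}

    def link(a: int, b: int) -> None:
        adj[a].append(b)
        adj[b].append(a)

    for i in range(5):
        link(i, (i + 1) % 5)          # outer 5-cycle
        link(i, i + 5)                # spoke to the inner node
        link(i + 5, (i + 2) % 5 + 5)  # inner pentagram
    return adj

print(count_shortest_paths(leaf_spine(4, 4), "L0", "L1"))  # (2, 4)
print(count_shortest_paths(petersen(), 0, 2))              # (2, 1)
```

Both graphs connect the chosen pair in two hops. The leaf-spine network offers one shortest path per spine, while the diameter-2 graph offers exactly one, which is why such topologies need routing that also exploits non-minimal paths.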

Load imbalance pervasively exists in distributed deep learning training systems, caused either by the inherent imbalance in the learned tasks or by the system itself. Traditional synchronous Stochastic Gradient Descent (SGD) achieves good accuracy for a wide variety of tasks, but relies on global synchronization to accumulate the gradients at every training step. In this paper, we propose eager-SGD, which relaxes the global synchronization for decentralized accumulation. To implement eager-SGD, we propose to use two partial collectives: solo and majority. With solo allreduce, the faster...

10.1145/3332466.3374528 preprint EN 2020-02-19

The allreduce operation is one of the most commonly used communication routines in distributed applications. To improve its bandwidth and to reduce network traffic, this operation can be accelerated by offloading it to network switches, which aggregate the data received from the hosts and send the aggregated result back to them. However, existing solutions provide limited customization opportunities and might provide suboptimal performance when dealing with custom operators and data types, with sparse data, or when reproducibility of the aggregation is a concern. To deal...

10.1145/3458817.3476178 preprint EN 2021-10-21
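
The switch-side aggregation described above, and why reproducibility can be a concern, can be sketched in a few lines. This is a toy model under stated assumptions: one vector per host, packets listed in arrival order, and an optional fixed combination order by host rank. `switch_allreduce` is a hypothetical name, and this is not the design proposed in the paper. Because floating-point addition is not associative, combining values in arrival order can change the last bits of the result.

```python
import operator
from functools import reduce
from typing import Callable, Dict, List, Tuple

Contribution = Tuple[int, List[float]]  # (host rank, vector sent by that host)

def switch_allreduce(contributions: List[Contribution],
                     op: Callable[[float, float], float] = operator.add,
                     fixed_order: bool = True) -> Dict[int, List[float]]:
    """Toy switch-side allreduce: combine one vector per host element-wise
    and return the aggregated vector that is sent back to every host.

    `contributions` is given in packet arrival order. With fixed_order=True
    the switch combines them in host-rank order, so the floating-point result
    does not depend on which packet arrived first.
    """
    if fixed_order:
        contributions = sorted(contributions, key=lambda c: c[0])
    vectors = [vec for _, vec in contributions]
    # Left fold of `op` over the i-th element of every host vector.
    aggregated = [reduce(op, column) for column in zip(*vectors)]
    return {rank: list(aggregated) for rank, _ in contributions}

# Same data, two different arrival orders at the switch.
arrival_a = [(0, [0.1, 4.0]), (1, [0.2, 1.0]), (2, [0.3, 7.0])]
arrival_b = [(1, [0.2, 1.0]), (2, [0.3, 7.0]), (0, [0.1, 4.0])]

print(switch_allreduce(arrival_a, fixed_order=False)[0])
print(switch_allreduce(arrival_b, fixed_order=False)[0])  # may differ in the last bits
print(switch_allreduce(arrival_b, op=max)[0])             # custom operator
```

Sorting by rank trades some switch buffering for a result that is identical across runs. The `op` parameter stands in for the "custom operators" the abstract mentions.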

System noise can negatively impact the performance of HPC systems, and the interconnection network is one of the main factors contributing to this problem. To mitigate this effect, adaptive routing sends packets on non-minimal paths if they are less congested. However, while this may mitigate interference caused by congestion, it also generates more traffic, since packets traverse additional hops, in turn causing congestion for other applications and for the application itself. In this paper, we first describe how to estimate network noise. By following these...

10.1145/3295500.3356196 preprint EN 2019-11-07
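
The trade-off described above (a less congested non-minimal path versus the extra hops it consumes) can be illustrated with a UGAL-style decision rule, a well-known adaptive routing heuristic. This is a hypothetical sketch, not the routing algorithm of any specific system or the method proposed in the paper. It assumes path delay is estimated as queue occupancy times hop count, and `bias` is an illustrative tuning knob.

```python
from typing import Tuple

def ugal_choice(min_queue: int, min_hops: int,
                nonmin_queue: int, nonmin_hops: int,
                bias: int = 0) -> Tuple[str, int]:
    """UGAL-style source routing decision (illustrative only).

    Estimated delay of a path = queue occupancy at its first hop x hop count.
    `bias` (hypothetical knob) is added to the non-minimal estimate so that
    the minimal path is preferred unless it is clearly more congested.
    Returns the chosen path and the number of links the packet will occupy.
    """
    if min_queue * min_hops <= nonmin_queue * nonmin_hops + bias:
        return "minimal", min_hops
    return "non-minimal", nonmin_hops

# Congested minimal path (3 hops) vs. lightly loaded non-minimal path (5 hops).
print(ugal_choice(min_queue=10, min_hops=3, nonmin_queue=4, nonmin_hops=5))  # ('non-minimal', 5)
print(ugal_choice(min_queue=5, min_hops=3, nonmin_queue=4, nonmin_hops=5))   # ('minimal', 3)
```

In the first case, the packet avoids the congested queue but occupies 5 links instead of 3, roughly 67% more link bandwidth. That extra load is the kind of traffic that can disturb other applications.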

The capacity to offload data and control tasks to the network is becoming increasingly important, especially if we consider the faster growth of network speed compared to CPU frequencies. In-network compute alleviates the host CPU load by running tasks directly in the network, enabling additional computation/communication overlap and potentially improving overall application performance. However, sustaining the bandwidths provided by next-generation networks, e.g., 400 Gbit/s, can become a challenge. sPIN is a programming model for...

10.1109/isca52012.2021.00079 article EN 2021-06-01

We present an adaptive interferometer based on the reflection dynamic hologram recorded in a photorefractive CdTe:V crystal with no external electric field. Linear phase-to-intensity transformation is achieved by vectorial mixing of two waves with different polarization states (linear and elliptical) in the anisotropic diffraction geometry. The reflection and transmission geometries are compared in terms of both sensitivity and adaptability. It is shown that the reflection geometry is characterized by a better combination of these parameters...

10.1364/oe.15.000545 article EN cc-by Optics Express 2007-01-22

We introduce FatPaths: a simple, generic, and robust routing architecture that enables state-of-the-art low-diameter topologies such as Slim Fly to achieve unprecedented performance. FatPaths targets Ethernet stacks in both HPC supercomputers and cloud data centers and clusters. FatPaths exposes and exploits the rich ("fat") diversity of both minimal and non-minimal paths for high-performance multi-pathing. Moreover, FatPaths uses a redesigned "purified" transport layer that removes virtually all TCP performance issues (e.g., the slow...

10.1109/sc41405.2020.00031 article EN 2020-11-01

Distributed memory systems are becoming increasingly important since they provide a system-scale abstraction in which physically separated memories can be addressed as a single logical one. This abstraction enables memory disaggregation, allowing in-memory databases, caching services, and ephemeral storage to be naturally deployed at large scales. While this effectively increases the memory capacity of these systems, it faces additional overheads for remote memory accesses. To narrow the difference between local and remote accesses, low latency...

10.1145/3448016.3452817 article EN Proceedings of the 2022 International Conference on Management of Data 2021-06-09

Network interface cards are one of the key components to achieve efficient parallel performance. In the past, they have gained new functionalities, such as lossless transmission and remote direct memory access, that are now ubiquitous in high-performance systems. Prototypes of next-generation network cards offer features that facilitate device programming. In this work, various possible uses of network offload are explored. We use the Portals 4 specification as an example to demonstrate techniques such as fully asynchronous, multi-schedule and solo...

10.1109/hoti.2015.21 article EN 2015-08-01

We analyze vectorial wave mixing in a photorefractive crystal of cubic symmetry in different geometries of beam interaction: reflection, transmission, and orthogonal. It is shown that the orthogonal geometry, in contrast with the others, supports efficient phase demodulation of a depolarized object wave in the linear mode without using any polarization-filtering elements. As a result, adaptive interferometers based on the orthogonal geometry can provide a higher signal-to-noise ratio due to lower noise and optical losses.

10.1364/josab.27.000311 article EN Journal of the Optical Society of America B 2010-01-22

The emergence of real-time decision-making applications in domains like high-frequency trading, emergency management, and service level analysis in communication networks has led to the definition of new classes of queries. Skyline queries are a notable example. Their results consist of all the tuples whose attribute vector is not dominated (in the Pareto sense) by that of any other tuple. Because of their popularity, skyline queries have been studied in terms of both sequential algorithms and parallel implementations for...

10.1002/cpe.3866 article EN Concurrency and Computation Practice and Experience 2016-05-19
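
The Pareto-dominance definition above translates directly into a naive skyline computation, shown here as a minimal sketch. It assumes that smaller values are better in every attribute, and the quote data is hypothetical. This quadratic version only illustrates the definition; it is not the continuous, parallel algorithm studied in the paper.

```python
from typing import List, Sequence, Tuple

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a Pareto-dominates b, assuming smaller is better in every attribute:
    a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline(tuples: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Naive O(n^2) skyline: keep every tuple that no other tuple dominates.
    (A tuple never dominates itself, so no self-exclusion is needed.)"""
    return [t for t in tuples if not any(dominates(o, t) for o in tuples)]

# Hypothetical quotes as (price, latency in ms); lower is better for both.
quotes = [(10, 5), (8, 7), (12, 3), (9, 9), (11, 6)]
print(skyline(quotes))  # [(10, 5), (8, 7), (12, 3)]
```

Here (9, 9) is dominated by (8, 7), and (11, 6) is dominated by (10, 5). The three remaining quotes are mutually incomparable trade-offs between price and latency.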

Cloud computing represents an appealing opportunity for cost-effective deployment of HPC workloads on the best-fitting hardware. However, although cloud and on-premise HPC systems offer similar computational resources, their network architecture and performance may differ significantly. For example, these systems use fundamentally different network transport and routing protocols, which introduce network noise that can eventually limit application scaling. This work analyzes the network performance, scalability, and cost of running HPC workloads on cloud systems....

10.1145/3570609 article EN Proceedings of the ACM on Measurement and Analysis of Computing Systems 2022-12-01

We present a strain sensor in which a multimode fiber is used as the sensitive element. High sensitivity to dynamic strains is achieved by means of vectorial wave mixing in a photorefractive CdTe:V crystal. It was found that the largest source of noise in our sensor is related to the instability of the polarization state of the speckles emerging from the fiber. This noise is significantly diminished by using a fiber with a large core diameter (550 µm).

10.1364/ol.32.001821 article EN Optics Letters 2007-06-20

Applications often communicate data that is non-contiguous in the send- or the receive-buffer, e.g., when exchanging a column of a matrix stored in row-major order. While non-contiguous transfers are well supported in HPC (e.g., MPI derived datatypes), they can still be up to 5x slower than contiguous transfers of the same size. As we enter the era of network acceleration, we need to investigate which tasks to offload to the NIC. In this work, we argue that non-contiguous memory transfers can be transparently network-accelerated, truly achieving zero-copy communications. We implement and extend...

10.1145/3295500.3356189 preprint EN 2019-11-07
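
The column example in the abstract above can be made concrete with a small sketch. A column of a row-major matrix is a strided sequence of elements. Without network support, the sender typically packs it into a contiguous buffer and the receiver unpacks it, and this extra copy is what network-accelerated, zero-copy transfers aim to remove. The helper names are hypothetical. `MPI_Type_vector` is mentioned only as the standard MPI way to describe such a stride. Nothing here reproduces the paper's NIC implementation.

```python
from typing import List

def column_offsets(rows: int, cols: int, col: int) -> List[int]:
    """Element offsets of column `col` in a rows x cols matrix stored in row-major order.
    This is the (count=rows, blocklength=1, stride=cols) layout that an
    MPI_Type_vector would describe."""
    return [r * cols + col for r in range(rows)]

def pack_column(flat: List[float], rows: int, cols: int, col: int) -> List[float]:
    """Gather the strided column into a contiguous send buffer (an explicit copy)."""
    return [flat[i] for i in column_offsets(rows, cols, col)]

def unpack_column(buf: List[float], flat: List[float], rows: int, cols: int, col: int) -> None:
    """Scatter a contiguous receive buffer back into column `col` of `flat`."""
    for value, i in zip(buf, column_offsets(rows, cols, col)):
        flat[i] = value

matrix = list(range(12))             # 3 x 4 matrix, row-major
print(column_offsets(3, 4, 1))       # [1, 5, 9]
print(pack_column(matrix, 3, 4, 1))  # [1, 5, 9]
```

Each element of the column sits `cols` positions after the previous one, so the copy touches one element per row. For large matrices, this scattered access pattern is part of why non-contiguous transfers lag behind contiguous ones.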

Numerous microarchitectural optimizations unlocked tremendous processing power for deep neural networks that in turn fueled the AI revolution. With the exhaustion of such optimizations, the growth of modern AI is now gated by the performance of training systems, especially their data movement. Instead of focusing on single accelerators, we investigate the data-movement characteristics of large-scale training at full system scale. Based on our workload analysis, we design HammingMesh, a novel network topology that provides high bandwidth at low...

10.1109/sc41404.2022.00016 article EN 2022-11-01

Optimizing communication performance is imperative for large-scale computing because communication overheads limit the strong scalability of parallel applications. Today's network cards contain rather powerful processors optimized for data movement. However, these devices are limited to fixed functions, such as remote direct memory access. We develop sPIN, a portable programming model to offload simple packet processing functions to the network card. To demonstrate the potential of the model, we design a cycle-accurate simulation...

10.48550/arxiv.1709.05483 preprint EN other-oa arXiv (Cornell University) 2017-01-01

Network interface cards are one of the key components to achieve efficient parallel performance. In the past, they have gained new functionalities, such as lossless transmission and remote direct memory access, that are now ubiquitous in high-performance systems. Prototypes of next-generation network cards offer features that facilitate device programming. In this article, the authors discuss an abstract machine model for offloading architectures. They used Portals 4 to implement the proposed abstraction model, and present two...

10.1109/mm.2016.56 article EN IEEE Micro 2016-07-01