- Interconnection Networks and Systems
- Software-Defined Networks and 5G
- Parallel Computing and Optimization Techniques
- Advanced Data Storage Technologies
- Cloud Computing and Resource Management
- Photonic and Optical Devices
- Photorefractive and Nonlinear Optics
- Advanced Optical Network Technologies
- Advanced Fiber Laser Technologies
- Caching and Content Delivery
- Advanced Fiber Optic Sensors
- Advanced Memory and Neural Computing
- Distributed and Parallel Computing Systems
- Complex Network Analysis Techniques
- Semiconductor Lasers and Optical Devices
- Software System Performance and Reliability
- Graph Theory and Algorithms
- Network Packet Processing and Optimization
- Peer-to-Peer Network Technologies
- Network Traffic and Congestion Control
- Nuclear Physics and Applications
- IoT and Edge/Fog Computing
- Scientific Computing and Data Management
- SARS-CoV-2 and COVID-19 Research
- Advanced Graph Neural Networks
ETH Zurich
2015-2024
Tamedia (Switzerland)
2024
Zürcher Fachhochschule
2022
Technical University of Darmstadt
2022
University of Illinois Urbana-Champaign
2022
Indian Institute of Technology Kanpur
2022
Università della Svizzera italiana
2021
Board of the Swiss Federal Institutes of Technology
2017
University of Pisa
2015-2016
University of Eastern Finland
2006-2013
Currently major efforts are underway toward refining the horizontal resolution (or grid spacing) of climate models to about 1 km, using both global and regional climate models (GCMs and RCMs). Several groups have succeeded in conducting kilometer-scale multiweek GCM simulations and decadelong continental-scale RCM simulations. There is well-founded hope that this increase in resolution represents a quantum jump in climate modeling, as it enables replacing the parameterization of moist convection by an explicit treatment. It is expected that this will improve...
Neutralizing antibodies that target the receptor-binding domain (RBD) of the SARS-CoV-2 spike protein are among the most promising approaches against COVID-19 [1,2]. A bispecific IgG1-like molecule (CoV-X2) has been developed on the basis of C121 and C135, two antibodies derived from donors who had recovered from COVID-19 [3]. Here we show that CoV-X2 simultaneously binds independent sites on the RBD and, unlike its parental antibodies, prevents detectable binding to the cellular receptor of the virus, angiotensin-converting enzyme 2 (ACE2)....
The interconnect is one of the most critical components in large scale computing systems, and its impact on the performance of applications is going to increase with the system size. In this paper, we will describe SLINGSHOT, an interconnection network for large scale computing systems. SLINGSHOT is based on high-radix switches, which allow building exascale and hyper-scale datacenter networks with at most three switch-to-switch hops. Moreover, SLINGSHOT provides efficient adaptive routing and congestion control algorithms, and highly tunable traffic classes....
Simple graph algorithms such as PageRank have been the target of numerous hardware accelerators. Yet, there also exist much more complex graph mining algorithms for problems such as clustering or maximal clique listing. These algorithms are memory-bound and thus could be accelerated by hardware techniques such as Processing-in-Memory (PIM). However, they also come with non-straightforward parallelism and complicated memory access patterns. In this work, we address this problem with a simple yet surprisingly powerful observation: operations on sets of vertices,...
Optimizing communication performance is imperative for large-scale computing because communication overheads limit the strong scalability of parallel applications. Today's network cards contain rather powerful processors optimized for data movement. However, these devices are limited to fixed functions, such as remote direct memory access. We develop sPIN, a portable programming model to offload simple packet processing functions to the network card. To demonstrate the potential of the model, we design a cycle-accurate simulation...
The recent line of research into topology design focuses on lowering network diameter. Many low-diameter topologies such as Slim Fly or Jellyfish that substantially reduce cost, power consumption, and latency have been proposed. A key challenge in realizing the benefits of these topologies is routing. On the one hand, these networks provide shorter path lengths than established topologies such as Clos or torus, leading to performance improvements. On the other hand, the number of shortest paths between each pair of endpoints is much smaller than in Clos, but there is a...
Load imbalance pervasively exists in distributed deep learning training systems, either caused by the inherent imbalance in learned tasks or by the system itself. Traditional synchronous Stochastic Gradient Descent (SGD) achieves good accuracy for a wide variety of tasks, but relies on global synchronization to accumulate the gradients at every training step. In this paper, we propose eager-SGD, which relaxes the global synchronization for decentralized accumulation. To implement eager-SGD, we propose to use two partial collectives: solo and majority. With solo allreduce, the faster...
The allreduce operation is one of the most commonly used communication routines in distributed applications. To improve its bandwidth and to reduce network traffic, this operation can be accelerated by offloading it to network switches, which aggregate the data received from the hosts and send them back the aggregated result. However, existing solutions provide limited customization opportunities and might provide suboptimal performance when dealing with custom operators and data types, with sparse data, or when reproducibility of the aggregation is a concern. To deal...
System noise can negatively impact the performance of HPC systems, and the interconnection network is one of the main factors contributing to this problem. To mitigate this effect, adaptive routing sends packets on non-minimal paths if they are less congested. However, while this may mitigate interference caused by congestion, it also generates more traffic since packets traverse additional hops, causing in turn congestion on other applications and on the application itself. In this paper, we first describe how to estimate network noise. By following these...
The capacity of offloading data and control tasks to the network is becoming increasingly important, especially if we consider the faster growth of network speed when compared to CPU frequencies. In-network compute alleviates the host CPU load by running tasks directly in the network, enabling additional computation/communication overlap and potentially improving overall application performance. However, sustaining the bandwidths provided by next-generation networks, e.g., 400 Gbit/s, can become a challenge. sPIN is a programming model for...
We present an adaptive interferometer based on the reflection dynamic hologram recorded in a photorefractive CdTe:V crystal with no external electric field. Linear phase-to-intensity transformation is achieved by vectorial mixing of two waves with different polarization states (linear and elliptical) in the anisotropic diffraction geometry. Comparison of the reflection and transmission geometries considering both sensitivity and adaptability is carried out. It is shown that the reflection geometry is characterized by a better combination of these parameters...
We introduce FatPaths: a simple, generic, and robust routing architecture that enables state-of-the-art low-diameter topologies such as Slim Fly to achieve unprecedented performance. FatPaths targets Ethernet stacks in both HPC supercomputers as well as cloud data centers and clusters. FatPaths exposes and exploits the rich ("fat") diversity of both minimal and non-minimal paths for high-performance multi-pathing. Moreover, FatPaths uses a redesigned "purified" transport layer that removes virtually all TCP performance issues (e.g., the slow...
Distributed memory systems are becoming increasingly important since they provide a system-scale abstraction where physically separated memories can be addressed as a single logical one. This abstraction enables memory disaggregation, allowing in-memory databases, caching services, and ephemeral storage to be naturally deployed at large scales. While this effectively increases the memory capacity of these systems, it faces additional overheads for remote memory accesses. To narrow the difference between local and remote accesses, low latency...
Network interface cards are one of the key components to achieve efficient parallel performance. In the past, they have gained new functionalities such as lossless transmission and remote direct memory access that are now ubiquitous in high-performance systems. Prototypes of next generation network cards now offer new features that facilitate device programming. In this work, various possible uses of network offload features are explored. We use the Portals 4 specification as an example to demonstrate techniques such as fully asynchronous, multi-schedule and solo...
We analyze vectorial wave mixing in a photorefractive crystal of cubic symmetry in different geometries of beam interactions: reflection, transmission, and orthogonal. It is shown that the orthogonal geometry, in contrast with the others, supports an efficient phase demodulation of a depolarized object wave in the linear mode without using any polarization-filtering elements. As a result, adaptive interferometers based on the orthogonal geometry can provide a higher signal-to-noise ratio due to lower noise and optical losses.
The emergence of real‐time decision‐making applications in domains like high‐frequency trading, emergency management, and service level analysis in communication networks has led to the definition of new classes of queries. Skyline queries are a notable example. Their results consist of all the tuples whose attribute vector is not dominated (in the Pareto sense) by that of any other tuple. Because of their popularity, skyline queries have been studied in terms of both sequential algorithms and parallel implementations for...
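The Pareto-dominance definition in the abstract above can be illustrated with a small sketch. It is a naive quadratic-time filter written for clarity, not the sequential or parallel algorithms the paper studies. The function names `dominates` and `skyline` are hypothetical, and the sketch assumes that smaller attribute values are preferred.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a Pareto-dominates b.

    Assumption: smaller is better in every attribute. a dominates b when
    a is no worse than b everywhere and strictly better somewhere.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def skyline(points: List[Point]) -> List[Point]:
    """Keep every point that no other point dominates (naive O(n^2) scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

A point never dominates itself under this definition, so the inner scan does not need to skip the point being tested.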
Cloud computing represents an appealing opportunity for cost-effective deployment of HPC workloads on the best-fitting hardware. However, although cloud and on-premise HPC systems offer similar computational resources, their network architecture and performance may differ significantly. For example, these systems use fundamentally different network transport and routing protocols, which may introduce network noise that can eventually limit application scaling. This work analyzes the network performance, scalability, and cost of running HPC workloads on cloud systems....
We present a strain sensor in which a multimode fiber is used as the sensitive element. High sensitivity to dynamic strains is achieved by means of vectorial wave mixing in a photorefractive CdTe:V crystal. It was found that the largest source of noise in our sensor is related to the instability of the polarization state of the speckles emerging from the fiber. This noise is significantly diminished with a fiber with a core of large diameter (550 μm).
Applications often communicate data that is non-contiguous in the send- or the receive-buffer, e.g., when exchanging a column of a matrix stored in row-major order. While non-contiguous transfers are well supported in HPC (e.g., MPI derived datatypes), they can still be up to 5x slower than contiguous transfers of the same size. As we enter the era of network acceleration, we need to investigate which tasks to offload to the NIC: In this work we argue that non-contiguous memory transfers can be transparently network-accelerated, truly achieving zero-copy communications. We implement and extend...
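To make the column example in the abstract above concrete, the sketch below shows why a column of a row-major matrix is non-contiguous: its elements sit one full row apart in the flat buffer. The helper names `column_offsets` and `pack_column` are hypothetical. The explicit pack step only illustrates the kind of extra copy that a zero-copy approach aims to avoid, and it is not the paper's implementation.

```python
from typing import List


def column_offsets(rows: int, cols: int, col: int) -> List[int]:
    """Flat-buffer indices of one column in a row-major rows x cols matrix.

    Consecutive column elements are `cols` entries apart, so the column
    is a strided (non-contiguous) region of the buffer.
    """
    return [r * cols + col for r in range(rows)]


def pack_column(flat: List[float], rows: int, cols: int, col: int) -> List[float]:
    """Gather the strided column into a new contiguous list (an extra copy)."""
    return [flat[i] for i in column_offsets(rows, cols, col)]
```

For a 3 x 4 matrix stored as `list(range(12))`, column 1 lives at flat indices 1, 5, and 9, a stride of 4 entries.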
Numerous microarchitectural optimizations unlocked tremendous processing power for deep neural networks that in turn fueled the AI revolution. With the exhaustion of such optimizations, the growth of modern AI is now gated by the performance of training systems, especially their data movement. Instead of focusing on single accelerators, we investigate data-movement characteristics of large-scale training at full system scale. Based on our workload analysis, we design HammingMesh, a novel network topology that provides high bandwidth at low...
Network interface cards are one of the key components to achieve efficient parallel performance. In the past, they have gained new functionalities, such as lossless transmission and remote direct memory access, that are now ubiquitous in high-performance systems. Prototypes of next-generation network cards now offer new features that facilitate device programming. In this article, the authors discuss an abstract machine model for offloading architectures. They used the Portals 4 specification to implement the proposed abstraction model, and present two...