- Parallel Computing and Optimization Techniques
- Distributed and Parallel Computing Systems
- Distributed systems and fault tolerance
- Advanced Data Storage Technologies
- Interconnection Networks and Systems
- Microgrid Control and Optimization
- Real-time simulation and control systems
- Optimization and Search Problems
- Network Time Synchronization Technologies
- Experimental Learning in Engineering
- Graph Theory and Algorithms
- Smart Grid Energy Management
- Energy Harvesting in Wireless Networks
- Cloud Computing and Resource Management
Friedrich Schiller University Jena
2023
Google (United States)
2016
Stanford University
1989-2003
Ford Motor Company (United States)
1998
The overall goals and major features of the directory architecture for shared memory (Dash) are presented. The fundamental premise behind the architecture is that it is possible to build a scalable high-performance machine with a single address space and coherent caches. Dash is scalable in that it achieves linear or near-linear performance growth as the number of processors increases from a few to a few thousand. This results from distributing the memory among the processing nodes and using a network with scalable bandwidth to connect the nodes. The architecture allows shared data to be cached, significantly reducing latency...
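As a rough illustration of the directory idea described above (not Dash's actual hardware design), the sketch below tracks which caches share each memory block so that a write triggers point-to-point invalidations instead of a broadcast. All class and method names are hypothetical.

```python
# Illustrative sketch of directory-based invalidation: each memory block has a
# home-node directory entry recording which caches hold it, so a write
# invalidates only those sharers instead of broadcasting to every cache.

class DirectoryEntry:
    def __init__(self):
        self.sharers = set()   # ids of caches holding this block
        self.owner = None      # cache holding it in modified state, if any

class Directory:
    def __init__(self, num_blocks):
        self.entries = [DirectoryEntry() for _ in range(num_blocks)]

    def read(self, block, cache_id):
        e = self.entries[block]
        e.sharers.add(cache_id)              # remember the new sharer
        return f"supply block {block} to cache {cache_id}"

    def write(self, block, cache_id):
        e = self.entries[block]
        # point-to-point invalidations to current sharers only
        invalidated = [c for c in e.sharers if c != cache_id]
        e.sharers = {cache_id}
        e.owner = cache_id
        return [f"invalidate block {block} in cache {c}" for c in invalidated]

d = Directory(num_blocks=4)
d.read(0, cache_id=1)
d.read(0, cache_id=2)
print(d.write(0, cache_id=1))   # only cache 2 receives an invalidation
```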
A fundamental problem that any scalable multiprocessor must address is the ability to tolerate high-latency memory operations. This paper explores the extent to which multiple hardware contexts per processor can help mitigate the negative effects of latency. In particular, we evaluate the performance of a directory-based cache-coherent multiprocessor using reference traces obtained from three parallel applications. We explore the case where there is a small, fixed number of contexts (2-4) and the context switch overhead is low. In contrast to previously...
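The latency-hiding argument can be made concrete with a back-of-the-envelope model: with a handful of contexts, the processor switches to another ready context on each long-latency miss, so more of the miss time is covered by useful work. The sketch below is a toy utilization model under assumed run length, miss latency, and switch cost; the numbers are illustrative and not taken from the paper.

```python
# Hedged toy model of latency hiding with a few hardware contexts.
# Parameters (run length, miss latency, switch cost) are assumptions.

def utilization(contexts, run_cycles=10, miss_latency=50, switch_cost=1):
    """Fraction of time the processor does useful work when it switches
    to another ready context on every long-latency cache miss."""
    busy_per_miss = contexts * (run_cycles + switch_cost)
    if busy_per_miss >= run_cycles + miss_latency:
        # enough contexts to fully cover the miss latency
        return run_cycles / (run_cycles + switch_cost)
    return (contexts * run_cycles) / (run_cycles + miss_latency)

for n in (1, 2, 4):
    print(f"{n} context(s): utilization ~ {utilization(n):.2f}")
```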
The cache invalidation patterns of several parallel applications are analyzed. The results are based on multiprocessor simulations with 8, 16, and 32 processors. To provide deeper insight into the observed behavior, the invalidations are linked to the high-level data objects causing them in the programs. To predict what the patterns would look like beyond 32 processors, a classification scheme for the data objects found in parallel programs is proposed. The classification provides a powerful conceptual tool to reason about applications. Results indicate that it should be possible to scale...
To make shared-memory multiprocessors scalable, researchers are now exploring cache coherence protocols that do not rely on broadcast, but instead send invalidation messages to individual caches that contain stale data. The feasibility of such directory-based protocols is highly sensitive to the invalidation patterns that parallel programs exhibit. In this paper, we analyze the invalidation patterns caused by several parallel applications and investigate the effect these patterns have on a directory-based protocol. Our results are based on multiprocessor traces with 4, 8, and 16 processors. To gain insight into...
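A hedged sketch of the kind of trace-driven measurement such studies rely on: replay a memory reference trace, track the sharers of each block, and histogram how many caches each write must invalidate. The trace format and helper names below are assumptions for illustration only.

```python
# Illustrative trace replay: count, for every write, how many other caches
# hold the block and would therefore receive an invalidation message.

from collections import Counter

def invalidation_histogram(trace):
    """trace: iterable of (op, block, cache_id) with op in {'r', 'w'} (assumed format)."""
    sharers = {}                           # block -> set of caches holding it
    hist = Counter()
    for op, block, cache in trace:
        held = sharers.setdefault(block, set())
        if op == 'w':
            hist[len(held - {cache})] += 1   # caches that must be invalidated
            sharers[block] = {cache}
        else:
            held.add(cache)
    return hist

trace = [('r', 0, 1), ('r', 0, 2), ('w', 0, 3), ('r', 1, 1), ('w', 1, 1)]
print(invalidation_histogram(trace))       # Counter({2: 1, 0: 1})
```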
The authors present and analyze algorithms for managing the distributed shared memory in nonuniform-memory-access multiprocessors and related systems. The competitive properties of these algorithms guarantee that their performance is within a small constant factor of optimal, even though they make no use of any information about reference patterns. Both hardware and software implementation concerns are covered. A case study of the Mach operating system indicates that integration into existing systems does not pose major problems. On the other...
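One way to see the competitive flavor of such algorithms is the classic rent-or-buy rule: keep paying for remote accesses until their accumulated cost matches the cost of replicating the page locally, then replicate. In the standard analysis this stays within a factor of two of the offline optimum regardless of the reference pattern. The sketch below is a generic illustration of that rule, not the authors' actual algorithm; the costs and API are assumed.

```python
# Generic rent-or-buy sketch for page replication in a DSM setting (illustrative).

class PageManager:
    def __init__(self, replicate_cost=100, remote_cost=1):
        self.replicate_cost = replicate_cost
        self.remote_cost = remote_cost
        self.remote_spent = {}          # page -> cost paid so far on remote accesses
        self.local_copies = set()       # pages already replicated locally

    def access(self, page):
        if page in self.local_copies:
            return 0                                     # local hit, no cost
        spent = self.remote_spent.get(page, 0) + self.remote_cost
        self.remote_spent[page] = spent
        if spent >= self.replicate_cost:                 # break-even point reached
            self.local_copies.add(page)
            return self.remote_cost + self.replicate_cost
        return self.remote_cost

mgr = PageManager(replicate_cost=3)
print([mgr.access('p') for _ in range(5)])   # [1, 1, 4, 0, 0]
```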
The versatile hardware-in-the-loop laboratory can aid large-system development by refining the models of individual components during integration. The selection of hardware, software, and architecture is discussed, and some general notes on the facility are given.
We propose Powernet as an end-to-end open-source technology for economically efficient, scalable, and secure coordination of grid resources. It offers integrated hardware and software solutions that are judiciously divided between local embedded sensing, computing, and control, networked with cloud-based high-level real-time optimal operation of not only centralized but also millions of distributed resources of various types. Our goal is to enable penetration of 50% or higher intermittent renewables while...
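As a purely illustrative sketch of the coordination pattern described (cloud-level optimization dispatching setpoints to many distributed resources), the snippet below splits a power target across devices in proportion to their available headroom. The resource names, units, and proportional rule are assumptions, not Powernet's actual algorithms.

```python
# Toy cloud-level dispatch: allocate a power target across distributed resources
# in proportion to each one's headroom (capacity minus current output). Assumed API.

def dispatch(target_kw, resources):
    headroom = {name: cap - out for name, (cap, out) in resources.items()}
    total = sum(headroom.values()) or 1.0
    return {name: min(h, target_kw * h / total) for name, h in headroom.items()}

resources = {               # name: (capacity_kw, current_output_kw)
    "battery_a": (50.0, 10.0),
    "ev_charger": (20.0, 0.0),
    "solar_inv": (30.0, 25.0),
}
print(dispatch(target_kw=40.0, resources=resources))
```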
Peachy Parallel Assignments are model assignments for teaching parallel computing concepts. They are competitively selected for being adoptable by other instructors and "cool and inspirational" to students. Thus, they allow instructors to easily add high-quality assignments that will engage students in their classes.