- Parallel Computing and Optimization Techniques
- Cloud Computing and Resource Management
- Advanced Data Storage Technologies
- Distributed and Parallel Computing Systems
- Algorithms and Data Compression
- Distributed systems and fault tolerance
- Embedded Systems Design Techniques
- Software System Performance and Reliability
- Cloud Data Security Solutions
- Scientific Computing and Data Management
- Security and Verification in Computing
- Green IT and Sustainability
- Advanced Mathematical Identities
- Neural Networks and Applications
- DNA and Biological Computing
- Analog and Mixed-Signal Circuit Design
- Web Data Mining and Analysis
- Tactile and Sensory Interactions
- IoT and Edge/Fog Computing
- Blockchain Technology Applications and Security
- Interactive and Immersive Displays
- Advanced Image and Video Retrieval Techniques
- Catalytic Processes in Materials Science
- Blind Source Separation Techniques
- Statistical and Computational Modeling
University of Potsdam
2015-2022
Hasso Plattner Institute
2008-2022
We propose using spatial gestures not only for input but also for output. Analogous to gesture input, the proposed gesture output moves the user's finger in a gesture, which the user then recognizes. We use our concept in a mobile scenario where a motion path forming a "5" informs users about new emails, or a heart-shaped path serves as a message from a friend. We built two prototypes: (1) the longRangeOuija is a stationary prototype that offers a range of up to 4cm; (2) the pocketOuija is a self-contained device based on an iPhone with a range of up to 1cm....
Data transfers impose a major bottleneck in heterogeneous system architectures. As a mitigation strategy, compute resources can be introduced in places where data occurs naturally. The increased diversity of compute resources in turn affects programming models and the practicalities of software development for near-data compute kernels, and raises the question of how those can be made accessible to users and applications.
GPU compute devices have become very popular for general-purpose computations. However, the SIMD-like hardware of graphics processors is currently not well suited for irregular workloads, like searching unbalanced trees. In order to mitigate this drawback, NVIDIA introduced an extension to GPU programming models called dynamic parallelism. This extension enables GPU programs to spawn new units of work directly on the GPU, allowing the refinement of subsequent work items based on intermediate results without any involvement of the main CPU....
Cloud federation is receiving increasing attention due to the benefits of resilience and locality it brings to cloud providers and users. Our analysis of three diverse use cases shows that existing solutions are not addressing the needs of such use case applications. In this paper, we present an alternative approach to network federation, providing a model based on cloud-to-cloud agreements. In our scenarios, companies hosting their own OpenStack clouds need to run machines transparently in another cloud, provided by...
For the design and operation of today's computer systems, power and energy requirements are highest priorities. Unlike performance analyses, however, power and energy measurements of heterogeneous systems are difficult to conduct. Especially at the system-software level, performing power and energy measurements remains challenging. Yet, such measurements are essential to improve software components for low power and high energy efficiency. In this paper, we analyze and discuss the power and energy characteristics of several systems with up to 20 cores (160 hardware threads) and 1 TB of main memory. For the analyzed systems, we outline challenges...
Blind Signal Separation is an algorithmic problem class that deals with the restoration of original signal data from a signal mixture. Implementations, such as FastICA, are optimized for parallelization on CPU or first-generation GPU hardware. With the advent of modern, compute-centered GPU hardware with powerful features such as dynamic parallelism support, these solutions no longer leverage the available hardware performance in the best possible way. We present an implementation of the FastICA algorithm, which is specifically tailored...
For the implementation of data-intensive C++ applications for cache-coherent Non-Uniform Memory Access (NUMA) systems, both massive parallelism and data locality have to be considered. While massive parallelism has been largely understood, the shared memory paradigm is still deeply entrenched in the mindset of many software developers. Hence, the data locality aspects of NUMA systems have been widely neglected thus far. At first sight, applying shared-nothing approaches might seem like a viable workaround to address data locality. However, we argue that developers...
The domains of parallel and distributed computing have been converging continuously, up to the degree that state-of-the-art server computer systems incorporate characteristics from both domains: They comprise a hierarchy of enclosures, where each enclosure houses multiple processor sockets and each socket again contains multiple memory controllers. A global address space and cache coherency are facilitated using multiple layers of fast interconnection technologies, even across enclosures. The growing popularity of such systems creates an urge...
GPU compute devices have become very popular for general-purpose computations. However, the SIMD-like hardware of graphics processors is currently not well suited for irregular workloads, like searching unbalanced trees. In order to mitigate this drawback, NVIDIA introduced an extension to GPU programming models called Dynamic Parallelism. This extension enables GPU programs to spawn new units of work directly on the GPU, allowing the refinement of subsequent work items based on intermediate results without any involvement of the main CPU.
Cloud computing offers the potential to store, manage, and process data in highly available, scalable, and elastic environments. Yet, these environments still provide very limited and inflexible means for customers to control their data. For example, customers can neither specify the security of inter-cloud communication, bearing the risk of information leakage, nor comply with laws requiring data to be kept in the originating jurisdiction, nor control the sharing of data with third parties on a fine-granular basis. This lack of control can hinder cloud adoption for data that falls under...
Certain workloads such as in-memory databases are inherently hard to scale out and rely on cache-coherent scale-up non-uniform memory access (NUMA) systems to keep up with the ever-increasing demand for compute resources. However, many parallel programming frameworks such as OpenMP do not make efficient use of large scale-up NUMA systems, as they do not consider data locality sufficiently. In this work, we present PGASUS, a C++ framework for NUMA-aware application development that provides integrated facilities for task...
The ever-growing demand for computing resources has reached a wide range of application domains. Even though the ubiquitous availability of cloud-based GPU instances provides an abundance of computing resources, the programmatic complexity of utilizing heterogeneous hardware in a scale-out scenario is not yet addressed sufficiently. We deal with this issue by introducing the CloudCL framework, which enables developers to focus their implementation efforts on compute kernels without having to consider inter-node...
The overhead of moving data is the major limiting factor in today's hardware, especially in heterogeneous systems where data needs to be transferred frequently between host and accelerator memory. With the increasing availability of hardware-based compression facilities in modern computer architectures, this paper investigates the potential of hardware-accelerated I/O Link Compression as a promising approach to reduce data volumes and transfer time, thus improving the overall efficiency of accelerators in heterogeneous systems. Our considerations are...
The recent restructuring of the electricity grid (i.e., smart grid) introduces a number of challenges for today's large-scale computing systems. To operate reliably and efficiently, computing systems must adhere not only to technical limits (i.e., thermal constraints) but they must also reduce operating costs, for example, by increasing their energy efficiency. Efforts to improve energy efficiency, however, are often hampered by inflexible software components that hardly adapt to the underlying hardware characteristics. In this paper, we...
In an age of ever-growing data volumes, lossless data compression is unarguably one of the most relevant techniques to handle vast data sets. To facilitate high-throughput compression, modern IBM POWER CPUs provide hardware acceleration for the proprietary 842 compression algorithm. The algorithm is optimized for main memory compression, and a software-based implementation is available as part of the Linux kernel. Even though GPU-equipped computers are vital for many of today's data-intensive applications, GPUs have thus far been unable to interoperate with...
There is a large amount of information about celebrities spread all over the web, hidden inside innumerable news and blogs, pictures on Flickr or videos on YouTube. Having this information combined and aggregated would be of great benefit to many customers. In this document we will describe the architecture and (business) value of a system that not only collates information pre-formatted by other services but also provides a self-developed named entity recognition algorithm for extracting names from different data sources, and then processes and enriches...
The SSICLOPS consortium recently designed a transparent virtual network expansion mechanism for OpenStack. In this paper, we build upon this mechanism to propose features to improve the interconnection in inter-cloud federations. Based on distributed in-memory databases and High Energy Physics (HEP) workloads as two representative cloud computing workloads, we performed extensive performance evaluations to demonstrate that in a setup comprised of five sites across Europe, the performances of our agent are similar to legacy VPNaaS...
The ever-growing demand for compute resources has reached a wide range of application domains, and with that has created a larger audience for compute-intensive tasks. In this paper, we present the CloudCL framework, which empowers users to run compute-intensive tasks without having to face the total cost of ownership of operating an extensive high-performance compute infrastructure. CloudCL enables developers to tap the ubiquitous availability of cloud-based heterogeneous compute resources using a single paradigm, without having to consider dynamic resource management and inter-node communication....
Near-data accelerators play an important role in satisfying the ever-growing demand for compute resources. However, for the efficient integration of near-data computing resources into applications, a flexible programming model and suitable abstractions on the operating system level are required. This paper presents Metal FS, a framework that enables users and applications to orchestrate near-data computations on an NVMe+FPGA device through standard shell syntax, including the pipe operator. A user-space NVMe file interface...
With memory-centric architectures appearing on the horizon as potential candidates for future computer architectures, we propose that the tuple space paradigm is well suited for the task of managing the large shared memory pools that are a central concept of these new architectures. We support this hypothesis by presenting MemSpaces, an implementation of the tuple space paradigm based on POSIX shared memory objects. To demonstrate both the efficacy and efficiency of the approach, we provide a performance evaluation that compares MemSpaces to message-based implementations...