- Parallel Computing and Optimization Techniques
- Advanced Memory and Neural Computing
- Embedded Systems Design Techniques
- Advanced Neural Network Applications
- Adversarial Robustness in Machine Learning
- Interconnection Networks and Systems
- Particle Physics Theoretical and Experimental Studies
- CCD and CMOS Imaging Sensors
- Particle Detector Development and Performance
- Radiation Effects in Electronics
- Hand Gesture Recognition Systems
- Experimental Learning in Engineering
- High-Energy Particle Collisions Research
- Human Pose and Action Recognition
- Handwritten Text Recognition Techniques
- Industrial Vision Systems and Defect Detection
- Anomaly Detection Techniques and Applications
- Distributed and Parallel Computing Systems
- VLSI and Analog Circuit Testing
- Real-Time Systems Scheduling
- Graphene Research and Applications
- Context-Aware Activity Recognition Systems
- Mechatronics Education and Applications
- Stochastic Gradient Optimization Techniques
- Security and Verification in Computing
Karlsruhe Institute of Technology
2013-2024
Institut für Informationsverarbeitung
2019-2024
Rutherford Appleton Laboratory
2016
Max Planck Institute for Nuclear Physics
2015-2016
FZI Research Center for Information Technology
2011
A new tracking system is under development for operation in the CMS experiment at the High Luminosity LHC. It includes an outer tracker which will construct stubs, built by correlating clusters in two closely spaced sensor layers for the rejection of hits from low transverse momentum tracks, and transmit them off-detector at 40 MHz. If these data are to contribute to keeping the Level-1 trigger rate at around 750 kHz at the increased luminosity, a crucial component of the upgrade will be the ability to identify tracks with transverse momentum above 3 GeV/c by building tracks out...
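A minimal sketch of the stub idea, assuming invented geometry values, strip pitch, and bend cut (none taken from the CMS design): pair clusters from the two closely spaced layers and keep only pairs whose local bend is small enough to be consistent with a high transverse momentum track.

```python
# Illustrative stub building: correlate clusters in two closely spaced sensor layers and
# reject pairs consistent with low-pT tracks. All numbers below are assumptions.

def stub_candidates(inner_clusters, outer_clusters, pitch_mm=0.09, max_bend_strips=2.0):
    """Pair each inner-layer cluster with the nearest outer-layer cluster and keep the
    pair only if the local bend (in units of strip pitch) is small, i.e. the track is stiff."""
    stubs = []
    for x_in in inner_clusters:
        x_out = min(outer_clusters, key=lambda x: abs(x - x_in))
        bend = (x_out - x_in) / pitch_mm          # displacement in strip pitches
        if abs(bend) <= max_bend_strips:          # high-pT tracks bend little between layers
            stubs.append((x_in, bend))
    return stubs

# Example: three inner clusters; the middle one belongs to a soft (low-pT) track and is dropped.
print(stub_candidates([10.00, 20.00, 30.00], [10.05, 21.50, 30.10]))
```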
AES (Advanced Encryption Standard) accelerators are commonly used in high-throughput applications, but they have notable resource requirements. We investigate replacing the cipher with ChaCha ciphers and propose the first FPGA implementations optimized for data throughput. In consequence, we compare three different system architectures and analyze which aspects dominate their performance. Our experimental results indicate that a bandwidth of 175 Gbit/s can be reached with as little as 2982 slices, whereas...
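For reference, the kernel that such a datapath replicates is the ChaCha quarter-round; the pure-Python version below is only a functional sketch, not the throughput-optimized FPGA implementation from the abstract.

```python
# Functional reference of the ChaCha quarter-round (the kernel an FPGA datapath unrolls/pipelines).
MASK32 = 0xFFFFFFFF

def rotl32(x, n):
    return ((x << n) | (x >> (32 - n))) & MASK32

def quarter_round(a, b, c, d):
    a = (a + b) & MASK32; d = rotl32(d ^ a, 16)
    c = (c + d) & MASK32; b = rotl32(b ^ c, 12)
    a = (a + b) & MASK32; d = rotl32(d ^ a, 8)
    c = (c + d) & MASK32; b = rotl32(b ^ c, 7)
    return a, b, c, d

# ChaCha20 applies this round to the columns and diagonals of a 4x4 word state for 20 rounds.
print([hex(w) for w in quarter_round(0x11111111, 0x01020304, 0x9b8d6f43, 0x01234567)])
```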
Modern high-energy physics experiments such as the Compact Muon Solenoid (CMS) experiment at CERN produce an extraordinary amount of data every 25 ns. To handle a data rate of more than 50 Tbit/s, a multi-level trigger system is required, which reduces this rate. Due to the increased luminosity after the Phase-II Upgrade of the LHC, the CMS tracking has to be redesigned. The current system is unable to cope with the conditions resulting from this upgrade. Because of a latency budget of a few microseconds, the Level-1 Track Trigger is implemented in hardware. State-of-the-art pattern recognition filter...
Enabling the use of Deep Neural Networks (DNNs) for time-series-based applications on low-power devices such as wearables opens up a wide range of new features and services. However, inference requires an enormous amount of operations to be performed by the computing platform. In addition, Long Short-Term Memory (LSTM)-based networks require memory to store the internal cell state for future calculations. In this paper, we therefore propose a hardware/software co-design based on an LSTM hardware accelerator architecture...
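As an illustration of why the cell state must persist between time steps, here is a generic NumPy LSTM cell step; the weight shapes and names are textbook choices, not those of the proposed accelerator.

```python
# Generic LSTM time step: h_t and c_t must be kept in memory for the next step.
import numpy as np

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W: (4H, X), U: (4H, H), b: (4H,)."""
    z = W @ x_t + U @ h_prev + b
    i, f, g, o = np.split(z, 4)
    i, f, o = 1 / (1 + np.exp(-i)), 1 / (1 + np.exp(-f)), 1 / (1 + np.exp(-o))
    g = np.tanh(g)
    c_t = f * c_prev + i * g          # internal cell state, needed again at the next step
    h_t = o * np.tanh(c_t)
    return h_t, c_t

X, H = 8, 16
rng = np.random.default_rng(0)
W, U, b = rng.normal(size=(4 * H, X)), rng.normal(size=(4 * H, H)), np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(5, X)):     # an accelerator must persist h and c across these steps
    h, c = lstm_step(x, h, c, W, U, b)
```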
Since their breakthrough, the complexity of Deep Neural Networks (DNNs) has been rising steadily. As a result, accelerators for DNNs are now used in many domains. However, designing and configuring an accelerator that perfectly meets the requirements of a given application is a challenging task. In this paper, we therefore present our approach to support the design process. With an analytical model of a systolic array, we can estimate performance, energy consumption and area for each design option. To determine these metrics, usually cycle...
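A rough sketch of what such an analytical model can look like, using an invented cost formula (accumulation cycles plus fill/drain overhead) rather than the paper's model:

```python
# Toy analytical performance model (own assumptions): cycle count of an output-stationary
# rows x cols systolic array executing an M x K x N matrix multiply.

def systolic_cycles(M, K, N, rows, cols, freq_mhz=500):
    """Tile the output into rows x cols blocks; each tile needs K accumulation cycles
    plus a fill/drain overhead of (rows + cols) cycles."""
    tiles = -(-M // rows) * -(-N // cols)            # ceiling division over both output dims
    cycles = tiles * (K + rows + cols)
    return cycles, cycles / (freq_mhz * 1e6)         # cycles and runtime in seconds

cyc, t = systolic_cycles(M=256, K=512, N=256, rows=16, cols=16)
print(f"{cyc} cycles, {t * 1e3:.3f} ms")             # sweep rows/cols to compare design options
```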
For AI-based systems in safety-critical domains, it is essential to understand the impact of random hardware faults affecting the target accelerators. The high degree of data reuse makes Deep Neural Network (DNN) accelerators susceptible to significant fault propagation and hence hazardous predictions. Therefore, we present SiFI-AI, a simulation framework for fault injection in DNN accelerators. SiFI-AI proposes a hybrid approach combining fast AI inference with cycle-accurate RTL simulation. The time-expensive simulation is only used...
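A conceptual sketch of the fault-injection idea using a hypothetical helper (not the SiFI-AI API): flip one bit in a layer's activation tensor to emulate a transient hardware fault while the rest of the inference runs at full speed.

```python
# Hypothetical bit-flip injection into a float32 activation tensor.
import numpy as np

def inject_bitflip(tensor, index, bit):
    """Flip one bit of a float32 activation at the given flat index."""
    flat = tensor.astype(np.float32).ravel().copy()
    raw = flat.view(np.uint32)
    raw[index] ^= np.uint32(1 << bit)
    return raw.view(np.float32).reshape(tensor.shape)

acts = np.ones((4, 4), dtype=np.float32)
faulty = inject_bitflip(acts, index=5, bit=30)        # flip a high exponent bit
print(acts[1, 1], "->", faulty[1, 1])                  # 1.0 becomes inf, a hazardous activation
```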
Time-series-based applications such as handwriting recognition benefit from using Deep Neural Networks (DNNs) in terms of accuracy and efficiency. Due to the strict power and memory limitations of embedded platforms in the Internet-of-Things (IoT), inference of DNNs is usually performed on more powerful and less constrained devices. However, relying on mobile devices such as smartphones or tablets leads to high system requirements. In this paper, we present our approach for distributing the computational workload between a sensor pen and a...
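A toy version of the split-point decision, with invented per-layer latencies and link bandwidth: choose after which layer to hand the computation from the pen to the more powerful device, trading pen compute time against transmission of the intermediate features.

```python
# Toy split-point search for distributing DNN layers between a sensor node and a host device.

def best_split(pen_ms, host_ms, feature_kbits, link_kbit_per_ms):
    """pen_ms[i] / host_ms[i]: per-layer latency on pen / host.
    feature_kbits[i]: size of the data to transmit when splitting before layer i
    (index 0 = raw sensor data, index len(pen_ms) = final result)."""
    best = None
    for split in range(len(pen_ms) + 1):
        latency = (sum(pen_ms[:split])
                   + feature_kbits[split] / link_kbit_per_ms
                   + sum(host_ms[split:]))
        if best is None or latency < best[1]:
            best = (split, latency)
    return best

print(best_split(pen_ms=[4, 6, 9, 12], host_ms=[1, 1.5, 2, 3],
                 feature_kbits=[256, 64, 32, 16, 8], link_kbit_per_ms=20))
```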
In the future, it is expected that safety-critical and non-critical applications will be executed on the same hardware. Therefore, future hardware systems should be capable of providing runtime support for the higher reliability requirements of critical applications and the performance of non-critical ones equally. In this paper, we present a run-time adaptive cache with a coarse-grained safety mechanism to tackle this emerging challenge. For non-critical applications, it operates in a mode without any safety mechanisms. On the other hand, a checkpointing and rollback feature provides fault...
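A conceptual sketch with invented classes (not the proposed hardware) of the two operating modes: no safety mechanisms for non-critical tasks, checkpointing and rollback for critical ones.

```python
# Conceptual model of a run-time adaptive cache with an optional checkpoint/rollback mode.
import copy

class AdaptiveCache:
    def __init__(self):
        self.lines = {}
        self.safe_mode = False
        self._checkpoint = None

    def set_mode(self, safe):
        """Enable the safety mechanism only when a critical application is running."""
        self.safe_mode = safe
        self._checkpoint = copy.deepcopy(self.lines) if safe else None

    def write(self, addr, data):
        self.lines[addr] = data

    def checkpoint(self):
        if self.safe_mode:
            self._checkpoint = copy.deepcopy(self.lines)

    def rollback(self):                    # invoked when a fault is detected
        if self.safe_mode:
            self.lines = copy.deepcopy(self._checkpoint)

cache = AdaptiveCache()
cache.set_mode(safe=True)
cache.write(0x100, "ok"); cache.checkpoint()
cache.write(0x100, "corrupted"); cache.rollback()
print(cache.lines[0x100])                  # -> "ok"
```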
ZuSE-KI-Mobil (ZuKIMo) is a nationally funded research project, currently in its intermediate stage. The goal of the ZuKIMo project is to develop a new System-on-Chip (SoC) platform and a corresponding ecosystem to enable efficient Artificial Intelligence (AI) applications with specific requirements. With ZuKIMo, we specifically target applications from the mobility domain, i.e. autonomous vehicles and drones. The initial platform and ecosystem are built by a consortium consisting of seven partners from German academia and industry. We build the SoC around a novel AI...
Applications of different criticality sharing the same System-on-Chip (SoC) platform are increasing in popularity to reduce overall cost. Spatial and temporal isolation techniques are utilized to limit inter-application influence and to ensure that real-time requirements are met. This involves partitioning the communication resources, and such partitions can result in irregular topologies. It is desirable that the on-chip interconnect on such systems supports communication within all possible partition shapes using efficient routing techniques. To improve...
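A small illustration of why irregular partitions complicate routing: plain dimension-ordered (XY) routing on a mesh can leave an L-shaped partition even when source and destination both belong to it (coordinates and partition shape below are made up).

```python
# Dimension-ordered (XY) routing on a mesh NoC versus an L-shaped partition.

def xy_route(src, dst):
    """Route by moving along x first, then along y."""
    x, y = src
    path = [src]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path

# L-shaped partition on a 3x3 mesh; nodes (1, 0) and (2, 0) are outside the partition.
partition = {(0, 0), (0, 1), (0, 2), (1, 1), (1, 2), (2, 1), (2, 2)}
path = xy_route((0, 0), (2, 1))
print(path, "stays inside partition:", all(node in partition for node in path))
```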
Modern computer architectures have an ever-increasing demand for performance, but are constrained in power dissipation and chip area. To tackle these demands, architectures with application-specific accelerators have gained traction in research and industry. While this is a very promising direction, hard-wired accelerators fall short when too many applications need to be supported or flexibility is required. In this paper, we propose an automatic loop detection and hardware acceleration approach for an adaptive reconfigurable processor. Our...
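A simplified sketch of loop detection on a branch trace, using the common back-edge heuristic rather than the paper's mechanism: a taken backward branch marks a loop, and its execution count hints at which loops are worth offloading to the reconfigurable fabric.

```python
# Heuristic loop detection on a trace of taken branches: backward branches are loop back-edges.
from collections import Counter

def detect_hot_loops(branch_trace, min_iterations=100):
    """branch_trace: iterable of (branch_pc, target_pc) for taken branches."""
    loops = Counter()
    for pc, target in branch_trace:
        if target <= pc:                        # backward branch -> loop back-edge
            loops[(target, pc)] += 1            # loop body spans [target, pc]
    return [(body, n) for body, n in loops.most_common() if n >= min_iterations]

trace = [(0x120, 0x100)] * 5000 + [(0x200, 0x1F0)] * 12 + [(0x300, 0x340)] * 3
print(detect_hot_loops(trace))                   # only the 0x100-0x120 loop (5000 iterations) remains
```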
Embedded image processing applications like multi-camera-based object detection or semantic segmentation are often based on Convolutional Neural Networks (CNNs) to provide precise and reliable results. The deployment of CNNs in embedded systems, however, imposes additional constraints such as latency restrictions and the limited energy budget of the sensor platform. These requirements have to be considered during hardware/software co-design of Artificial Intelligence (AI) applications. In addition,...
The CMS collaboration is preparing a major upgrade of its detector, so it can operate during the high luminosity run of the LHC from 2026. The upgraded tracker electronics will reconstruct the trajectories of charged particles within a latency of a few microseconds, so that they can be used by the level-1 trigger. An emulation framework, CIDAF, has been developed to provide a reference for a proposed FPGA-based implementation of this track finder, which employs a Time-Multiplexed (TM) technique for data processing.
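A minimal sketch of the time-multiplexing idea with illustrative numbers only: consecutive events are distributed round-robin over N identical processing nodes, so each node sees only every N-th event and gains N times the per-event latency budget.

```python
# Round-robin time multiplexing of consecutive events over identical processing nodes.

def time_multiplex(event_ids, n_nodes):
    """Return a mapping node -> list of events that node processes."""
    assignment = {node: [] for node in range(n_nodes)}
    for event in event_ids:
        assignment[event % n_nodes].append(event)
    return assignment

# 18 consecutive events spread over 6 time slices/nodes.
for node, events in time_multiplex(range(18), n_nodes=6).items():
    print(f"node {node}: {events}")
```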
Distributed systems can be found in various applications, e.g., robotics or autonomous driving, to achieve higher flexibility and robustness. Thereby, data-flow-centric applications such as Deep Neural Network (DNN) inference benefit from partitioning the workload over multiple compute nodes in terms of performance and energy efficiency. However, mapping large models onto distributed embedded systems is a complex task, due to low latency and high throughput requirements combined with strict energy and memory...
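A toy sketch with invented layer costs, not the paper's mapping method: partition a chain of layers into contiguous pipeline stages, one per compute node, so that the slowest stage, which bounds pipeline throughput, is as fast as possible.

```python
# Brute-force pipeline partitioning of a layer chain over n_nodes compute nodes.
from itertools import combinations

def partition_layers(layer_ms, n_nodes):
    """Return (bottleneck stage time, stage boundaries) for the best contiguous partition."""
    n = len(layer_ms)
    best = None
    for cuts in combinations(range(1, n), n_nodes - 1):
        bounds = (0,) + cuts + (n,)
        stages = [sum(layer_ms[bounds[i]:bounds[i + 1]]) for i in range(n_nodes)]
        if best is None or max(stages) < best[0]:
            best = (max(stages), bounds)
    return best

# Eight layers mapped onto three nodes: bottleneck stage time and where to cut the chain.
print(partition_layers([3, 5, 2, 8, 4, 4, 6, 2], n_nodes=3))
```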