- Embedded Systems Design Techniques
- Interconnection Networks and Systems
- Parallel Computing and Optimization Techniques
- Real-Time Systems Scheduling
- Advanced Memory and Neural Computing
- CCD and CMOS Imaging Sensors
- Ferroelectric and Negative Capacitance Devices
- VLSI and FPGA Design Techniques
- Advanced Neural Network Applications
- Neuroscience and Neural Engineering
- Simulation Techniques and Applications
- Distributed and Parallel Computing Systems
- Distributed systems and fault tolerance
- Rural Development and Agriculture
- Formal Methods in Verification
- Neural dynamics and brain function
- Urban Development and Societal Issues
- Domain Adaptation and Few-Shot Learning
- Machine Learning and ELM
- Linguistics and Education Research
- Financial Risk and Volatility Modeling
- Fern and Epiphyte Biology
- Statistical Distribution Estimation and Applications
- Context-Aware Activity Recognition Systems
- Infrastructure Maintenance and Monitoring
Snap (United States)
2024
Mathys (Netherlands)
2020-2023
Universidade de Brasília
2021
Intel (United States)
2015-2017
Eindhoven University of Technology
2014-2017
Ericsson (Netherlands)
2009-2015
Intel (United Kingdom)
2015
Ericsson (Sweden)
2011-2014
Urbana University
2010
Universidade Federal de São Carlos
2010
This paper proposes a scheduling strategy and an automatic flow that enable the simultaneous execution of multiple hard-real-time dataflow jobs. Each job has its own rate and starts and stops independently from other jobs, at instants unknown at compile-time, on a multiprocessor system-on-chip. We show how a combination of Time-Division Multiplex (TDM) and static-order scheduling can be modeled as additional nodes and edges on top of the dataflow representation using Single-Rate Dataflow semantics to enable tight worst-case temporal analysis. We also...
We propose an online resource allocation solution for multiprocessor systems-on-chip that executes several real-time, streaming media jobs simultaneously. The system consists of up to 24 processors connected by an AEthereal [7] Network-on-Chip (NoC) of 4 to 12 routers. A job is a set of processing tasks connected by FIFO channels. Each job can be independently started or stopped by the user. Each job is annotated with budgets per computation task and communication channel, which have been computed at compile-time. When requested...
Single-Rate Data-Flow (SRDF) graphs, also known as Homogeneous Synchronous Data-Flow (HSDF) graphs or Marked Graphs, are often used to model the implementation and do temporal analysis of concurrent DSP and multimedia applications. An important problem in implementing applications expressed as SRDF graphs is the computation of the minimal amount of buffering needed to implement a static periodic schedule (SPS) that is optimal in terms of execution rate, or throughput. Ning and Gao [1] propose a linear-programming-based polynomial algorithm to compute...
Energy-efficient execution of applications is important for many reasons, e.g. time between battery charges and device temperature. Voltage and Frequency Scaling (VFS) enables applications to be run at lower frequencies on hardware resources, thereby consuming less power. Real-time applications have deadlines that must be met, otherwise their output is devalued. Dataflow modelling of real-time applications enables off-line verification of the application's temporal requirements. In this paper we describe a method to reduce the combined static and dynamic energy...
On a multi-radio baseband system, multiple independent transceivers must share the resources of a multi-processor, while each meets its own hard real-time requirements. Not all possible combinations of transceivers are known at compile time, so a solution must be found that either allows for timing analysis or relies on runtime analysis. This thesis proposes a design flow and software architecture that meets these challenges, enabling features such as transceiver compilation and dynamic loading, and taking into account other...
Inference of Deep Neural Networks for stream signal (Video/Audio) processing in edge devices is still challenging. Unlike most state-of-the-art inference engines, which are efficient for static signals, our brain is optimized for real-time dynamic signal processing. We believe one important feature of the brain (asynchronous state-full processing) is the key to its excellence in this domain. In this work, we show how asynchronous processing with state-full neurons allows exploitation of the existing sparsity in natural signals. This paper explains three different types...
Biological neurons are known to have sparse and asynchronous communications using spikes. Despite our incomplete understanding of the processing strategies of the brain, its low energy consumption in fulfilling delicate tasks suggests the existence of efficient mechanisms. Inspired by these key factors, we introduce SpArNet, a bio-inspired quantization scheme to convert a pre-trained convolutional neural network to a spiking neural network, with the aim of minimizing the computational load for execution on neuromorphic processors....
Neuronflow is a neuromorphic, many-core, data flow architecture that exploits brain-inspired concepts to deliver a scalable event-based processing engine for neuron networks in Live AI applications. Its design is inspired by brain biology, but not necessarily biologically plausible. The main design goal is the exploitation of sparsity to dramatically reduce latency and power consumption, as required by sensor processing at the Edge.
Contemporary embedded systems for wireless communications support various radios. A software-defined radio (SDR) is a radio implemented as concurrent software processes that typically run on a multiprocessor system-on-chip (MPSoC). SDRs are real-time streaming applications with throughput requirements. One efficient approach for timing analysis of SDRs is the dataflow model of computation (MoC). Nonetheless, modeling SDRs is challenging due to their dynamically changing data processing workload. The MoC is not expressive enough...
Brain-inspired event-driven processors execute deep neural networks (DNNs) in a sparsity-aware manner, leading to superior performance compared to conventional platforms. In the pursuit of higher event sparsity, prior studies suppress non-zero events by either eliminating intra-frame activations (spatially) or leveraging the redundancy of inter-frame differences for video (temporally). However, we have empirically observed that simultaneously enhancing activation sparsity and temporal sparsity can lead...
This paper proposes a new dataflow model for analyzing the worst-case temporal behavior of resource arbitration through Time Division Multiplexing (TDM).
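To give a feel for the quantity such a model has to bound, the following Python sketch computes the classic conservative worst-case response time of a task under TDM arbitration. It is not the model proposed in the paper: the function name and the parameters `exec_time`, `slice_len`, and `period` are assumptions introduced here for illustration. The reasoning is that every slice of service the task needs may be preceded by up to `period - slice_len` time units during which other tasks own the resource.

```python
def tdm_worst_case_response(exec_time: int, slice_len: int, period: int) -> int:
    """Conservative worst-case response time of a task under TDM arbitration.

    exec_time: worst-case execution time of the task on a dedicated resource
    slice_len: length of the task's TDM slice within each period
    period:    length of the TDM wheel (same time unit as the other arguments)
    """
    if not 0 < slice_len <= period:
        raise ValueError("slice_len must satisfy 0 < slice_len <= period")
    if exec_time < 0:
        raise ValueError("exec_time must be non-negative")
    # Number of slices needed to complete the work (integer ceiling division).
    slices_needed = -(-exec_time // slice_len)
    # Each needed slice can be preceded by a full wait for the other slices.
    return exec_time + (period - slice_len) * slices_needed


# Hypothetical example: 5 time units of work, a slice of 2 in a period of 10.
print(tdm_worst_case_response(5, 2, 10))
```

When the slice spans the whole period, the bound collapses to the plain execution time, which is a quick sanity check on the formula.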
Wireless embedded applications have stringent temporal constraints. The frame arrival rate imposes a throughput requirement that must be satisfied. These applications are often dynamic and streaming in nature. The FSM-based Scenario-Aware Dataflow (FSM-SADF) model of computation (MoC) has been proposed to model such applications. FSM-SADF splits the system into a set of static modes of operation, called scenarios. Each scenario is modeled by a Synchronous Dataflow (SDF) graph. The possible transitions are specified by a finite-state machine (FSM)....
Voltage and Frequency Scaling (VFS) can effectively reduce energy consumption at system level. Most work in this field has focused on deadline-constrained applications with finite schedule lengths. However, in typical real-time streaming, processing is repeatedly activated by indefinitely long data streams, and operations on successive instances are overlapped to achieve a tight throughput. A particular application domain where such characteristics co-exist with stringent constraints is baseband processing....
Directed graphs are widely used to model data flow and execution dependencies in streaming applications. This enables the utilization of graph partitioning algorithms for the problem of parallelizing these applications on multiprocessor architectures under hardware resource constraints. However, due to program memory restrictions in embedded systems, applications need to be divided into parts without cyclic dependencies. This can be done by a subsequent second step with an additional acyclicity constraint.
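The acyclicity constraint can be made concrete with a small check. The Python sketch below is not the partitioning algorithm from the paper; it only tests whether a given partition satisfies the constraint, by building the quotient graph (one node per block) and running Kahn's topological sort on it. The function name `partition_is_acyclic` and the input representation (an edge list plus a node-to-block mapping) are assumptions made for this example.

```python
from collections import defaultdict, deque


def partition_is_acyclic(edges, block_of):
    """Return True if the blocks of a partition have no cyclic dependencies.

    edges:    iterable of (u, v) pairs, meaning node u must run before node v
    block_of: dict mapping every node to the identifier of its block
    """
    succ = defaultdict(set)
    indegree = {block: 0 for block in set(block_of.values())}
    for u, v in edges:
        bu, bv = block_of[u], block_of[v]
        # Only edges that cross blocks matter; count each block pair once.
        if bu != bv and bv not in succ[bu]:
            succ[bu].add(bv)
            indegree[bv] += 1

    # Kahn's algorithm: repeatedly remove blocks with no unmet dependencies.
    ready = deque(block for block, deg in indegree.items() if deg == 0)
    visited = 0
    while ready:
        block = ready.popleft()
        visited += 1
        for nxt in succ[block]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    # Every block was removed exactly when the quotient graph has no cycle.
    return visited == len(indegree)


# Hypothetical pipeline a -> b -> c -> d split into two blocks in two ways.
chain = [("a", "b"), ("b", "c"), ("c", "d")]
print(partition_is_acyclic(chain, {"a": 0, "b": 0, "c": 1, "d": 1}))
print(partition_is_acyclic(chain, {"a": 0, "b": 1, "c": 0, "d": 1}))
```

The second partition interleaves the blocks along the chain, so block 0 depends on block 1 and vice versa, even though the underlying graph itself is acyclic.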
The NimbleAI Horizon Europe project leverages key principles of energy-efficient visual sensing and processing in biological eyes and brains, and harnesses the latest advances in 3D stacked silicon integration, to create an integral sensing-processing neuromorphic architecture that efficiently and accurately runs computer vision algorithms in area-constrained endpoint chips. The rationale behind the architecture is:...
Manufacturing-viable neuromorphic chips require novel compute architectures to achieve the massively parallel and efficient information processing the brain supports so effortlessly. The most promising architectures for that are spiking/event-based, which enables massive parallelism at low complexity. However, the large memory requirements for synaptic connectivity are a showstopper for the execution of modern convolutional neural networks (CNNs) on massively parallel, event-based architectures. The present work overcomes this roadblock by...
Brain-inspired event-driven processors execute deep neural networks (DNNs) in a sparsity-aware manner. Specifically, if more zeros are induced in the activation maps, less computation will be performed in the succeeding convolution layer. However, inducing sparsity in DNNs remains a challenge. To address this, we propose a training approach, STAR (Sparse Thresholded Activation under partial-Regularization), which combines regularization with thresholding, to overcome the barrier of single threshold- or...
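The link between thresholding and zeros in the activation map can be illustrated with a few lines of Python. This is only a sketch of a thresholded activation function and a sparsity measure; it does not implement the STAR training approach, and the names `thresholded_relu` and `sparsity` as well as the sample values are assumptions made here.

```python
def thresholded_relu(values, threshold):
    """Keep a value only if it exceeds the threshold; otherwise output zero."""
    return [v if v > threshold else 0.0 for v in values]


def sparsity(values):
    """Fraction of entries that are exactly zero."""
    return sum(1 for v in values if v == 0) / len(values)


# Hypothetical pre-activation values for one small feature map.
pre_activations = [-1.0, 0.2, 0.5, 1.5]

plain = thresholded_relu(pre_activations, 0.0)    # ordinary ReLU
raised = thresholded_relu(pre_activations, 0.5)   # higher firing threshold

print(sparsity(plain), sparsity(raised))
```

Raising the threshold zeroes more activations, so an event-driven processor would have fewer non-zero events to propagate to the next layer; the accuracy cost of doing so is what a training procedure has to manage.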
In Software Defined Radio (SDR), some or all of the physical layer functions are implemented by software. In this paper, we focus on the channel decoding part of SDR. We use Synchronous Data Flow (SDF) and Cyclo-Static Data Flow (CSDF) graphs to model the functions. We want to tackle the problem of scheduling a dynamic mix of multiple radios with throughput constraints on a multi-standard multi-channel decoder. The decoder consists of a Micro-Controller Unit (MCU) and several weakly programmable Hardware Units (HU) with internal states and very limited...
Graphs are widely used to model execution dependencies in applications. In particular, the NP-complete problem of partitioning a graph under constraints receives enormous attention from researchers because of its applicability to multiprocessor scheduling. We identified the additional constraint of acyclic dependencies between blocks when mapping streaming applications to a heterogeneous embedded multiprocessor. Existing algorithms and heuristics do not address this requirement and deliver results that are not applicable for our...
Graphs are widely used to model execution dependencies in applications. In particular, the NP-complete problem of partitioning a graph under constraints receives enormous attention from researchers because of its applicability to multiprocessor scheduling. We identified the additional constraint of acyclic dependencies between blocks when mapping computer vision and imaging applications to a heterogeneous embedded multiprocessor. Existing algorithms and heuristics do not address this requirement and deliver results that are not applicable...