- Radiation Effects in Electronics
- Low-power high-performance VLSI design
- VLSI and Analog Circuit Testing
- Distributed systems and fault tolerance
- Parallel Computing and Optimization Techniques
- Semiconductor materials and devices
- Integrated Circuits and Semiconductor Failure Analysis
- Software Reliability and Analysis Research
- Embedded Systems Design Techniques
- Advancements in Semiconductor Devices and Circuit Design
- Advanced Memory and Neural Computing
- Numerical Methods and Algorithms
- Reliability and Maintenance Optimization
- Real-Time Systems Scheduling
- Quantum many-body systems
- Interconnection Networks and Systems
- Silicon and Solar Cell Technologies
- Ferroelectric and Negative Capacitance Devices
- Engineering and Test Systems
- Mitochondrial Function and Pathology
Universidade Federal do Rio Grande do Sul
2015-2022
This paper presents a dual-core lockstep (DCLS) implementation to protect hard-core processors against radiation-induced soft errors. The proposed DCLS is applied to an Advanced RISC Machine (ARM) Cortex-A9 embedded processor. Different software optimizations were evaluated to assess their impact on performance and fault tolerance. Heavy-ion experiments and fault injection emulation were performed to analyze the system's susceptibility to errors and its performance. Results show that the approach is able to decrease the cross section and achieve...
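The checkpoint-compare-rollback cycle of a dual-core lockstep can be illustrated with a small simulation. This is a sketch under simplifying assumptions, not the published implementation: the two "cores" are two replica states, the fault is an explicit tuple `(step, replica, bad_value)` used only to emulate a transient upset, and all names (`dcls_execute`, `accumulate`) are hypothetical.

```python
import copy

def dcls_execute(step_fn, state, n_steps, fault=None):
    """Dual-core lockstep sketch: two replicas run the same step and are
    compared at every checkpoint; a mismatch triggers rollback and retry."""
    core_a = copy.deepcopy(state)
    core_b = copy.deepcopy(state)
    checkpoint = copy.deepcopy(state)
    step = 0
    while step < n_steps:
        a = step_fn(copy.deepcopy(core_a))
        b = step_fn(copy.deepcopy(core_b))
        if fault is not None and fault[0] == step:
            # Emulated soft error: corrupt one replica's state once.
            (a if fault[1] == 0 else b)["acc"] = fault[2]
            fault = None  # transient fault affects a single execution
        if a == b:
            # Comparison passed: commit a new verified checkpoint.
            core_a, core_b = a, copy.deepcopy(a)
            checkpoint = copy.deepcopy(a)
            step += 1
        else:
            # Mismatch detected: roll both cores back and re-execute.
            core_a = copy.deepcopy(checkpoint)
            core_b = copy.deepcopy(checkpoint)
    return core_a

def accumulate(state):
    state["acc"] += 1
    return state
```

With `dcls_execute(accumulate, {"acc": 0}, 5, fault=(2, 0, 999))`, the corrupted step is detected at the checkpoint, rolled back, and re-executed, so the final accumulator still reads 5.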
This work is a survey on approximate computing and its impact on fault tolerance, especially for safety-critical applications. It presents a multitude of approximation methodologies, which are typically applied at the software, architecture, or circuit level. Those methodologies are discussed and compared across all their possible levels of implementation (some techniques span more than one level). Approximation is also presented as a means to provide fault tolerance and high reliability: traditional error-masking techniques, such as triple...
The increased need for computing capabilities and higher efficiency has stimulated industries to make available on the market novel architectures of growing complexity. The variety of codes that can be executed, combined with this complexity, introduces challenges to the reliability evaluation of systems and applications. This paper compares the behaviors of six different devices (an Intel co-processor, three NVIDIA GPUs, an AMD APU, and an embedded ARM) executing eight codes. To support our evaluation, we present and discuss experimental beam data that covers...
Software-based techniques offer several advantages to increase the reliability of processor-based systems at very low cost, but they cause performance degradation and an increase in code size. To meet constraints in memory, we propose SETA, a new control-flow software-only technique that uses assertions to detect errors affecting the program flow. SETA is an independent technique, but it was conceived to work together with previously proposed data-flow techniques, aiming at reducing memory overheads. Thus, the combined techniques were submitted to fault...
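A generic signature-based control-flow check conveys the idea behind assertion-style techniques of this kind. This is a simplified illustration, not the published SETA algorithm: the block names, signature values, and the `SignatureMonitor` class are all hypothetical, and the static successor table stands in for the compile-time control-flow graph.

```python
class ControlFlowError(Exception):
    """Raised when an executed branch does not exist in the static CFG."""

# Compile-time data (hypothetical values): each basic block has a
# signature and a set of legal successor blocks.
SUCCESSORS = {"B0": {"B1", "B2"}, "B1": {"B3"}, "B2": {"B3"}, "B3": set()}

class SignatureMonitor:
    """Runtime check executed at every basic-block entry: an assertion
    verifies that the taken transition is legal, detecting control-flow
    errors such as corrupted branch targets."""
    def __init__(self, entry_block):
        self.current = entry_block

    def enter(self, block):
        if block not in SUCCESSORS[self.current]:
            raise ControlFlowError(
                f"illegal control flow {self.current} -> {block}")
        self.current = block
```

A legal path such as `B0 -> B1 -> B3` passes silently, while a radiation-induced jump directly from `B0` to `B3` trips the assertion.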
ARM processors are leaders in embedded systems, delivering high-performance computing, power efficiency, and reduced cost. For this reason, there is relevant interest in their use in the aerospace industry. However, the adoption of sub-micron technologies has increased the sensitivity to radiation-induced transient faults. Thus, the mitigation of soft errors has become a major concern. Software-Implemented Hardware Fault Tolerance (SIHFT) techniques are a low-cost way to protect processors against such errors. On the other hand, they cause high...
This paper presents an analysis of the efficiency of traditional fault-tolerance methods on parallel systems running on top of a Linux OS. It starts by studying the occurrence of software errors at different levels of complexity, from sequential bare-metal programs to parallel applications. Then, two mechanisms (triple modular redundancy and a duplication-with-comparison variant) are applied to the applications and their efficiencies analyzed. All cases were tested on single- and dual-core versions of the ARM Cortex-A9 processor that is embedded in many...
This paper evaluates the efficiency and performance impact of a dual-core lockstep as a fault-tolerance method running on top of FreeRTOS applications. The technique was implemented on an ARM Cortex-A9 processor embedded into a Zynq-7000 APSoC. Fault injection experiments show that it can mitigate up to 63% of the faults, a result very near the mitigation achievable on bare metal. Results also show that the overhead caused is higher for the application than it...
This paper presents an analysis of the occurrence of software errors in parallel applications using POSIX Threads (Pthreads) versus their OpenMP counterparts and sequential versions. All cases were tested on an ARM Cortex-A9 dual-core processor that is embedded in many commercial SoCs available on the market. The OVP simulator platform was used to instantiate the processor model, and a fault injection module injects faults into the system during simulation. We analyze the effect of bit-flips in the registers of both cores during the execution of the versions of the same applications. ...
This work presents a comparative analysis of successive approximation algorithms against their ordinary counterparts under fault injection. Successive approximation algorithms are implemented as a computation loop that approaches the final result with each execution. Because of that, we expect them to present a natural fault tolerance, given that an error at one point of the execution may be masked by the next loop iterations. We evaluate their behavior under simulated fault injection, executing on bare metal and on top of the FreeRTOS and Linux operating systems. Results show fewer silent data corruption errors than...
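The masking effect described above can be demonstrated with Newton's method for the square root, a classic successive approximation loop. This is a minimal sketch, assuming a single transient error modeled as an additive perturbation of the iterate; the function name and fault parameters are hypothetical.

```python
def newton_sqrt(a, iters=20, fault_at=None, perturbation=100.0):
    """Successive-approximation square root via Newton's method.
    An error injected into the intermediate iterate at iteration
    `fault_at` is progressively masked by the remaining iterations,
    illustrating the natural fault tolerance of this class of algorithms."""
    x = a
    for i in range(iters):
        if i == fault_at:
            x += perturbation  # emulated soft error in an intermediate value
        x = 0.5 * (x + a / x)  # Newton update converging to sqrt(a)
    return x
```

With enough iterations left after the fault, the corrupted run converges to the same result as the clean run, so the error never surfaces as silent data corruption.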
This work investigates how the approximate computing paradigm can be exploited to provide low-cost fault-tolerant architectures. In particular, we focus on the implementation of Approximate Triple Modular Redundancy (ATMR) designs using the precision reduction technique. The proposed method is applied to two benchmarks, producing a multitude of ATMR designs with different degrees of approximation. They are implemented on a Xilinx Zynq-7000 APSoC FPGA through high-level synthesis and evaluated concerning area usage and the inaccuracy caused by...
Today's computer architectures and semiconductor technologies are facing major challenges that make them incapable of delivering the required features (such as efficiency) for emerging applications. Alternatives are under investigation in order to continue providing sustainable benefits to future society at an affordable cost. These alternatives are not only changing the traditional computing paradigm (e.g., in terms of programming models, compilers, and circuit design), but also setting up new directions on the way these systems should be...
This work explores the fault tolerance of successive approximation algorithms, which are based on loop computations that approximate to a final result with each iteration. This type of algorithm can present an inherent fault tolerance, as it manages small discrepancies in data values and converges to the correct result after a certain number of iterations. A set of such algorithms was implemented as embedded software on an ARM Cortex-A9 processor of a Xilinx Zynq-7000 series board. Experiments consist of exposing the device to laser beams at a frequency of 10 Hz. The beam...
The Approximate Computing (AxC) paradigm aims at designing energy-efficient systems, saving computational resources, and presenting better execution times. AxC allows designers to selectively violate the specifications, trading accuracy off for efficiency. The literature has demonstrated the effectiveness of imprecise computation in both software and hardware components implementing inexact algorithms, showing an inherent resiliency to errors. On the other hand, a hidden cost is the reduced tolerance to errors of the application. This paper...
This work presents an analysis of the occurrence of software errors on an ARM Cortex-A9 dual-core processor. Fault injection results compare error rates and their causes. Results show that different parallelization algorithms can have different error rates, and that there is a tendency for parallel applications to present more silent data corruption than their sequential counterparts.
A set of software-based techniques to detect soft errors in embedded ARM processors at low cost is presented. Fault injection results show high fault coverage with performance and memory overheads inferior to those of state-of-the-art techniques.
This paper presents a novel redundancy technique for software fault tolerance, named Approximative Redundant Fault Tolerance (ARFT). It uses approximate computing in order to provide the same error detection as the classic DWC method with less overhead. In this work, ARFT was implemented to protect an ARM Cortex-A9 embedded into a Zynq-7000 All Programmable SoC. An extensive fault injection campaign was performed to evaluate the proposed technique. Results show that for distinct applications, different approximation methods...
The computing continuum's current trend is a growth in the number of devices with some degree of computational capability. Those devices may or may not include a full stack, comprising the Operating System layer and the Application layer, or may be pure bare-metal solutions. In either case, the reliability of the full system stack has to be guaranteed. It is crucial to provide data regarding the impact of faults at all levels, and potential hardening solutions, to design highly resilient systems. While most work usually concentrates on the application...
This work proposes the use of Triple Modular Redundancy (TMR) to mitigate multiple bit upsets affecting embedded software. To reduce TMR's overhead, we propose to exploit the approximate computing paradigm and, more particularly, the so-called successive approximations technique. The proposed approach, called Approximate TMR (ATMR), leverages the nature of the approximation technique to improve fault masking. Successive approximation algorithms are based on loop computations that approach a final result with each iteration. By varying the number...
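The combination of TMR with successive approximations can be sketched as one full-precision module plus two reduced-iteration replicas feeding a tolerance-based voter. This is an illustrative sketch, not the published ATMR design: the function names, iteration counts, tolerance, and the optional `corrupt_full` flag (used only to emulate an upset in the full-precision module) are all assumptions.

```python
def atmr_sqrt(a, full_iters=20, approx_iters=8, tol=1e-3,
              corrupt_full=False):
    """Approximate TMR sketch: the two approximate replicas run fewer
    successive-approximation iterations (lower overhead) yet can still
    out-vote a corrupted full-precision module, since the voter accepts
    any pair of outputs agreeing within `tol`."""
    def sqrt_iter(a, n):
        x = a
        for _ in range(n):
            x = 0.5 * (x + a / x)  # Newton update toward sqrt(a)
        return x

    outputs = [sqrt_iter(a, full_iters),
               sqrt_iter(a, approx_iters),
               sqrt_iter(a, approx_iters)]
    if corrupt_full:
        outputs[0] += 1.0  # emulated soft error in the precise module
    # Majority vote with tolerance: return the first agreeing pair.
    for i in range(3):
        for j in range(i + 1, 3):
            if abs(outputs[i] - outputs[j]) <= tol:
                return outputs[i]
    raise RuntimeError("no majority: more than one module failed")
```

Varying `approx_iters` trades the replicas' accuracy (and the voter tolerance they require) against their execution time, which is the design space the abstract refers to.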
Today's computer architectures and semiconductor technologies are facing major challenges that make them incapable of delivering the required features (such as efficiency) for emerging applications. Alternatives are under investigation in order to continue providing sustainable benefits to future society at an affordable cost. These alternatives are not only changing the traditional computing paradigm (e.g., in terms of programming models, compilers, and circuit design), but also setting up new opportunities concerning test...
This work investigates the influence of using a built-in configuration memory scrubber and triple modular hardware redundancy on the cross section of a radiation-hardened SRAM-based FPGA from NanoXplore. Different design versions are investigated under heavy ions for the occurrence of transient errors, failures, and timeouts. The calculated dynamic cross sections are in agreement with the expected order of magnitude for radiation-hardened FPGAs. Results show that the most reliable version uses DSPs in the operational logic while applying full design...
This paper investigates the impact of parallelization and redundancy at the thread level in four TMR software implementations and presents a novel use for TMR on applications running on top of a Linux OS. In this work, TMR is used not only as protection against silent data corruptions but also against hangs. This is done by discarding threads that hung or terminated unexpectedly. The application's tolerance to soft errors is evaluated employing a fault injection module extension based on the OVP simulator (OVPsim-FIM). Results show...
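The idea of discarding hung or crashed replicas before voting can be sketched with standard threads: replicas that raise an exception leave no result, replicas that hang are abandoned after a join timeout, and the vote is taken over the survivors. This is a simplified illustration under those assumptions, not one of the paper's four implementations; `thread_tmr` and its parameters are hypothetical.

```python
import threading
from collections import Counter

def thread_tmr(work, arg, timeout=1.0):
    """Thread-level TMR sketch tolerating both silent data corruptions
    and hangs: three redundant worker threads run `work(arg)`; crashed
    or hung replicas are discarded and the survivors are majority-voted."""
    results = {}

    def runner(idx):
        try:
            results[idx] = work(arg)
        except Exception:
            pass  # crashed replica: leaves no result, hence discarded

    threads = [threading.Thread(target=runner, args=(i,), daemon=True)
               for i in range(3)]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout)  # a hung replica is simply abandoned here
    if not results:
        raise RuntimeError("all replicas failed")
    value, _votes = Counter(results.values()).most_common(1)[0]
    return value
```

The daemon flag keeps abandoned (hung) threads from blocking process exit, mirroring the abstract's point that TMR here handles hangs as well as wrong values.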