- Parallel Computing and Optimization Techniques
- Cloud Computing and Resource Management
- Advanced Data Storage Technologies
- Distributed and Parallel Computing Systems
- Radiation Effects in Electronics
- Low-Power High-Performance VLSI Design
- Distributed Systems and Fault Tolerance
- Software System Performance and Reliability
- Software Testing and Debugging Techniques
- Embedded Systems Design Techniques
- Software Engineering Research
- Security and Verification in Computing
- Ferroelectric and Negative Capacitance Devices
- Semiconductor Materials and Devices
- Scientific Computing and Data Management
- Advanced Memory and Neural Computing
- Advancements in Semiconductor Devices and Circuit Design
- Interconnection Networks and Systems
- Adversarial Robustness in Machine Learning
- Numerical Methods and Algorithms
- Machine Learning and Data Classification
- Quantum Computing Algorithms and Architecture
- Evolutionary Algorithms and Applications
- Skin Diseases and Diabetes
- Stochastic Gradient Optimization Techniques
Lawrence Livermore National Laboratory
2020-2025
University of Illinois Urbana-Champaign
2024
University of Thessaly
2014-2023
University of Maryland, College Park
2023
University of California, Davis
2023
James Madison University
2023
University of Crete
2019-2020
Chalmers University of Technology
2020
Universitat Politècnica de Catalunya
2020
Barcelona Supercomputing Center
2020
Dependable computing on unreliable substrates is the next challenge the community needs to overcome, due both to manufacturing limitations at small geometries and to the necessity of aggressively minimizing power consumption. System designers often need to analyze the way hardware faults manifest as errors at the architectural level and how these errors affect application correctness. This paper introduces GemFI, a fault injection tool based on the cycle-accurate full-system simulator Gem5. GemFI provides methods that are easily extensible to support...
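To make the fault-injection idea concrete, here is a minimal toy sketch (not GemFI's actual API) that models a single-event upset as one flipped bit in a register value at a randomly chosen simulation cycle:

```python
import random

def flip_bit(value: int, bit: int) -> int:
    # Model a single-event upset as one flipped bit in a register value.
    return value ^ (1 << bit)

def inject_fault(trace, width=32, rng=None):
    # `trace` maps simulation cycle -> observed register value (a
    # hypothetical stand-in for simulator state). Pick a random
    # (cycle, bit) target and return a corrupted copy of the trace.
    rng = rng or random.Random(0)
    cycle = rng.choice(sorted(trace))
    bit = rng.randrange(width)
    faulty = dict(trace)
    faulty[cycle] = flip_bit(faulty[cycle], bit)
    return cycle, bit, faulty
```

Comparing the application's output on the faulty trace against the fault-free run is then what distinguishes masked faults from silent data corruptions and crashes.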
The objective of this work is to develop a methodology and associated platform for nucleic acid detection at the point-of-care (POC) that is sensitive, user-friendly, affordable, rapid, and robust. The heart of the system is an acoustic wave sensor, based on a Surface Acoustic Wave (SAW) or Quartz Crystal Microbalance (QCM) device, which is employed for label-free detection of isothermally amplified target DNA. Nucleic acid amplification is demonstrated inside three crude human samples, i.e., whole blood, saliva, and nasal swab, spiked in...
Several applications may trade off output quality for energy efficiency by computing only an approximation of their output. Current approaches to software-based approximate computing often require the programmer to specify the parts of the code or data structures that can be approximated. A largely unaddressed challenge is how to automate the analysis of significance and quality. To this end, we propose a methodology and toolset for automatic significance analysis. We use interval arithmetic and algorithmic differentiation in our profile-driven yet...
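As an illustration of the algorithmic-differentiation ingredient (a toy sketch, not the authors' toolset), forward-mode AD with dual numbers can rank input significance by the magnitude of the output's derivative with respect to each input:

```python
class Dual:
    # Forward-mode AD value: real part plus derivative (tangent) part.
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.dot + o.dot)
    __radd__ = __add__
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * o.val, self.val * o.dot + self.dot * o.val)
    __rmul__ = __mul__

def significance(f, inputs):
    # |df/dx_i| for each input: seed one tangent at a time.
    sig = []
    for i in range(len(inputs)):
        duals = [Dual(v, 1.0 if j == i else 0.0)
                 for j, v in enumerate(inputs)]
        sig.append(abs(f(*duals).dot))
    return sig
```

For example, for f(x, y) = x*x + 3y at (2, 1), the significances are |df/dx| = 4 and |df/dy| = 3, suggesting x contributes more to output quality at that point.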
This paper revisits the simple, long-studied, yet still unsolved problem of making image classifiers robust to imperceptible perturbations. Taking CIFAR10 as an example, SOTA clean accuracy is about $100$%, but robustness to $\ell_{\infty}$-norm bounded perturbations barely exceeds $70$%. To understand this gap, we analyze how model size, dataset size, and synthetic data quality affect robustness by developing the first scaling laws for adversarial training. Our scaling laws reveal inefficiencies in prior art and provide actionable...
We introduce a task-based programming model and runtime system that exploit the observation that not all parts of a program are equally significant for the accuracy of the end result, in order to trade off the quality of outputs for increased energy efficiency. This is done in a structured and flexible way, allowing easy exploitation of different points in the quality/energy space, without adversely affecting application performance. The runtime can apply a number of policies to decide whether it will execute less-significant tasks accurately or...
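The policy idea can be sketched as follows (a hypothetical mock, not the paper's actual runtime API): each task carries a significance value, and the runtime executes only the most significant fraction of tasks accurately, letting the rest fall back to cheap approximate versions:

```python
def run_tasks(tasks, ratio):
    # Execute the top `ratio` fraction of tasks (ranked by significance)
    # accurately; run the remaining tasks with their approximate version.
    # Each task is a dict: name, significance, accurate fn, approx fn.
    ranked = sorted(tasks, key=lambda t: t["significance"], reverse=True)
    cutoff = int(len(ranked) * ratio)
    results = {}
    for rank, task in enumerate(ranked):
        fn = task["accurate"] if rank < cutoff else task["approx"]
        results[task["name"]] = fn()
    return results
```

Sweeping `ratio` from 0 to 1 then exposes the quality/energy trade-off space the abstract refers to: lower ratios save energy at the cost of output quality.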
The aim of this work is to employ 3D printing for the fabrication of bioreactors for DNA amplification, with the long-term goal of using them in healthcare applications. Initially, the most suitable material for the reactor was evaluated by testing 25 different commercially available filaments, 3D-printed through the fused filament fabrication (FFF) method. Evaluation was carried out against the materials' efficiency, compatibility with the enzymatic assay, and transparency. The best-performing transparent material was further used in a cartridge of eight micro-well reactors; the latter...
High performance computing (HPC) systems pervasively feature GPU accelerators. For maximum efficiency, these are usually programmed using vendor-specific languages, such as CUDA. However, this is not portable and leads to vendor lock-in. Existing portable programming models require transcribing the whole application, which is tedious and often results in sub-optimal code, without necessarily avoiding the need to maintain multiple versions. Although solutions for automated translation exist, they sacrifice either features...
As we approach the limits of Moore's law, researchers are exploring new paradigms for future high-performance computing (HPC) systems. Approximate computing has gained traction by promising to deliver substantial power savings. However, due to the stringent accuracy requirements of HPC scientific applications, broad adoption of approximate methods in HPC requires an in-depth understanding of an application's amenability to approximations.
HPC is a heterogeneous world in which host and device code are interleaved throughout the application. Given the significant performance advantage of accelerators, their execution time is becoming the new bottleneck. Tuning the accelerated parts is consequently highly desirable, but often impractical due to the large overall application runtime, which includes unrelated parts.
To improve power efficiency, researchers are experimenting with dynamically adjusting the voltage and frequency margins of systems to just above the minimum required for reliable operation. Traditionally, manufacturers did not allow reducing these margins. Consequently, existing studies use system simulators or software fault-injection methodologies, which are slow, inaccurate, and cannot be applied to realistic workloads. However, recent CPUs allow operation outside the nominal voltage/frequency envelope. We...
As we approach the era of exa-scale computing, fault tolerance is of growing importance. The increasing number of cores as well as the increased complexity of modern heterogeneous systems result in a substantial decrease in the expected mean time between failures. Among the different techniques, checkpoint/restart is vastly adopted in supercomputing systems. Although many supercomputers in the TOP500 list use GPUs, only a few checkpoint/restart mechanisms support GPUs. In this paper, we extend an application-level library, called...
With the increasing interest in applying approximate computing to HPC applications, representative benchmarks are needed to evaluate and compare various algorithms and programming frameworks. To this end, we propose HPC-MixPBench, a benchmark suite consisting of a set of kernels widely used in the HPC domain. HPC-MixPBench has a test harness framework where different tools can be plugged in and evaluated on the benchmarks. We demonstrate the effectiveness of our suite by evaluating several mixed-precision algorithms implemented in FloatSmith, a tool...
This article introduces a significance-centric programming model and runtime support that sets the supply voltage of a multicore CPU to sub-nominal values to reduce the energy footprint, and provides mechanisms to control output quality. Developers specify the significance of application tasks, reflecting their contribution to output quality, and provide check and repair functions for handling faults. On a real system, we evaluate five benchmarks using a model that quantifies the energy reduction. When executing the least-significant tasks unreliably, our approach leads to 20%...
The LEGaTO project leverages task-based programming models to provide a software ecosystem for Made-in-Europe heterogeneous hardware composed of CPUs, GPUs, FPGAs, and dataflow engines. The aim is to attain one order of magnitude energy savings from the edge to the converged cloud/HPC, balanced with security and resilience challenges. LEGaTO is an ongoing three-year EU H2020 project started in December 2017.
Chip manufacturers introduce redundancy at various levels of CPU design to guarantee correct operation, even for worst-case combinations of non-idealities in process variation and system operating conditions. This redundancy is implemented partly in the form of voltage margins. However, in a wide range of real-world execution scenarios, these margins are excessive and merely translate to increased power consumption, hindering the effort towards higher energy efficiency in both HPC and general-purpose computing. Our study on x86-64...
Approximate execution is a viable technique for energy-constrained environments, provided that applications have the mechanisms to produce outputs of the highest possible quality within a given energy budget.
Currently, offloading to accelerators requires users to identify which regions are to be executed on the device, what memory needs to be transferred, and how synchronization is resolved. On top of these manual tasks, many standard (C/C++ library) functions, such as file I/O or string manipulation, cannot run directly on the device and need to be worked around explicitly by the user. This makes it challenging to port programs in the first place, and hinders developers from testing features on the GPU within the compilation pipeline. Existing tests test...
Heterogeneity has become a mainstream architecture design choice for building High Performance Computing systems. However, heterogeneity poses significant challenges for achieving performance portability of execution. Adapting a program to a new heterogeneous platform is laborious and requires developers to manually explore a vast space of execution parameters. To address those challenges, this paper proposes extensions to OpenMP for autonomous, machine learning-driven adaptation. Our solution includes a set of novel...
Zeroth-order (ZO) optimization has become a popular technique for solving machine learning (ML) problems when first-order (FO) information is difficult or impossible to obtain. However, the scalability of ZO optimization remains an open problem: its use has primarily been limited to relatively small-scale ML problems, such as sample-wise adversarial attack generation. To the best of our knowledge, no prior work has demonstrated the effectiveness of ZO optimization in training deep neural networks (DNNs) without a significant decrease in performance....
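To make the ZO idea concrete, here is a minimal coordinate-wise two-point gradient estimator (a generic textbook sketch, not the paper's method): it estimates the gradient from function evaluations alone, which is exactly the setting where FO information is unavailable:

```python
def zo_gradient(f, x, mu=1e-4):
    # Estimate grad f(x) using only function values: a two-point
    # (central) difference along each coordinate, with smoothing
    # parameter mu. Costs 2 * len(x) function queries.
    grad = []
    for i in range(len(x)):
        xp = list(x); xp[i] += mu
        xm = list(x); xm[i] -= mu
        grad.append((f(xp) - f(xm)) / (2 * mu))
    return grad

def zo_step(f, x, lr=0.1, mu=1e-4):
    # One ZO gradient-descent step: x <- x - lr * grad_estimate.
    g = zo_gradient(f, x, mu)
    return [xi - lr * gi for xi, gi in zip(x, g)]
```

The 2d queries per step are what limits scalability as the dimension d grows; randomized direction sampling trades query count for estimator variance, which is the tension large-scale ZO methods must manage.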
MPI has been ubiquitously deployed in flagship HPC systems, aiming to accelerate distributed scientific applications running on tens of hundreds of processes and compute nodes. Maintaining the correctness and integrity of application execution is critical, especially for safety-critical applications. Therefore, a collection of effective fault tolerance techniques have been proposed to enable applications to efficiently resume from system failures. However, there is no structured way to study and compare the different designs, so as to guide...
Compilers use a wide range of advanced optimizations to improve the quality of the machine code they generate. In most cases, compilers rely on precise analyses to be able to perform these optimizations. However, whenever a control-flow merge is performed, information is lost, as it is no longer possible to reason precisely about the program. One existing solution to this issue is code duplication, which involves duplicating instructions from merge blocks into their predecessors. This paper introduces a novel and more aggressive approach grounded in...