- Radiation Effects in Electronics
- Parallel Computing and Optimization Techniques
- Security and Verification in Computing
- Advanced Neural Network Applications
- Adversarial Robustness in Machine Learning
- Distributed Systems and Fault Tolerance
- Advanced Data Storage Technologies
- Software Reliability and Analysis Research
- Software Testing and Debugging Techniques
- Domain Adaptation and Few-Shot Learning
- Distributed and Parallel Computing Systems
- Algorithms and Data Compression
- VLSI and Analog Circuit Testing
- Brain Tumor Detection and Classification
- Smart Grid Security and Resilience
- Generative Adversarial Networks and Image Synthesis
- 2D Materials and Applications
- Thermodynamic and Exergetic Analyses of Power and Cooling Systems
- Reproductive Biology and Fertility
- Advanced Thermoelectric Materials and Devices
- Advanced Thermodynamic Systems and Engines
- Cloud Data Security Solutions
- Autonomous Vehicle Technology and Safety
- Integrated Circuits and Semiconductor Failure Analysis
- Low-Power High-Performance VLSI Design
University of Iowa
2020-2025
Argonne National Laboratory
2024
Southern University of Science and Technology
2023-2024
National Supercomputing Center in Shenzhen
2024
Hainan University
2023
Shandong Electric Power Engineering Consulting Institute Corp
2021-2023
China Power Engineering Consulting Group (China)
2021-2023
Zhengzhou University
2022
University of British Columbia
2014-2020
Chinese Academy of Sciences
2015-2017
Deep learning neural networks (DNNs) have been successful in solving a wide range of machine learning problems. Specialized hardware accelerators have been proposed to accelerate the execution of DNN algorithms for high performance and energy efficiency. Recently, they have been deployed in datacenters (potentially for business-critical or industrial applications) and in safety-critical systems such as self-driving cars. Soft errors caused by high-energy particles have been increasing in hardware systems, and these can lead to catastrophic failures in DNN systems....
Hardware errors are on the rise with shrinking feature sizes; however, tolerating them in hardware is expensive. Researchers have explored software-based techniques for building error-resilient applications. Many of these techniques leverage application-specific resilience characteristics to keep overheads low. Understanding these characteristics requires software fault-injection mechanisms that are both accurate and capable of operating at a high level of abstraction, to allow developers to reason about error resilience. In this paper, we quantify...
The 2H (MoS2-type) phase of 2D transition metal dichalcogenides (TMDCs) has been extensively studied and exhibits excellent electronic and optoelectronic properties, but its high phonon thermal conductivity is detrimental to thermoelectric performance. Here, we use first-principles methods combined with Boltzmann transport theory to calculate the electronic and phononic properties of the 1T (CdI2-type) SnSe2 monolayer, a recently realized dichalcogenide semiconductor. The calculated band gap is 0.85 eV, which is a little larger than...
As machine learning (ML) becomes pervasive in high-performance computing, ML has found its way into safety-critical domains (e.g., autonomous vehicles). Thus, the reliability of ML has grown in importance. Specifically, failures of ML systems can have catastrophic consequences, and can occur due to soft errors, which are increasing in frequency due to system scaling. Therefore, we need to evaluate ML systems in the presence of soft errors.
The adoption of deep neural networks (DNNs) in safety-critical domains has engendered serious reliability concerns. A prominent example is hardware transient faults, which are growing in frequency due to progressive technology scaling and can lead to failures in DNNs. This work proposes Ranger, a low-cost fault corrector, which directly rectifies the faulty output without re-computation. DNNs are inherently resilient to benign faults (which will not cause output corruption), but not to critical faults (which can result in erroneous output). Ranger...
GPUs have emerged as general-purpose accelerators in high-performance computing (HPC) and scientific applications. However, the reliability characteristics of GPU applications have not been investigated in depth. While error propagation has been extensively investigated for non-GPU applications, GPU applications have a very different programming model, which can have a significant effect on error propagation in them. We perform an empirical study to understand and characterize error propagation in GPU applications. We build a compiler-based fault-injection tool to track error propagation and define metrics to characterize it. We find that GPU applications exhibit some...
As technology scales to lower feature sizes, devices become more susceptible to soft errors. Soft errors can lead to silent data corruptions (SDCs), seriously compromising the reliability of a system. Traditional hardware-only techniques to avoid SDCs are energy hungry, and hence not suitable for commodity systems. Researchers have proposed selective software-based protection techniques to tolerate hardware faults at lower costs. However, these techniques either use expensive fault injection or inaccurate analytical models...
Using first-principles calculations combined with Boltzmann transport theory, we investigate the biaxial strain effect on the electronic and phonon thermal properties of a 1T (CdI2-type) structural TiS2 monolayer, a recent experimental two-dimensional (2D) material. It is found that the band structure can be effectively modulated and that the band gap experiences an indirect–direct–indirect transition with increasing tensile strain. The band convergence induced by the strain increases the Seebeck coefficient and the power factor, while the lattice...
As machine learning (ML) has seen increasing adoption in safety-critical domains (e.g., autonomous vehicles), the reliability of ML systems has also grown in importance. While prior studies have proposed techniques to enable efficient error resilience (e.g., selective instruction duplication), a fundamental requirement for realizing these techniques is a detailed understanding of the application's resilience. In this work, we present TensorFI, a high-level fault injection (FI) framework for TensorFlow-based applications. TensorFI...
Machine learning (ML) applications have emerged as the killer applications for next-generation hardware and software platforms, and there is a lot of interest in frameworks to build such applications. TensorFlow is a high-level dataflow framework for building ML applications and has become the most popular one in the recent past. ML applications are also being increasingly used in safety-critical systems such as self-driving cars and home robotics. Therefore, there is a compelling need to evaluate the resilience of ML applications built using TensorFlow. In this paper, we build a fault injection framework called TensorFI...
GPUs have emerged as general-purpose accelerators in high-performance computing (HPC) and scientific applications. However, the reliability characteristics of GPU applications have not been investigated in depth. While error propagation has been extensively investigated for non-GPU applications, GPU applications have a very different programming model, which can have a significant effect on error propagation in them. We perform an empirical study to understand and characterize error propagation in GPU applications. We build a compiler-based fault-injection tool to track error propagation and define metrics to characterize it. We find that GPU applications exhibit some...
Recently, a new two-dimensional (2D) semiconductor, the SnSe2 monolayer, has been grown by molecular beam epitaxy, and weak ferromagnetic behavior above room temperature in Mn-doped thin films was also observed experimentally.
Transient hardware faults are increasing in computer systems due to shrinking feature sizes. Traditional methods mitigate such faults through duplication, which incurs huge overhead in performance and energy consumption. Therefore, researchers have explored software solutions such as selective instruction duplication, which require fine-grained analysis of instruction vulnerabilities to Silent Data Corruptions (SDCs). These are typically evaluated via Fault Injection (FI), which is often highly time-consuming. Hence, most studies confine their...
Robotic Vehicles (RVs) rely extensively on sensor inputs to operate autonomously. Physical attacks such as sensor tampering and spoofing can feed erroneous sensor measurements to deviate RVs from their course and result in mission failures. In this paper, we present PID-Piper, a novel framework for automatically recovering RVs from physical attacks. We use machine learning (ML) to design an attack-resilient Feed-Forward Controller (FFC), which runs in tandem with the RV's primary controller and monitors it. Under attacks, the FFC takes...
Modern scientific applications and supercomputing systems are generating large amounts of data in various fields, leading to critical challenges in data storage footprints and communication times. To address this issue, error-bounded GPU lossy compression has been widely adopted, since it can reduce the data volume within a customized threshold on data distortion. In this work, we propose an ultra-fast compressor, cuSZp. Specifically, cuSZp computes the linear recurrences with hierarchical parallelism to fuse the massive...
As machine learning (ML) has seen increasing adoption in safety-critical domains (e.g., autonomous vehicles), the reliability of ML systems has also grown in importance. While prior studies have proposed techniques to enable efficient error resilience (e.g., selective instruction duplication), a fundamental requirement for realizing these techniques is a detailed understanding of the application's resilience. In this work, we present TensorFI 1 and 2, high-level fault injection (FI) frameworks for TensorFlow-based applications....
Today's scientific applications and advanced instruments are producing extremely large volumes of data every day, so error-controlled lossy compression has become a critical technique for data storage management. Existing lossy compressors, however, are designed mainly around an error-control-driven mechanism, which cannot be efficiently applied in the fixed-ratio use case, where a desired compression ratio needs to be reached because of restricted data processing/management resources such as limited memory/storage capacity and network...
OBJECTIVES: The relationship between long working hours and the risk of mortality has been debated in various countries. This study aimed to investigate the association between long working hours and the risk of all-cause mortality in a large population-based cohort in China. METHODS: This retrospective cohort study (N=10 269) used a large, nationally representative data set [the China Health and Nutrition Surveys (CHNS)] from 1989 to 2015. Long working hours (≥55 hours per week) were compared with standard working hours (35–40 hours per week). The outcome measure was all-cause mortality. Hazard ratio (HR) for all-cause mortality was calculated using Cox proportional...
Safety-critical applications, such as autonomous vehicles, healthcare, and space, have witnessed widespread deployment of deep neural networks (DNNs). Inherent algorithmic inaccuracies have consistently been a prevalent cause of misclassifications, even in modern DNNs. Simultaneously, with an ongoing effort to minimize the footprint of contemporary chip design, there is a continual rise in the likelihood of transient hardware faults in deployed DNN models. Consequently, researchers have wondered about the extent to which these...
Error-bounded lossy compression has been effective in significantly reducing the data storage/transfer burden while preserving the reconstructed data fidelity very well. Many error-bounded lossy compressors have been developed for a wide range of parallel and distributed use cases over the years. These compressors are designed with distinct compression models and design principles, such that each of them features particular pros and cons. In this paper, we provide a comprehensive survey of emerging error-bounded lossy compression techniques for different use cases, each involving big data to process. The key...
As the rate of transient hardware faults increases, researchers have investigated software techniques to tolerate these faults. An important class of faults are those that cause long-latency crashes (LLCs), or faults that can persist for a long time in the program before causing it to crash. In this paper, we develop a technique to automatically find program locations where LLC-causing faults originate so that the locations can be protected to bound the program's crash latency. We first identify the code patterns responsible for the majority of LLC-causing faults through an empirical study. We then build...