Guanpeng Li

ORCID: 0000-0001-7773-7826
Research Areas
  • Radiation Effects in Electronics
  • Parallel Computing and Optimization Techniques
  • Security and Verification in Computing
  • Advanced Neural Network Applications
  • Adversarial Robustness in Machine Learning
  • Distributed systems and fault tolerance
  • Advanced Data Storage Technologies
  • Software Reliability and Analysis Research
  • Software Testing and Debugging Techniques
  • Domain Adaptation and Few-Shot Learning
  • Distributed and Parallel Computing Systems
  • Algorithms and Data Compression
  • VLSI and Analog Circuit Testing
  • Brain Tumor Detection and Classification
  • Smart Grid Security and Resilience
  • Generative Adversarial Networks and Image Synthesis
  • 2D Materials and Applications
  • Thermodynamic and Exergetic Analyses of Power and Cooling Systems
  • Reproductive Biology and Fertility
  • Advanced Thermoelectric Materials and Devices
  • Advanced Thermodynamic Systems and Engines
  • Cloud Data Security Solutions
  • Autonomous Vehicle Technology and Safety
  • Integrated Circuits and Semiconductor Failure Analysis
  • Low-power high-performance VLSI design

University of Iowa
2020-2025

Argonne National Laboratory
2024

Southern University of Science and Technology
2023-2024

National Supercomputing Center in Shenzhen
2024

Hainan University
2023

Shandong Electric Power Engineering Consulting Institute Corp
2021-2023

China Power Engineering Consulting Group (China)
2021-2023

Zhengzhou University
2022

University of British Columbia
2014-2020

Chinese Academy of Sciences
2015-2017

Deep learning neural networks (DNNs) have been successful in solving a wide range of machine learning problems. Specialized hardware accelerators have been proposed to accelerate the execution of DNN algorithms for high performance and energy efficiency. Recently, they have been deployed in datacenters (potentially for business-critical or industrial applications) and in safety-critical systems such as self-driving cars. Soft errors caused by high-energy particles have been increasing in hardware systems, and these can lead to catastrophic failures in DNN systems....

10.1145/3126908.3126964 article EN 2017-11-08

Hardware errors are on the rise with reducing feature sizes; however, tolerating them in hardware is expensive. Researchers have explored software-based techniques for building error-resilient applications. Many of these techniques leverage application-specific resilience characteristics to keep overheads low. Understanding these characteristics requires software fault-injection mechanisms that are both accurate and capable of operating at a high level of abstraction, to allow developers to reason about resilience. In this paper, we quantify...

10.1109/dsn.2014.2 article EN 2014-06-01
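Both of the preceding abstracts concern hardware faults that surface in software. A fault model commonly used in this line of work is the single bit flip in a program value. The sketch below is a generic illustration of that model on an IEEE-754 double; it is not the injector used in either paper, and `flip_bit` and `inject_random_fault` are hypothetical names.

```python
import random
import struct


def flip_bit(value: float, bit: int) -> float:
    """Return `value` with one bit of its IEEE-754 binary64 encoding inverted.

    Bit 0 is the least significant mantissa bit; bits 52-62 are the exponent;
    bit 63 is the sign bit.
    """
    if not 0 <= bit < 64:
        raise ValueError("bit must be in [0, 63]")
    (raw,) = struct.unpack("<Q", struct.pack("<d", value))
    (faulty,) = struct.unpack("<d", struct.pack("<Q", raw ^ (1 << bit)))
    return faulty


def inject_random_fault(values: list[float], rng: random.Random) -> tuple[list[float], int, int]:
    """Flip one uniformly chosen bit in one uniformly chosen element (single-bit-flip model)."""
    idx = rng.randrange(len(values))
    bit = rng.randrange(64)
    corrupted = list(values)  # keep the fault-free ("golden") copy untouched
    corrupted[idx] = flip_bit(corrupted[idx], bit)
    return corrupted, idx, bit
```

Flipping exponent bit 62 of `1.0` yields infinity, while flipping a low mantissa bit barely changes the value. This spread is one reason the outcome of a fault (benign, crash, or silent data corruption) depends heavily on where it lands.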

The 2H (MoS2-type) phase of 2D transition metal dichalcogenides (TMDCs) has been extensively studied and exhibits excellent electronic and optoelectronic properties, but its high phonon thermal conductivity is detrimental to thermoelectric performance. Here, we use first-principles methods combined with Boltzmann transport theory to calculate the phononic properties of the 1T (CdI2-type) SnSe2 monolayer, a recently realized dichalcogenide semiconductor. The calculated band gap is 0.85 eV, which is a little larger than...

10.1088/0953-8984/29/1/015001 article EN Journal of Physics Condensed Matter 2016-11-10
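The trade-off named in this abstract follows from the standard thermoelectric figure of merit (a textbook relation, not a result of the paper): the phonon (lattice) thermal conductivity appears in the denominator, so a high value suppresses ZT even when the electronic properties are favorable.

```latex
ZT = \frac{S^{2}\,\sigma\,T}{\kappa_{e} + \kappa_{l}}, \qquad \mathrm{PF} = S^{2}\sigma
```

Here S is the Seebeck coefficient, σ the electrical conductivity, T the absolute temperature, κ_e and κ_l the electronic and lattice (phonon) thermal conductivities, and PF the power factor. At a fixed power factor and temperature, lowering κ_l raises ZT.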

As machine learning (ML) becomes pervasive in high-performance computing, ML has found its way into safety-critical domains (e.g., autonomous vehicles). Thus the reliability of ML has grown in importance. Specifically, failures of ML systems can have catastrophic consequences, and can occur due to soft errors, which are increasing in frequency due to system scaling. Therefore, we need to evaluate ML systems in the presence of soft errors.

10.1145/3295500.3356177 article EN 2019-11-07

The adoption of deep neural networks (DNNs) in safety-critical domains has engendered serious reliability concerns. A prominent example is hardware transient faults, which are growing in frequency due to progressive technology scaling and can lead to failures in DNNs. This work proposes Ranger, a low-cost fault corrector, which directly rectifies the faulty output without re-computation. DNNs are inherently resilient to benign faults (which will not cause output corruption), but not to critical faults (which can result in erroneous output). Ranger...

10.1109/dsn48987.2021.00018 article EN 2021-06-01
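The abstract is cut off before it explains how Ranger corrects outputs. Assuming a range-restriction design, where value bounds are profiled from fault-free runs and out-of-range activations (typical of large-magnitude faults such as high exponent bit flips) are clamped back into range without re-computation, the idea can be sketched as follows. The function names and the NaN policy are hypothetical choices for illustration.

```python
def profile_bounds(fault_free_runs: list[list[float]]) -> tuple[float, float]:
    """Record the min and max activation values seen across fault-free runs."""
    flat = [v for run in fault_free_runs for v in run]
    return min(flat), max(flat)


def restrict_range(activations: list[float], low: float, high: float) -> list[float]:
    """Clamp each activation into [low, high] without recomputing the layer.

    NaN is mapped to `low` (a hypothetical policy); plain min/max would let NaN
    pass through unchanged because every comparison with NaN is False.
    """
    out = []
    for v in activations:
        if v != v:  # NaN is the only float that is not equal to itself
            out.append(low)
        else:
            out.append(min(max(v, low), high))
    return out
```

Because clamping only bounds magnitudes, a fault that stays within the profiled range passes through uncorrected; the sketch targets the large deviations that are most likely to flip a DNN's output.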

GPUs have emerged as general-purpose accelerators in high-performance computing (HPC) and scientific applications. However, the reliability characteristics of GPU applications have not been investigated in depth. While error propagation has been extensively investigated for non-GPU applications, GPU applications have a very different programming model, which can have a significant effect on error propagation in them. We perform an empirical study to understand and characterize error propagation in GPU applications. We build a compiler-based fault-injection tool to track error propagation, define metrics to characterize it, and find that GPU applications exhibit some...

10.1109/sc.2016.20 article EN 2016-11-01

As technology scales to lower feature sizes, devices become more susceptible to soft errors. Soft errors can lead to silent data corruptions (SDCs), seriously compromising the reliability of a system. Traditional hardware-only techniques to avoid SDCs are energy hungry, and hence not suitable for commodity systems. Researchers have proposed selective software-based protection techniques to tolerate hardware faults at lower costs. However, these techniques either use expensive fault injection or inaccurate analytical models...

10.1109/dsn.2018.00016 article EN 2018-06-01

Using first-principles calculations combined with Boltzmann transport theory, we investigate the biaxial strain effect on the electronic and phonon thermal properties of a 1T (CdI2-type) structural TiS2 monolayer, a recent experimental two-dimensional (2D) material. It is found that the band structure can be effectively modulated, and the band gap experiences an indirect-direct-indirect transition with increasing tensile strain. The band convergence induced by strain increases the Seebeck coefficient and power factor, while the lattice...

10.1088/1361-6528/aa99ba article EN Nanotechnology 2017-11-10

As machine learning (ML) has seen increasing adoption in safety-critical domains (e.g., autonomous vehicles), the reliability of ML systems has also grown in importance. While prior studies have proposed techniques to enable efficient error resilience (e.g., selective instruction duplication), a fundamental requirement for realizing these techniques is a detailed understanding of the application's resilience. In this work, we present TensorFI, a high-level fault injection (FI) framework for TensorFlow-based applications. TensorFI...

10.1109/issre5003.2020.00047 article EN 2020-10-01

Machine Learning (ML) applications have emerged as the killer applications for next-generation hardware and software platforms, and there is a lot of interest in frameworks to build such applications. TensorFlow is a high-level dataflow framework for building ML applications and has become the most popular one in the recent past. ML applications are also being increasingly used in safety-critical systems such as self-driving cars and home robotics. Therefore, there is a compelling need to evaluate the resilience of ML applications built using TensorFlow. In this paper, we present a fault injection framework called TensorFI...

10.1109/issrew.2018.00024 article EN 2018 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW) 2018-10-01
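A high-level framework such as TensorFI injects faults above the machine-instruction level. Assuming injection at the outputs of dataflow operators, the idea can be sketched in plain Python by wrapping an operator so that its result is corrupted. This is not TensorFI's API; `inject_into_op`, the stand-in `add` operator, and the fault types are hypothetical.

```python
import random
from typing import Callable, List, Optional

Op = Callable[..., List[float]]


def inject_into_op(op: Op, rng: random.Random, prob: float = 1.0,
                   fault_value: Optional[float] = None) -> Op:
    """Wrap `op` so that, with probability `prob`, one element of its output is corrupted.

    The corrupted element is set to `fault_value` if given, otherwise to a uniform
    random value in [-1, 1] (a hypothetical fault type chosen for illustration).
    """
    def faulty_op(*args, **kwargs) -> List[float]:
        out = list(op(*args, **kwargs))
        if out and rng.random() < prob:
            i = rng.randrange(len(out))
            out[i] = fault_value if fault_value is not None else rng.uniform(-1.0, 1.0)
        return out
    return faulty_op


def add(a: List[float], b: List[float]) -> List[float]:
    """Stand-in for a dataflow-graph operator: element-wise addition."""
    return [x + y for x, y in zip(a, b)]
```

Wrapping operators, rather than instrumenting compiled code, keeps the injection point meaningful to an ML developer (a named operator and output element) at the cost of not modeling faults inside the operator's implementation.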

GPUs have emerged as general-purpose accelerators in high-performance computing (HPC) and scientific applications. However, the reliability characteristics of GPU applications have not been investigated in depth. While error propagation has been extensively investigated for non-GPU applications, GPU applications have a very different programming model, which can have a significant effect on error propagation in them. We perform an empirical study to understand and characterize error propagation in GPU applications. We build a compiler-based fault-injection tool to track error propagation, define metrics to characterize it, and find that GPU applications exhibit some...

10.5555/3014904.3014932 article EN IEEE International Conference on High Performance Computing, Data, and Analytics 2016-11-13

Recently, a new two-dimensional (2D) semiconductor, the SnSe2 monolayer, has been grown by molecular beam epitaxy, and weak ferromagnetic behavior above room temperature in Mn-doped SnSe2 thin films was also observed experimentally.

10.1039/c7ra07648g article EN cc-by-nc RSC Advances 2017-01-01

Transient hardware faults are increasing in computer systems due to shrinking feature sizes. Traditional methods mitigate such faults through duplication, which incurs huge overhead in performance and energy consumption. Therefore, researchers have explored software solutions such as selective instruction duplication, which require fine-grained analysis of vulnerabilities to Silent Data Corruptions (SDCs). These are typically evaluated via Fault Injection (FI), which is often highly time-consuming. Hence, most studies confine their...

10.1109/dsn.2018.00038 article EN 2018-06-01
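Each FI trial is a Bernoulli outcome (SDC or not), so an SDC probability estimated from a campaign carries sampling error, and precise estimates need many trials. The sketch below is a generic statistical illustration, not the paper's method; `run_trial` stands in for a hypothetical routine that injects one fault, runs the program, and compares its output against a fault-free run.

```python
import math
import random
from typing import Callable, Tuple


def estimate_sdc_rate(run_trial: Callable[[random.Random], bool], trials: int,
                      seed: int = 0, z: float = 1.96) -> Tuple[float, float]:
    """Run `trials` independent FI trials; return (SDC rate, margin of error).

    `run_trial` returns True if the injected fault produced a silent data corruption.
    The margin uses the normal approximation to the binomial: z * sqrt(p(1-p)/n).
    """
    rng = random.Random(seed)
    sdcs = sum(1 for _ in range(trials) if run_trial(rng))
    p = sdcs / trials
    return p, z * math.sqrt(p * (1.0 - p) / trials)


def trials_needed(margin: float, p: float = 0.5, z: float = 1.96) -> int:
    """Trials required for a given margin of error (p = 0.5 is the worst case)."""
    return math.ceil(z * z * p * (1.0 - p) / (margin * margin))
```

With the worst case p = 0.5, a ±3% margin at 95% confidence needs 1,068 trials; shrinking the margin by a factor of k multiplies the trial count by roughly k². Fine-grained, per-instruction estimates multiply this cost again by the number of instructions analyzed.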

Robotic Vehicles (RVs) rely extensively on sensor inputs to operate autonomously. Physical attacks such as tampering and spoofing can feed erroneous measurements that deviate RVs from their course and result in mission failures. In this paper, we present PID-Piper, a novel framework for automatically recovering from physical attacks. We use machine learning (ML) to design an attack-resilient Feed-Forward Controller (FFC), which runs in tandem with the RV's primary controller and monitors it. Under attacks, the FFC takes...

10.1109/dsn48987.2021.00020 article EN 2021-06-01
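The abstract describes a second controller that runs in tandem with the primary one, monitors it, and takes over under attack. A minimal sketch of that monitor-and-switch pattern follows. It assumes a textbook PID primary controller, a hypothetical `backup` callable in place of the ML-based FFC (it sees only the setpoint, so spoofed sensor readings cannot influence it), and a hypothetical sliding-window deviation threshold as the switching rule. PID-Piper's actual detection and recovery logic is not shown in the truncated abstract.

```python
from collections import deque
from typing import Callable


class PID:
    """Textbook discrete PID controller: u = Kp*e + Ki*integral(e) + Kd*de/dt."""

    def __init__(self, kp: float, ki: float, kd: float, dt: float) -> None:
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, setpoint: float, measurement: float) -> float:
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


class MonitoredController:
    """Runs a backup controller alongside the primary; switches on sustained disagreement.

    `window` and `threshold` are illustrative parameters, not values from the paper.
    """

    def __init__(self, primary: PID, backup: Callable[[float], float],
                 window: int = 5, threshold: float = 1.0) -> None:
        self.primary = primary
        self.backup = backup
        self.deviations: deque = deque(maxlen=window)
        self.threshold = threshold

    def step(self, setpoint: float, measurement: float) -> tuple[float, bool]:
        u_primary = self.primary.step(setpoint, measurement)
        u_backup = self.backup(setpoint)
        self.deviations.append(abs(u_primary - u_backup))
        # Flag an attack only once the window is full and the mean deviation is large.
        attacked = (len(self.deviations) == self.deviations.maxlen
                    and sum(self.deviations) / len(self.deviations) > self.threshold)
        return (u_backup if attacked else u_primary), attacked
```

Averaging over a window trades detection latency for robustness to momentary disagreements, such as those caused by ordinary sensor noise.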

Modern scientific applications and supercomputing systems are generating large amounts of data in various fields, leading to critical challenges in data storage footprints and communication times. To address this issue, error-bounded GPU lossy compression has been widely adopted, since it can reduce the data volume within a customized threshold on data distortion. In this work, we propose an ultra-fast error-bounded GPU lossy compressor, cuSZp. Specifically, cuSZp computes linear recurrences with hierarchical parallelism to fuse massive...

10.1145/3581784.3607048 article EN 2023-10-30
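The abstract only hints at cuSZp's internals. The general principle behind prediction-based, error-bounded lossy compression can be sketched serially: quantize each value to a bin of width 2·eb so the reconstruction error never exceeds eb, then store differences between neighboring bins. Decompression is then a prefix sum, i.e. the linear recurrence q[i] = q[i-1] + d[i], a computation that parallelizes well on GPUs. This is a generic CPU sketch with hypothetical names, not cuSZp's algorithm or API.

```python
from typing import List


def compress(data: List[float], eb: float) -> List[int]:
    """Error-bounded 1D compression: quantize to bins of width 2*eb, then delta-encode.

    Quantizing x to q = round(x / (2*eb)) guarantees |x - 2*eb*q| <= eb (up to
    floating-point rounding). Storing q[i] - q[i-1] (a 1D Lorenzo-style predictor)
    yields small integers that a later encoding stage could pack tightly.
    """
    q = [round(x / (2.0 * eb)) for x in data]
    if not q:
        return []
    return [q[0]] + [q[i] - q[i - 1] for i in range(1, len(q))]


def decompress(deltas: List[int], eb: float) -> List[float]:
    """Invert the delta coding with a prefix sum, then dequantize."""
    out, q = [], 0
    for d in deltas:
        q += d  # linear recurrence: q[i] = q[i-1] + d[i]
        out.append(q * 2.0 * eb)
    return out
```

The serial loop in `decompress` is the dependency a GPU implementation has to break up, typically with a parallel scan; for smooth data the deltas stay small, which is what makes the compressed stream compact.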

As machine learning (ML) has seen increasing adoption in safety-critical domains (e.g., autonomous vehicles), the reliability of ML systems has also grown in importance. While prior studies have proposed techniques to enable efficient error resilience (e.g., selective instruction duplication), a fundamental requirement for realizing these techniques is a detailed understanding of the application's resilience. In this work, we present TensorFI 1 and 2, high-level fault injection (FI) frameworks for TensorFlow-based applications....

10.1109/tdsc.2022.3175930 article EN IEEE Transactions on Dependable and Secure Computing 2022-07-18

Today's scientific applications and advanced instruments are producing extremely large volumes of data every day, so error-controlled lossy compression has become a critical technique for data storage and management. Existing lossy compressors, however, are designed mainly around an error-control-driven mechanism, which cannot be applied efficiently in the fixed-ratio use case, where a desired compression ratio needs to be reached because of restricted processing/management resources such as limited memory/storage capacity and network...

10.1109/icde55515.2023.00116 article EN 2023 IEEE 39th International Conference on Data Engineering (ICDE) 2023-04-01
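In the fixed-ratio use case, the user specifies a target compression ratio instead of an error bound. One straightforward, if slow, baseline for serving that request with an error-bounded compressor is to search over the error bound until the target is met. The sketch below does this with a geometric binary search over a toy quantize, delta, and zlib pipeline. It assumes the ratio grows roughly monotonically with the error bound and that the starting upper bound already meets the target; it is not the method proposed in the paper, and every name in it is hypothetical.

```python
import struct
import zlib
from typing import List, Tuple


def compressed_size(data: List[float], eb: float) -> int:
    """Bytes after error-bounded quantization (bin width 2*eb), delta coding, and zlib."""
    q = [round(x / (2.0 * eb)) for x in data]
    deltas = [q[0]] + [b - a for a, b in zip(q, q[1:])]
    return len(zlib.compress(struct.pack(f"<{len(deltas)}q", *deltas), 9))


def error_bound_for_ratio(data: List[float], target_ratio: float,
                          lo: float = 1e-9, hi: float = 1.0,
                          iters: int = 40) -> Tuple[float, float]:
    """Search for the smallest error bound whose compression ratio meets `target_ratio`.

    Returns (error bound, achieved ratio). Assumes `data` is non-empty and that
    `hi` already meets the target; `hi` is only ever replaced by a bound that does.
    """
    original = 8 * len(data)  # float64 input size in bytes
    for _ in range(iters):
        mid = (lo * hi) ** 0.5  # geometric midpoint: error bounds span orders of magnitude
        if original / compressed_size(data, mid) >= target_ratio:
            hi = mid
        else:
            lo = mid
    return hi, original / compressed_size(data, hi)
```

Each probe recompresses the full input, so this baseline costs about `iters` complete compressions per request.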

OBJECTIVES: The relationship between long working hours and the risk of mortality has been debated in various countries. This study aimed to investigate the association between long working hours and all-cause mortality in a large population-based cohort in China. METHODS: A retrospective cohort (N=10 269) used a large, nationally representative data set [the China Health and Nutrition Surveys (CHNS)] from 1989 to 2015. Long working hours (≥55 hours per week) were compared with standard working hours (35–40 hours per week). The outcome measure was all-cause mortality. The hazard ratio (HR) for all-cause mortality was calculated using Cox proportional...

10.5271/sjweh.4115 article EN cc-by Scandinavian Journal of Work Environment & Health 2023-09-04

Summary: Safety-critical applications, such as autonomous vehicles, healthcare, and space applications, have witnessed widespread deployment of deep neural networks (DNNs). Inherent algorithmic inaccuracies have consistently been a prevalent cause of misclassifications, even in modern DNNs. Simultaneously, with an ongoing effort to minimize the footprint of contemporary chip design, there is a continual rise in the likelihood of transient hardware faults in deployed DNN models. Consequently, researchers have wondered about the extent to which these...

10.1002/stvr.1873 article EN cc-by Software Testing Verification and Reliability 2024-02-01

Error-bounded lossy compression has been effective in significantly reducing the data storage/transfer burden while preserving the fidelity of the reconstructed data very well. Many error-bounded lossy compressors have been developed for a wide range of parallel and distributed use cases over the years. These compressors are designed with distinct compression models and design principles, such that each of them features particular pros and cons. In this paper we provide a comprehensive survey of emerging error-bounded lossy compression techniques for different use cases, each involving big data to process. The key...

10.48550/arxiv.2404.02840 preprint EN arXiv (Cornell University) 2024-04-03

10.1109/ipdps57955.2024.00052 article EN 2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS) 2024-05-27

As the rate of transient hardware faults increases, researchers have investigated software techniques to tolerate these faults. An important class of faults are those that cause long-latency crashes (LLCs), i.e., faults that can persist for a long time in the program before causing it to crash. In this paper, we develop a technique to automatically find program locations where LLC-causing faults originate, so that these locations can be protected to bound the program's crash latency. We first identify the code patterns responsible for the majority of LLCs through an empirical study. We then build...

10.1109/dsn.2015.36 article EN 2015-06-01