Hai Li

ORCID: 0000-0003-3228-6544
Research Areas
  • Advanced Memory and Neural Computing
  • Ferroelectric and Negative Capacitance Devices
  • Advanced Neural Network Applications
  • Adversarial Robustness in Machine Learning
  • Parallel Computing and Optimization Techniques
  • Magnetic properties of thin films
  • Semiconductor materials and devices
  • Advanced Data Storage Technologies
  • Neuroscience and Neural Engineering
  • CCD and CMOS Imaging Sensors
  • Neural Networks and Reservoir Computing
  • Low-power high-performance VLSI design
  • Domain Adaptation and Few-Shot Learning
  • Privacy-Preserving Technologies in Data
  • Anomaly Detection Techniques and Applications
  • Quantum and electron transport phenomena
  • Advanced SAR Imaging Techniques
  • Neural Networks and Applications
  • Photoreceptor and optogenetics research
  • Neural dynamics and brain function
  • Physical Unclonable Functions (PUFs) and Hardware Security
  • Topic Modeling
  • Machine Learning and ELM
  • Speech Recognition and Synthesis
  • Machine Learning and Data Classification

Duke University
2013-2025

University of Florida
2025

China Southern Power Grid (China)
2023-2024

Civil Aviation University of China
2012-2024

China Power Engineering Consulting Group (China)
2023-2024

Shanxi University
2021-2024

Power Grid Corporation (India)
2024

Civil Aviation Flight University of China
2019-2024

iQIYI (China)
2020-2023

Southwest Jiaotong University
2019-2023

High demand for computation resources severely hinders deployment of large-scale Deep Neural Networks (DNN) in resource constrained devices. In this work, we propose a Structured Sparsity Learning (SSL) method to regularize the structures (i.e., filters, channels, filter shapes, and layer depth) of DNNs. SSL can: (1) learn a compact structure from a bigger DNN to reduce computation cost; (2) obtain a hardware-friendly structured sparsity to efficiently accelerate the evaluation of DNNs. Experimental results show that...

10.48550/arxiv.1608.03665 preprint EN other-oa arXiv (Cornell University) 2016-01-01
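
The group Lasso regularizer at the core of SSL is easy to illustrate. Below is a minimal PyTorch sketch, assuming filter-wise groups only (the paper also regularizes channels, filter shapes, and layer depth); the function name and penalty strength are illustrative.

```python
import torch
import torch.nn as nn

def filter_group_lasso(conv: nn.Conv2d) -> torch.Tensor:
    # Each output filter (C_in x k x k) is one group; penalizing its
    # L2 norm drives entire filters to exactly zero during training.
    return conv.weight.flatten(1).norm(dim=1).sum()

model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(), nn.Conv2d(16, 32, 3))
lam = 1e-4  # regularization strength (assumed)
penalty = lam * sum(filter_group_lasso(m) for m in model.modules()
                    if isinstance(m, nn.Conv2d))
penalty.backward()  # in practice, added to the task loss before backward()
```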

Convolutional neural networks (CNNs) are at the heart of deep learning applications. Recent works PRIME [1] and ISAAC [2] demonstrated the promise of using resistive random access memory (ReRAM) to perform computations in memory. We found that training cannot be efficiently supported with the current schemes. First, they do not consider the weight update and the complex data dependency of the training procedure. Second, ISAAC attempts to increase system throughput with a very deep pipeline, which is only beneficial when a large number of consecutive images can...

10.1109/hpca.2017.55 article EN 2017-02-01

High network communication cost for synchronizing gradients and parameters is the well-known bottleneck of distributed training. In this work, we propose TernGrad, which uses ternary gradients to accelerate distributed deep learning in data parallelism. Our approach requires only three numerical levels {-1, 0, 1}, which can aggressively reduce the communication time. We mathematically prove the convergence of TernGrad under the assumption of a bound on gradients. Guided by the bound, layer-wise ternarizing and gradient clipping are proposed to improve its convergence. Our experiments...

10.48550/arxiv.1705.07878 preprint EN other-oa arXiv (Cornell University) 2017-01-01
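
As a rough sketch of the core step, the following ternarizes one gradient tensor with a scaler s = max|g| and stochastic rounding, so each component lands in {-s, 0, +s}; variable names are mine, and the paper applies this layer-wise together with gradient clipping.

```python
import torch

def ternarize(grad: torch.Tensor) -> torch.Tensor:
    s = grad.abs().max()
    if s == 0:
        return torch.zeros_like(grad)
    mask = torch.bernoulli(grad.abs() / s)  # keeps E[output] = grad
    return s * grad.sign() * mask           # values in {-s, 0, +s}

g = torch.randn(1000)
print(ternarize(g).unique())  # at most three distinct levels
```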

By mimicking highly parallel biological systems, neuromorphic hardware provides the capability of information processing within a compact and energy-efficient platform. However, the traditional Von Neumann architecture and limited signal connections have severely constrained the scalability and performance of such hardware implementations. Recently, many research efforts have been invested in utilizing the recently discovered memristors in neuromorphic systems, owing to their similarity to biological synapses. In this paper, we explore the potential of a memristor crossbar...

10.1109/tnnls.2013.2296777 article EN IEEE Transactions on Neural Networks and Learning Systems 2014-01-31
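
The appeal of the crossbar is that Kirchhoff's law evaluates a matrix-vector product in one analog step. Here is a toy NumPy model, assuming a differential pair of conductance arrays to represent signed weights; the conductance range is an assumption.

```python
import numpy as np

def to_conductance(W, g_min=1e-6, g_max=1e-4):
    # split signed weights across a positive and a negative array
    scale = (g_max - g_min) / np.abs(W).max()
    g_pos = np.where(W > 0,  W, 0.0) * scale + g_min
    g_neg = np.where(W < 0, -W, 0.0) * scale + g_min
    return g_pos, g_neg, scale

def crossbar_mvm(v, g_pos, g_neg, scale):
    # column currents sum input voltages weighted by conductance: i = G^T v
    return (g_pos.T @ v - g_neg.T @ v) / scale

W, v = np.random.randn(4, 3), np.random.rand(4)
print(np.allclose(crossbar_mvm(v, *to_conductance(W)), W.T @ v))
```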

The existence of a spintronic memristor at the nanoscale is demonstrated based upon spin-torque-induced magnetization switching and magnetic-domain-wall motion. Our examples show that memristive effects are quite universal for spin-torque devices at the time scale that explicitly involves the interactions between magnetization dynamics and electronic charge transport. We also prove that such a device can be designed to explore and memorize a continuum of states...

10.1109/led.2008.2012270 article EN IEEE Electron Device Letters 2009-02-13

Graph processing has recently received intensive interest in light of a wide range of needs to understand relationships. It is well known for its poor locality and high memory bandwidth requirement. In conventional architectures, graph workloads incur a significant amount of data movement and energy consumption, which motivates several hardware graph accelerators. The current accelerators rely on memory access optimizations or on placing computation logic close to memory. Distinct from all existing approaches, we leverage an...

10.1109/hpca.2018.00052 article EN 2018-02-01
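
The connection exploited by in-memory graph accelerators is that many vertex programs reduce to iterated sparse matrix-vector products, which a crossbar can evaluate in place. A pure-NumPy stand-in using BFS frontier expansion:

```python
import numpy as np

A = np.array([[0, 1, 0, 0],   # adjacency matrix of a 4-vertex path graph
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])

frontier = np.array([True, False, False, False])  # BFS from vertex 0
visited = frontier.copy()
levels = 0
while frontier.any():
    frontier = (A @ frontier > 0) & ~visited      # one "matrix-vector" step
    visited |= frontier
    levels += int(frontier.any())
print("frontier expansions:", levels)             # 3 for this path graph
```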

The memristor-based synaptic network has been widely investigated and applied to neuromorphic computing systems for its fast computation and low design cost. As memristors continue to mature and achieve higher density, bit failures within crossbar arrays become a critical issue. These failures can degrade the computation accuracy significantly. In this work, we propose a defect-rescuing design to restore the accuracy. In our proposed design, the significant weights in a specified network are first identified, and then retraining and remapping algorithms are described. For two...

10.1145/3061639.3062310 article EN 2017-06-13
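
A minimal stand-in for the remapping step: greedily assign the most significant logical rows to the physical crossbar rows where stuck-at-zero cells would hurt them least. The greedy heuristic below is illustrative, not the paper's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))            # trained weights
stuck = rng.random((8, 8)) < 0.1           # True where a cell is stuck at zero

def lost_importance(w_row, stuck_row):
    return np.abs(w_row[stuck_row]).sum()  # magnitude a row would lose here

assignment, free = {}, list(range(8))
for r in np.argsort(-np.abs(W).sum(axis=1)):   # most significant rows first
    p = min(free, key=lambda p: lost_importance(W[r], stuck[p]))
    free.remove(p)
    assignment[r] = p

mapped = np.zeros_like(W)
for r, p in assignment.items():
    mapped[p] = np.where(stuck[p], 0.0, W[r])  # defective cells read as zero
```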

Many recent works have shown that deep learning models are vulnerable to quasi-imperceptible input perturbations, yet practitioners cannot fully explain this behavior. This work describes a transfer-based blackbox targeted adversarial attack on feature space representations that also provides insights into the cross-model class representations of CNNs. The attack is explicitly designed for transferability and drives the feature space representation of a source image at layer L towards the representation of a target image at layer L. The attack yields highly transferable adversarial examples, which...

10.1109/cvpr.2019.00723 article EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01
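
A condensed PGD-style sketch of that layer-L objective: perturb the source image so its intermediate activation moves toward the target image's activation. The backbone, layer choice, step size, and budget below are all assumptions.

```python
import torch
import torchvision.models as models

model = models.resnet18(weights=None).eval()
feats = {}
model.layer3.register_forward_hook(lambda m, i, o: feats.update(a=o))

def layer_feat(x):
    model(x)           # hook captures the layer-L activation
    return feats["a"]

src, tgt = torch.rand(1, 3, 224, 224), torch.rand(1, 3, 224, 224)
with torch.no_grad():
    tgt_feat = layer_feat(tgt)

adv, eps = src.clone().requires_grad_(True), 8 / 255
for _ in range(10):
    loss = (layer_feat(adv) - tgt_feat).pow(2).mean()
    loss.backward()
    with torch.no_grad():
        adv -= 0.01 * adv.grad.sign()      # move toward the target's feature
        adv.copy_(torch.max(torch.min(adv, src + eps), src - eps).clamp(0, 1))
        adv.grad.zero_()
```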

Federated learning (FL) is a popular distributed learning framework that can reduce privacy risks by not explicitly sharing private data. However, recent works have demonstrated that sharing model updates makes FL vulnerable to inference attacks. In this work, we show our key observation that the data representation leakage from gradients is the essential cause of privacy leakage in FL. We also provide an analysis to explain how the data representation is leaked. Based on this observation, we propose a defense called Soteria against the model inversion attack in FL. The key idea is to perturb...

10.1109/cvpr46437.2021.00919 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021-06-01
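
The defense is often summarized as pruning the gradient of the representation (fully connected) layer before sharing it. A rough sketch under that simplified reading, with the layer name and prune ratio as assumptions:

```python
import torch

def defend_update(named_grads, layer="fc1.weight", prune_ratio=0.8):
    # zero the smallest-magnitude entries of the representation layer's
    # gradient so inverted reconstructions degrade (simplified Soteria)
    g = named_grads[layer]
    k = int(g.numel() * prune_ratio)
    thresh = g.abs().flatten().kthvalue(k).values
    named_grads[layer] = torch.where(g.abs() <= thresh,
                                     torch.zeros_like(g), g)
    return named_grads

grads = {"fc1.weight": torch.randn(128, 256)}
print((defend_update(grads)["fc1.weight"] == 0).float().mean())  # ~0.8
```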

Spin-transfer torque random access memory (STT-RAM) has received increasing attention because of its attractive features: good scalability, zero standby power, non-volatility, and radiation hardness. The use of STT-RAM technology in last-level on-chip caches has been proposed, as it minimizes cache leakage power as technology scales down. Furthermore, the cell area of STT-RAM is only 1/9 to 1/3 that of SRAM. This allows for a much larger cache within the same die footprint, improving overall system performance by reducing cache misses....

10.1145/2155620.2155659 article EN 2011-12-03

In recent years, non-volatile memory (NVM) technologies have emerged as candidates for a future universal memory. NVMs generally offer advantages such as low leakage power, high density, and fast read speed. At the same time, they also have disadvantages. For example, NVMs often have asymmetric write speed and energy cost, which poses new challenges when applying them. This paper contains a collection of four contributions, presenting a basic introduction to three emerging NVM technologies, their unique characteristics,...

10.1145/2039370.2039420 article EN 2011-10-09

The Brain-State-in-a-Box (BSB) model is an auto-associative neural network that has been widely used in optical character recognition and image processing. Traditionally, the BSB model was realized at the software level and carried out on high-performance computing clusters. To improve computation efficiency and reduce resource requirements, we propose a hardware realization by utilizing memristor crossbar arrays. In this work, we explore the potential of a memristor crossbar array as the auto-associative memory. More specifically, the recall function...

10.1145/2228360.2228448 article EN 2012-05-31
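
The recall function is a short fixed-point iteration, x(t+1) = S(alpha*A*x(t) + beta*x(t)) with S saturating to the box [-1, 1], which is exactly the per-step product a crossbar evaluates. A minimal sketch; the constants are assumptions.

```python
import numpy as np

def bsb_recall(A, x, alpha=0.25, beta=1.0, steps=50):
    for _ in range(steps):
        x = np.clip(alpha * (A @ x) + beta * x, -1.0, 1.0)  # saturate to box
    return x

p = np.array([1., -1., 1., -1.])           # stored bipolar pattern
A = np.outer(p, p) / p.size                # Hebbian outer-product storage
noisy = p.copy(); noisy[0] = -0.2          # corrupt one element
print(bsb_recall(A, noisy))                # converges back toward p
```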

Recent advances in the development of memristor devices and crossbar integration allow us to implement a low-power on-chip neuromorphic computing system (NCS) with a small footprint. Training methods have been proposed to program the memristors by following existing training algorithms for neural network models. However, the robustness of these methods has not been well investigated with the limits imposed by realistic hardware implementations taken into account. In this work, we present a quantitative analysis of the impact of device...

10.1145/2744769.2744930 article EN 2015-06-02
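
The flavor of such a quantitative analysis can be reproduced in a few lines: inject multiplicative device variation into trained weights and record the accuracy spread. The lognormal variation model and the linear-classifier stand-in are assumptions.

```python
import numpy as np

def accuracy_under_variation(W, X, y, sigma=0.1, trials=100, seed=0):
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(trials):
        Wn = W * rng.lognormal(0.0, sigma, W.shape)  # per-device variation
        accs.append((np.argmax(X @ Wn, axis=1) == y).mean())
    return np.mean(accs), np.std(accs)

rng = np.random.default_rng(1)
W, X = rng.standard_normal((16, 4)), rng.standard_normal((200, 16))
y = np.argmax(X @ W, axis=1)               # labels the ideal device gets right
print(accuracy_under_variation(W, X, y))   # mean accuracy drops as sigma grows
```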

The poisoning attack is identified as a severe security threat to machine learning algorithms. In many applications, for example, deep neural network (DNN) models collect public data as the inputs to perform re-training, where the input data can be poisoned. Although poisoning attacks against support vector machines (SVM) have been extensively studied before, there is still very limited knowledge about how such attacks can be implemented on neural networks (NN), especially DNNs. In this work, we first examine the possibility of applying traditional...

10.48550/arxiv.1703.01340 preprint EN other-oa arXiv (Cornell University) 2017-01-01

Object detectors have emerged as an indispensable module in modern computer vision systems. In this work, we propose DPatch -- a black-box adversarial-patch-based attack towards mainstream object detectors (i.e., Faster R-CNN and YOLO). Unlike the original adversarial patch that only manipulates image-level classifiers, our DPatch simultaneously attacks the bounding box regression and object classification so as to disable their predictions. Compared with prior works, DPatch has several appealing properties: (1) DPatch can perform both untargeted...

10.48550/arxiv.1806.02299 preprint EN other-oa arXiv (Cornell University) 2018-01-01
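
In outline, the patch is trained by gradient ascent on a surrogate detector's loss and then transferred to black-box targets. The sketch below assumes a hypothetical `detector_loss` and `loader`; the patch size, position, and optimizer are also assumptions.

```python
import torch

patch = torch.zeros(3, 40, 40, requires_grad=True)   # learnable patch
opt = torch.optim.Adam([patch], lr=0.01)

def apply_patch(imgs, patch, x0=0, y0=0):
    out = imgs.clone()
    out[:, :, y0:y0 + 40, x0:x0 + 40] = patch.clamp(0, 1)
    return out

for imgs, targets in loader:                          # hypothetical data loader
    # untargeted variant: maximize the surrogate detector's training loss
    loss = -detector_loss(apply_patch(imgs, patch), targets)  # hypothetical
    opt.zero_grad(); loss.backward(); opt.step()
```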

Neuromorphic computing is recently gaining significant attention as a promising candidate to conquer the well-known von Neumann bottleneck. In this work, we propose RENO -- an efficient reconfigurable neuromorphic computing accelerator. RENO leverages the extremely efficient mixed-signal computation capability of memristor-based crossbar (MBC) arrays to speed up the execution of artificial neural networks (ANNs). The hierarchically arranged MBC arrays can be configured into a variety of ANN topologies through a mixed-signal interconnection network (M-Net)....

10.1145/2744769.2744900 article EN 2015-06-02

In recent years, many systems have employed NAND flash memory as storage devices because of its advantages: higher performance (compared to the traditional hard disk drive), high density, random access, increasing capacity, and falling cost. On the other hand, NAND flash memory is limited by its "erase-before-write" requirement. Log-based structures have been used to alleviate this problem by writing updated data to clean space. Prior log-based methods, however, cannot avoid excessive erase operations when there are frequent...

10.1109/hpca.2010.5416650 article EN 2010-01-01
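
The log-based idea itself is simple: redirect updates to clean pages and defer erases to garbage collection. A toy page-mapped sketch; the capacity and flat structure are simplifications of mine, not the paper's scheme.

```python
class LogFlash:
    """Toy log-structured flash write path: updates append to clean
    space; erases only happen during garbage collection."""
    def __init__(self, pages=64):
        self.log, self.map, self.pages = [], {}, pages

    def write(self, lpn, data):
        if len(self.log) >= self.pages:
            self._garbage_collect()        # reclaim stale pages (erase here)
        self.map[lpn] = len(self.log)      # out-of-place update, no erase
        self.log.append(data)

    def read(self, lpn):
        return self.log[self.map[lpn]]

    def _garbage_collect(self):
        live = {lpn: self.log[i] for lpn, i in self.map.items()}
        self.log, self.map = [], {}        # models erasing stale blocks
        for lpn, data in live.items():
            self.map[lpn] = len(self.log)
            self.log.append(data)
```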

Neuromorphic systems have recently gained increasing attention for their high computation efficiency. Many designs have been proposed and realized with traditional CMOS technology or emerging devices. In this work, we propose a spiking neuromorphic design built on resistive crossbar structures and implemented in IBM 130nm technology. Our design adopts a rate coding scheme in which the pre- and post-neuron signals are represented by digitalized pulses. The weighting function of the pre-neuron signals is executed by the crossbar in the analog format. The computing...

10.1145/2744769.2744783 article EN 2015-06-02
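
Rate coding is easy to state in code: a pre-neuron value in [0, 1] becomes a digitalized pulse train whose spike count encodes the value. The window length and Bernoulli pulse model below are assumptions.

```python
import numpy as np

def rate_encode(value, window=100, seed=0):
    rng = np.random.default_rng(seed)
    return (rng.random(window) < value).astype(int)  # digitalized pulse train

spikes = rate_encode(0.3)
print(spikes.sum() / spikes.size)  # firing rate approximates the value, ~0.3
```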

Very large-scale Deep Neural Networks (DNNs) have achieved remarkable successes in a large variety of computer vision tasks. However, the high computation intensity of DNNs makes it challenging to deploy these models on resource-limited systems. Some studies used low-rank approaches that approximate the filters by low-rank basis to accelerate the testing. Those works directly decomposed the pre-trained DNNs by Low-Rank Approximations (LRA). How to train DNNs toward a lower-rank space for more efficient DNNs, however, remains an open...

10.1109/iccv.2017.78 article EN 2017-10-01
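
For reference, the LRA baseline this work trains toward can be written as a truncated SVD of a layer's flattened filter matrix, splitting one layer into two thinner ones; the shapes and rank below are assumptions.

```python
import numpy as np

C_out, C_in, k = 32, 16, 3
W = np.random.randn(C_out, C_in * k * k)   # flattened conv filters
U, s, Vt = np.linalg.svd(W, full_matrices=False)

rank = 8
W1 = s[:rank, None] * Vt[:rank]            # (rank, C_in*k*k): thin conv layer
W2 = U[:, :rank]                           # (C_out, rank): 1x1-conv-like layer
err = np.linalg.norm(W - W2 @ W1) / np.linalg.norm(W)
print(f"relative error at rank {rank}: {err:.3f}")  # lower rank, more speedup
```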

Phenomenally successful in practical inference problems, convolutional neural networks (CNN) are widely deployed in mobile devices, data centers, and even supercomputers. The number of parameters needed in CNNs, however, is often large and undesirable. Consequently, various methods have been developed to prune a CNN once it is trained. Nevertheless, the resulting CNNs offer limited benefits. While pruning the fully connected layers reduces a CNN's size considerably, it does not improve inference speed noticeably, as the compute...

10.48550/arxiv.1608.01409 preprint EN other-oa arXiv (Cornell University) 2016-01-01
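
The size-versus-speed point can be checked with back-of-the-envelope numbers: in a classic CNN the fully connected layers hold most of the parameters while convolutions hold most of the multiply-accumulates. The layer shapes below loosely follow AlexNet and are approximations.

```python
conv_params = conv_macs = 0
for c_in, c_out, k, hw in [(3, 96, 11, 55), (96, 256, 5, 27),
                           (256, 384, 3, 13), (384, 384, 3, 13),
                           (384, 256, 3, 13)]:
    p = c_in * c_out * k * k
    conv_params += p
    conv_macs += p * hw * hw              # one MAC per weight per output pixel

fc_params = 256 * 6 * 6 * 4096 + 4096 * 4096 + 4096 * 1000
print(f"params: conv {conv_params/1e6:.1f}M vs fc {fc_params/1e6:.1f}M")
print(f"MACs:   conv {conv_macs/1e9:.2f}G vs fc {fc_params/1e9:.3f}G")
```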

In this paper, we present a survey of recent works in developing neuromorphic or neuro-inspired hardware systems. In particular, we focus on those systems which can learn from data in either an unsupervised or an online supervised manner. We present algorithms and architectures developed specially to support on-chip learning. Emphasis is placed on hardware-friendly modifications of standard algorithms, such as backpropagation, as well as novel algorithms, such as structural plasticity, for low-resolution synapses. We cover works related to both spike-based and more...

10.1109/jetcas.2018.2816339 article EN IEEE Journal on Emerging and Selected Topics in Circuits and Systems 2018-03-01