NFDI4DS | UHH-SEMS - Publication Details

Design Flow of Accelerating Hybrid Extremely Low Bit-Width Neural Network in Embedded FPGA

OPENALEX - Publications

Junsong Wang Qiuwen Lou Xiaofan Zhang Chao Zhu Yonghua Lin and 1 more

Neural network accelerators with low latency and energy consumption are desirable for edge computing. To create such accelerators, we propose a design flow for accelerating the extremely low bit-width neural network (ELB-NN) in embedded FPGAs with hybrid quantization schemes. This flow covers both network training and FPGA-based network deployment, which facilitates design space exploration and simplifies the tradeoff between network accuracy and computation efficiency. Using this flow helps hardware designers to deliver network accelerators in edge devices under strict resource and power...

10.1109/fpl.2018.00035 preprint EN 2018-08-01

Device-Circuit-Architecture Co-Exploration for Computing-in-Memory Neural Accelerators

OPENALEX - Publications

Weiwen Jiang Qiuwen Lou Zheyu Yan Lei Yang Jingtong Hu and 2 more

Co-exploration of neural architectures and hardware design is promising due to its capability to simultaneously optimize network accuracy and hardware efficiency. However, state-of-the-art neural architecture search algorithms for the co-exploration are dedicated to the conventional von Neumann computing architecture, whose performance is heavily limited by the well-known memory wall. In this article, we are the first to bring the computing-in-memory architecture, which can easily transcend the memory wall, to interplay with the neural architecture search, aiming to find the most efficient neural architectures with high...

10.1109/tc.2020.2991575 article EN publisher-specific-oa IEEE Transactions on Computers 2020-04-30

A Mixed Signal Architecture for Convolutional Neural Networks

OPENALEX - Publications

Qiuwen Lou Chenyun Pan John McGuinness András Horváth Azad Naeemi and 2 more

Deep neural network (DNN) accelerators with improved energy and delay are desirable for meeting the requirements of hardware targeted for IoT and edge computing systems. Convolutional neural networks (CoNNs) belong to one of the most popular types of DNN architectures. This article presents the design and evaluation of an accelerator for CoNNs. The system-level architecture is based on mixed-signal, cellular neural networks (CeNNs). Specifically, we present (i) the implementation of different neural network layers, including convolution, ReLU, and pooling, in a CoNN using...

10.1145/3304110 article EN ACM Journal on Emerging Technologies in Computing Systems 2019-03-26

Cellular neural network friendly convolutional neural networks — CNNs with CNNs

OPENALEX - Publications

András Horváth Michael Hillmer Qiuwen Lou Xiaobo Sharon Hu Michael Niemier

This paper discusses the development and evaluation of a Cellular Neural Network (CeNN) friendly deep learning network for solving the MNIST digit recognition problem. Prior work has shown that CeNNs leveraging emerging technologies such as tunnel transistors can improve the energy or EDP of CeNNs, while simultaneously offering richer/more complex functionality. Important questions to address are what applications can benefit from CeNNs, and whether CeNNs can eventually outperform other alternatives at the application-level in...

10.23919/date.2017.7926973 article EN Design, Automation & Test in Europe Conference & Exhibition (DATE), 2015 2017-03-01

TransLand: An Adversarial Transfer Learning Approach for Migratable Urban Land Usage Classification using Remote Sensing

OPENALEX - Publications

Yang Zhang Ruohan Zong Jun Han Hao Zheng Qiuwen Lou and 2 more

Urban land usage classification is a critical task in big data based smart city applications that aim to understand the social-economic functions and physical attributes of urban environments. This paper focuses on the migratable urban land usage classification problem using remote sensing data (i.e., satellite images). Our goal is to accurately classify the land usage of locations in a target city where ground truth data is not available by leveraging a classification model from a source city where such data is available. This problem is motivated by the limitation of current solutions that primarily rely on a rich set of ground-truth data for accurate...

10.1109/bigdata47090.2019.9006360 article EN 2021 IEEE International Conference on Big Data (Big Data) 2019-12-01

Hardware design and the competency awareness of a neural network

OPENALEX - Publications

Yukun Ding Weiwen Jiang Qiuwen Lou Jinglan Liu Jinjun Xiong and 3 more

10.1038/s41928-020-00476-7 article EN Nature Electronics 2020-09-18

Embedding error correction into crossbars for reliable matrix vector multiplication using emerging devices

OPENALEX - Publications

Qiuwen Lou Tianqi Gao Patrick Faley Michael Niemier Xiaobo Sharon Hu and 1 more

Emerging memory devices are an attractive choice for implementing very energy-efficient in-situ matrix-vector multiplication (MVM) for use in intelligent edge platforms. Despite their great potential, device-level non-idealities have a large impact on the application-level accuracy of deep neural network (DNN) inference. We introduce a low-density parity-check code (LDPC) based approach to correct non-ideality induced errors encountered during in-situ MVM. We first encode the weights using error correcting codes...

10.1145/3370748.3406583 article EN 2020-08-07

Application-driven Design Exploration for Dense Ferroelectric Embedded Non-volatile Memories

OPENALEX - Publications

Mohammad Mehdi Sharifi Lillian Pentecost Ramin Rajaei Arman Kazemi Qiuwen Lou and 6 more

The memory wall bottleneck is a key challenge across many data-intensive applications. Multi-level FeFET-based embedded non-volatile memories are a promising solution for denser and more energy-efficient on-chip memory. However, reliable multi-level cell storage requires careful optimizations to minimize the design overhead costs. In this work, we investigate the interplay between FeFET device characteristics, programming schemes, and memory array architecture, and explore different design choices to optimize performance,...

10.1109/islped52811.2021.9502489 article EN 2021-07-26

Energy-Efficient Convolutional Neural Network Based on Cellular Neural Network Using Beyond-CMOS Technologies

OPENALEX - Publications

Chenyun Pan Qiuwen Lou Michael Niemier Xiaobo Sharon Hu Azad Naeemi

In this article, we perform a uniform benchmarking for the convolutional neural network (CoNN) based on the cellular neural network (CeNN) using a variety of beyond-CMOS technologies. Representative charge-based and spintronic device technologies are implemented to enable energy-efficient CeNN related computations. To alleviate the delay and energy overheads of the fully connected layer, a hybrid CeNN-based CoNN system is proposed. It is shown that low-power FETs and spintronic devices are promising candidates to implement CoNNs based on CeNNs. Specifically,...

10.1109/jxcdc.2019.2960307 article EN cc-by IEEE Journal on Exploratory Solid-State Computational Devices and Circuits 2019-12-01

TFET-based Operational Transconductance Amplifier Design for CNN Systems

OPENALEX - Publications

Qiuwen Lou Indranil Palit András Horváth Xiaobo Sharon Hu Michael Niemier and 1 more

A Cellular Neural Network (CNN) is a powerful processor that can significantly improve the performance of spatio-temporal applications such as pattern recognition, image processing, and motion detection, when compared to the more traditional von Neumann architecture. In this paper, we show how tunneling field effect transistors (TFETs) can be utilized to enhance the performance of CNNs. Specifically, the power consumption of TFET-based CNNs can be lower than that of MOSFET-based CNNs due to improved voltage controlled current sources (VCCSs) - an important...

10.1145/2742060.2742089 article EN 2015-05-19

A Uniform Modeling Methodology for Benchmarking DNN Accelerators

OPENALEX - Publications

Indranil Palit Qiuwen Lou Robert Perricone Michael Niemier Xiaobo Sharon Hu

Deep Neural Networks (DNNs) have achieved tremendous success in many application domains. Inspired by this success, specialized accelerators have been and continue to be developed to process DNN workloads in an energy-efficient manner. The design space for DNN accelerators can be extremely large since they can employ different datapaths, data mapping strategies, circuits, and device technologies. To explore the design space for developing DNN accelerators, it is important to quickly estimate the energy cost associated with a DNN accelerator. This paper introduces...

10.1109/iccad45719.2019.8942095 article EN 2015 IEEE/ACM International Conference on Computer-Aided Design (ICCAD) 2019-11-01

Design Flow of Accelerating Hybrid Extremely Low Bit-width Neural Network in Embedded FPGA

OPENALEX - Publications

Junsong Wang Qiuwen Lou Xiaofan Zhang Chao Zhu Yonghua Lin and 1 more

Neural network accelerators with low latency and energy consumption are desirable for edge computing. To create such accelerators, we propose a design flow for accelerating the extremely low bit-width neural network (ELB-NN) in embedded FPGAs with hybrid quantization schemes. This flow covers both network training and FPGA-based network deployment, which facilitates design space exploration and simplifies the tradeoff between network accuracy and computation efficiency. Using this flow helps hardware designers to deliver network accelerators in edge devices under strict resource and power...

10.48550/arxiv.1808.04311 preprint EN other-oa arXiv (Cornell University) 2018-01-01

Nonvolatile Spintronic Memory Cells for Neural Networks

OPENALEX - Publications

Andrew W. Stephan Qiuwen Lou Michael Niemier Xiaobo Sharon Hu Steven J. Koester

A new spintronic nonvolatile memory cell analogous to 1T DRAM with non-destructive READ is proposed. The cells can be used as neural computing units. A dual-circuit neural network architecture is proposed to leverage these devices against the complex operations involved in convolutional networks. Simulations based on HSPICE and MATLAB were performed to study the performance of this architecture when classifying images as well as the effect of varying the size and stability of the nanomagnets. The spintronic cells outperform a purely charge-based implementation of the same network,...

10.1109/jxcdc.2019.2932992 article EN cc-by IEEE Journal on Exploratory Solid-State Computational Devices and Circuits 2019-08-02

Cellular neural networks for image analysis using steep slope devices

OPENALEX - Publications

Indranil Palit Qiuwen Lou Michael Niemier Behnam Sedighi Joseph Nahas and 1 more

Traditional CMOS based von Neumann architectures face daunting challenges in performing complex computational tasks at high speed and with low power on spatio-temporal data, e.g., image processing, pattern recognition, etc. In this study, we discuss the utilities of various steep slope, beyond-CMOS emerging devices for image processing applications within the non-von Neumann computing paradigm of cellular neural networks (CNNs). In general, the steep subthreshold swing obviates the output transfer hardware used in a conventional...

10.5555/2691365.2691387 article EN International Conference on Computer Aided Design 2014-11-03

Analytically modeling power and performance of a CNN system

OPENALEX - Publications

Indranil Palit Qiuwen Lou Nicholas Acampora Joseph Nahas Michael Niemier and 1 more

Cellular neural networks (CNNs) are a powerful analog architecture that can outperform traditional von Neumann architectures for spatio-temporal information processing applications, e.g., image and speech recognition. Much existing work reports the energy dissipation of CNNs at the chip level, which includes the energy dissipation of sensors, actuators, and other components. As such, the impacts of various system variables, e.g., application templates, characteristics of the resistive element, etc., on the profile of a CNN cannot be easily determined. In this work,...

10.1109/iccad.2015.7372569 article EN 2015 IEEE/ACM International Conference on Computer-Aided Design (ICCAD) 2015-11-01

Analytically Modeling Power and Performance of a CNN System

OPENALEX - Publications

Indranil Palit Qiuwen Lou Nicholas Acampora Joseph Nahas Michael Niemier and 1 more

Cellular neural networks (CNNs) are a powerful analog architecture that can outperform traditional von Neumann architectures for spatio-temporal information processing applications, e.g., image and speech recognition. Much existing work reports the energy dissipation of CNNs at the chip level, which includes the energy dissipation of sensors, actuators, and other components. As such, the impacts of various system variables, e.g., application templates, characteristics of the resistive element, etc., on the profile of a CNN cannot be easily determined. In this work,...

10.5555/2840819.2840847 article EN International Conference on Computer Aided Design 2015-11-02

Cellular neural networks for image analysis using steep slope devices

OPENALEX - Publications

Indranil Palit Qiuwen Lou Michael Niemier Behnam Sedighi Joseph Nahas and 1 more

Traditional CMOS based von Neumann architectures face daunting challenges in performing complex computational tasks at high speed and with low power on spatio-temporal data, e.g., image processing, pattern recognition, etc. In this study, we discuss the utilities of various steep slope, beyond-CMOS emerging devices for image processing applications within the non-von Neumann computing paradigm of cellular neural networks (CNNs). In general, the steep subthreshold swing obviates the output transfer hardware used in a conventional...

10.1109/iccad.2014.7001337 article EN 2015 IEEE/ACM International Conference on Computer-Aided Design (ICCAD) 2014-11-01

Super Efficient Neural Network for Compression Artifacts Reduction and Super Resolution

OPENALEX - Publications

Wen Ma Qiuwen Lou Arman Kazemi Julian Faraone Tariq Afzal

Video quality can suffer from limited internet speed while being streamed by users. Compression artifacts start to appear when the bitrate decreases to match the available bandwidth. Existing algorithms either focus on removing the compression artifacts at the same video resolution, or on upscaling the video resolution but not removing the compression artifacts. Super resolution-only approaches will amplify the artifacts along with the details by default. We propose a lightweight convolutional neural network (CNN)-based algorithm which simultaneously performs artifacts reduction and...

10.48550/arxiv.2401.14641 preprint EN arXiv (Cornell University) 2024-01-25

Super Efficient Neural Network for Compression Artifacts Reduction and Super Resolution

OPENALEX - Publications

Wen Ma Qiuwen Lou Arman Kazemi Julian Faraone Tariq Afzal

Video quality can suffer from limited internet speed while being streamed by users. Compression artifacts start to appear when the bitrate decreases to match the available bandwidth. Existing algorithms either focus on removing the compression artifacts at the same video resolution, or on upscaling the video resolution but not removing the compression artifacts. Super resolution-only approaches will amplify the artifacts along with the details by default. We propose a lightweight convolutional neural network (CNN)-based algorithm which simultaneously performs artifacts reduction and...

10.1109/wacvw60836.2024.00055 article EN 2024-01-01

Device-Circuit-Architecture Co-Exploration for Computing-in-Memory Neural Accelerators

OPENALEX - Publications

Weiwen Jiang Qiuwen Lou Zheyu Yan Lei Yang Jingtong Hu and 2 more

Co-exploration of neural architectures and hardware design is promising to simultaneously optimize network accuracy and hardware efficiency. However, state-of-the-art neural architecture search algorithms for the co-exploration are dedicated to the conventional von Neumann computing architecture, whose performance is heavily limited by the well-known memory wall. In this paper, we are the first to bring the computing-in-memory architecture, which can easily transcend the memory wall, to interplay with the neural architecture search, aiming to find the most efficient neural architectures with high network accuracy and maximized hardware efficiency. Such a novel...

10.48550/arxiv.1911.00139 preprint EN other-oa arXiv (Cornell University) 2019-01-01

A mixed signal architecture for convolutional neural networks

OPENALEX - Publications

Qiuwen Lou Chenyun Pan John C. McGuiness András Horváth Azad Naeemi and 2 more

Deep neural network (DNN) accelerators with improved energy and delay are desirable for meeting the requirements of hardware targeted for IoT and edge computing systems. Convolutional neural networks (CoNNs) belong to one of the most popular types of DNN architectures. This paper presents the design and evaluation of an accelerator for CoNNs. The system-level architecture is based on mixed-signal, cellular neural networks (CeNNs). Specifically, we present (i) the implementation of different neural network layers, including convolution, ReLU, and pooling, in a CoNN using...

10.48550/arxiv.1811.02636 preprint EN other-oa arXiv (Cornell University) 2018-01-01

A Hybrid Optical-Electrical Analog Deep Learning Accelerator Using Incoherent Optical Signals

OPENALEX - Publications

Mingdai Yang Qiuwen Lou Ramin Rajaei Mohammad Reza Jokar Junyi Qiu and 8 more

Optical deep learning (DL) accelerators have attracted significant interest due to their latency and power advantages. In this article, we focus on incoherent optical designs. A challenge is that there is no known solution to perform single-wavelength accumulation (a key operation required for DL workloads) using incoherent optical signals efficiently. Therefore, we devise a hybrid approach, where accumulation is done in the electrical domain, and multiplication is performed in the optical domain. The technology enabler of our design is the transistor laser,...

10.1145/3584183 article EN ACM Journal on Emerging Technologies in Computing Systems 2023-02-18

Application-level Studies of Cellular Neural Network-based Hardware Accelerators

OPENALEX - Publications

Qiuwen Lou Indranil Palit Li Tang András Horváth Michael Niemier and 1 more

As cost and performance benefits associated with Moore's Law scaling slow, researchers are studying alternative architectures (e.g., based on analog and/or spiking circuits) and computational models (e.g., convolutional and recurrent neural networks) to perform application-level tasks faster, more energy efficiently, and more accurately. We investigate cellular neural network (CeNN)-based co-processors at the application-level for these metrics. While it is well-known that CeNNs can be well-suited for spatio-temporal information processing,...

10.48550/arxiv.1903.06649 preprint EN other-oa arXiv (Cornell University) 2019-01-01