Zidong Du

ORCID: 0000-0002-7603-4210
Research Areas
  • Advanced Neural Network Applications
  • Parallel Computing and Optimization Techniques
  • Advanced Memory and Neural Computing
  • Adversarial Robustness in Machine Learning
  • Ferroelectric and Negative Capacitance Devices
  • Reinforcement Learning in Robotics
  • Domain Adaptation and Few-Shot Learning
  • Embedded Systems Design Techniques
  • Neural Networks and Applications
  • CCD and CMOS Imaging Sensors
  • Machine Learning and Data Classification
  • Topic Modeling
  • Anomaly Detection Techniques and Applications
  • Evolutionary Algorithms and Applications
  • Radiation Effects in Electronics
  • Natural Language Processing Techniques
  • Multimodal Machine Learning Applications
  • Advanced Image and Video Retrieval Techniques
  • Advanced Graph Neural Networks
  • Graph Theory and Algorithms
  • Model Reduction and Neural Networks
  • VLSI and Analog Circuit Testing
  • Generative Adversarial Networks and Image Synthesis
  • Interconnection Networks and Systems
  • Human Pose and Action Recognition

Chinese Academy of Sciences
2015-2025

Institute of Computing Technology
2015-2025

Shanghai Innovative Research Center of Traditional Chinese Medicine
2024

Tsinghua University
2024

Cambricon (China)
2016-2023

State Key Laboratory of Computer Architecture
2015-2016

University of Chinese Academy of Sciences
2015-2016

International Centre for Theoretical Physics Asia-Pacific
2014

Machine-Learning tasks are becoming pervasive in a broad range of domains, and in a broad range of systems (from embedded systems to data centers). At the same time, a small set of machine-learning algorithms (especially Convolutional and Deep Neural Networks, i.e., CNNs and DNNs) are proving to be state-of-the-art across many applications. As architectures evolve towards heterogeneous multi-cores composed of a mix of cores and accelerators, a machine-learning accelerator can achieve the rare combination of efficiency (due to the small number of target algorithms) and broad application scope.

10.1145/2541940.2541967 article EN 2014-02-24
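
A minimal NumPy sketch of the kind of layer computation such an accelerator targets, processed in fixed-size tiles so on-chip buffers stay small; the tile size, layer shape, and ReLU activation below are illustrative choices, not the paper's hardware design:

    import numpy as np

    def fc_layer_tiled(x, W, b, tile=16):
        # Compute out = relu(W @ x + b) tile-by-tile, the access pattern a
        # small-footprint accelerator uses to keep its buffers bounded.
        out = np.array(b, dtype=np.float32)
        n_out, n_in = W.shape
        for i in range(0, n_out, tile):        # tile over output neurons
            for j in range(0, n_in, tile):     # tile over input synapses
                out[i:i+tile] += W[i:i+tile, j:j+tile] @ x[j:j+tile]
        return np.maximum(out, 0.0)

    x = np.random.rand(32).astype(np.float32)
    W = np.random.rand(64, 32).astype(np.float32)
    b = np.zeros(64, dtype=np.float32)
    assert np.allclose(fc_layer_tiled(x, W, b), np.maximum(W @ x + b, 0), atol=1e-4)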

In recent years, neural network accelerators have been shown to achieve both high energy efficiency and high performance for a broad application scope within the important category of recognition and mining applications.

10.1145/2749469.2750389 article EN 2015-05-26

Neural networks (NNs) have been demonstrated to be useful in a broad range of applications such as image recognition, automatic translation and advertisement recommendation. State-of-the-art NNs are known to be both computationally and memory intensive, due to the ever-increasing deep structure, i.e., multiple layers with massive neurons and connections (i.e., synapses). Sparse neural networks have emerged as an effective solution to reduce the amount of computation required. Though existing NN accelerators are able to efficiently process...

10.1109/micro.2016.7783723 article EN 2016-10-01
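
The core idea of a sparse accelerator — storing only nonzero synapses plus their indices and skipping the zero multiplications — can be sketched in a few lines of NumPy; the compressed format below is a generic index-value pairing, not the paper's exact indexing scheme:

    import numpy as np

    def dense_to_sparse(w, eps=1e-6):
        # Keep only nonzero synapses together with their input indices.
        idx = np.nonzero(np.abs(w) > eps)[0]
        return w[idx], idx

    def sparse_neuron(x, vals, idx):
        # One output neuron from compressed weights: only the kept
        # synapses are multiplied; zeros are skipped entirely.
        return float(vals @ x[idx])

    w = np.array([0.0, 1.5, 0.0, -2.0])
    x = np.random.rand(4)
    vals, idx = dense_to_sparse(w)
    assert np.isclose(sparse_neuron(x, vals, idx), w @ x)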

Neural networks (NNs) have been demonstrated to be useful in a broad range of applications such as image recognition, automatic translation and advertisement recommendation. State-of-the-art NNs are known to be both computationally and memory intensive, due to the ever-increasing deep structure, i.e., multiple layers with massive neurons and connections (i.e., synapses). Sparse neural networks have emerged as an effective solution to reduce the amount of computation required. Though existing NN accelerators are able to efficiently process...

10.5555/3195638.3195662 article EN 2016-10-15

Machine-Learning tasks are becoming pervasive in a broad range of domains, and in a broad range of systems (from embedded systems to data centers). At the same time, a small set of machine-learning algorithms (especially Convolutional and Deep Neural Networks, i.e., CNNs and DNNs) are proving to be state-of-the-art across many applications. As architectures evolve towards heterogeneous multi-cores composed of a mix of cores and accelerators, a machine-learning accelerator can achieve the rare combination of efficiency (due to the small number of target algorithms) and broad application scope. Until...

10.1145/2644865.2541967 article EN ACM SIGPLAN Notices 2014-02-24

Machine-Learning tasks are becoming pervasive in a broad range of domains, and in a broad range of systems (from embedded systems to data centers). At the same time, a small set of machine-learning algorithms (especially Convolutional and Deep Neural Networks, i.e., CNNs and DNNs) are proving to be state-of-the-art across many applications. As architectures evolve towards heterogeneous multi-cores composed of a mix of cores and accelerators, a machine-learning accelerator can achieve the rare combination of efficiency (due to the small number of target algorithms) and broad application scope. Until...

10.1145/2654822.2541967 article EN ACM SIGARCH Computer Architecture News 2014-02-24

Neural Networks (NN) are a family of models for a broad range of emerging machine learning and pattern recognition applications. NN techniques are conventionally executed on general-purpose processors (such as CPU and GPGPU), which are usually not energy-efficient since they invest excessive hardware resources to flexibly support various workloads. Consequently, application-specific hardware accelerators for neural networks have been proposed recently to improve the energy-efficiency. However, such accelerators were designed for a small set...

10.1145/3007787.3001179 article EN ACM SIGARCH Computer Architecture News 2016-06-18

Neural networks have rapidly become the dominant algorithms as they achieve state-of-the-art performance in a broad range of applications such as image recognition, speech recognition and natural language processing. However, neural networks keep moving towards deeper and larger architectures, posing a great challenge with the huge amount of data and computations. Although sparsity has emerged as an effective solution for directly reducing the intensity of computation and memory accesses, the irregularity caused by sparsity (including sparse...

10.1109/micro.2018.00011 article EN 2018-10-01
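
One common remedy for the irregularity mentioned above is structured (block-wise) pruning, which zeroes whole blocks of weights so the surviving weights keep a regular layout; a hedged sketch, with block size and keep ratio as illustrative parameters rather than the paper's values:

    import numpy as np

    def block_prune(W, block=4, keep_ratio=0.5):
        # Score each run of `block` consecutive weights by L1 norm and
        # zero out all but the top `keep_ratio` fraction of blocks.
        n_out, n_in = W.shape                 # assumes n_in % block == 0
        Wb = W.reshape(n_out, n_in // block, block)
        scores = np.abs(Wb).sum(axis=2)
        k = int(scores.size * keep_ratio)
        thresh = np.partition(scores.ravel(), -k)[-k]   # k-th largest score
        mask = (scores >= thresh)[:, :, None]
        return (Wb * mask).reshape(n_out, n_in)

    W = np.random.randn(8, 16)
    print(block_prune(W))   # surviving weights sit in regular 4-wide blocks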

In recent years, neural network accelerators have been shown to achieve both high energy efficiency and high performance for a broad application scope within the important category of recognition and mining applications. Still, such accelerators remain limited by memory accesses. In this paper, we focus on image applications, arguably the most important among them. The networks which are state-of-the-art for these applications are Convolutional Neural Networks (CNN), and they have an important property: weights are shared among many neurons, considerably reducing...

10.1145/2872887.2750389 article EN ACM SIGARCH Computer Architecture News 2015-06-13

In recent years, inexact computing has been increasingly regarded as one of the most promising approaches for reducing energy consumption in many applications that can tolerate a degree of inaccuracy. Driven by the principle of trading tolerable amounts of application accuracy in return for significant resource savings - the energy consumed, the (critical path) delay and the (silicon) area being the resources - this approach has been limited to certain domains. In this paper, we propose to expand the scope, the error tolerance, as well as the target systems of this approach through neural network...

10.1109/aspdac.2014.6742890 article EN 2014 19th Asia and South Pacific Design Automation Conference (ASP-DAC) 2014-01-01
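
The neural-network route to inexact computing can be illustrated end to end: train a tiny MLP to stand in for an exact kernel and measure the accuracy it trades away. Everything below (the target function, network size, training schedule) is an illustrative assumption:

    import numpy as np

    rng = np.random.default_rng(0)
    f = lambda x: np.sin(3 * x)          # stand-in for a costly exact kernel

    # One-hidden-layer MLP trained by plain gradient descent on MSE.
    X = rng.uniform(-1, 1, (256, 1)); Y = f(X)
    W1 = rng.normal(0, 1, (1, 16)); b1 = np.zeros(16)
    W2 = rng.normal(0, 1, (16, 1)); b2 = np.zeros(1)
    for _ in range(3000):
        H = np.tanh(X @ W1 + b1)
        P = H @ W2 + b2
        G = 2 * (P - Y) / len(X)         # dMSE/dP
        dW2 = H.T @ G; db2 = G.sum(0)
        dH = (G @ W2.T) * (1 - H ** 2)
        dW1 = X.T @ dH; db1 = dH.sum(0)
        for p, g in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2)):
            p -= 0.1 * g

    # The surrogate trades a small, measurable error for the chance to run
    # on a tiny fixed-function engine instead of the exact kernel.
    print(np.abs(np.tanh(X @ W1 + b1) @ W2 + b2 - Y).mean())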

Neural Networks (NN) are a family of models for a broad range of emerging machine learning and pattern recognition applications. NN techniques are conventionally executed on general-purpose processors (such as CPU and GPGPU), which are usually not energy-efficient since they invest excessive hardware resources to flexibly support various workloads. Consequently, application-specific hardware accelerators for neural networks have been proposed recently to improve the energy-efficiency. However, such accelerators were designed for a small set...

10.1109/isca.2016.42 article EN 2016-06-01

A vast array of devices, ranging from industrial robots to self-driven cars or smartphones, require increasingly sophisticated processing of real-world input data (image, voice, radio, ...). Interestingly, hardware neural network accelerators are emerging again as attractive candidate architectures for such tasks. The algorithms considered come from two, largely separate, domains: machine-learning and neuroscience. These networks have very different characteristics, so it is unclear which approach...

10.1145/2830772.2830789 article EN 2015-12-05

Continuous-valued deep convolutional networks (DNNs) can be converted into accurate rate-coding based spike neural networks (SNNs). However, the substantial computational and energy costs, which are caused by multiple spikes, limit their use in mobile and embedded applications. Recent works have shown that newly emerged temporal-coding SNNs converted from DNNs can reduce the computational load effectively. In this paper, we propose a novel method to convert DNNs into temporal-coding SNNs, called TDSNN. Combined with the characteristic of leaky integrate-and-fire...

10.1609/aaai.v33i01.33011319 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2019-07-17
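
The load reduction from temporal coding can be seen in a toy encoder: rate coding needs many spikes per neuron to represent a value, while time-to-first-spike coding uses at most one spike whose timing carries the value. The mapping below is a simplified illustration, not the paper's exact conversion scheme:

    import numpy as np

    T_MAX = 16  # illustrative time window length

    def ttfs_encode(a):
        # Larger activation -> earlier spike; zero activation never fires.
        a = np.clip(a, 0.0, 1.0)
        t = np.where(a > 0, np.round((1.0 - a) * (T_MAX - 1)), T_MAX)
        return t.astype(int)

    def ttfs_decode(t):
        # Invert the coding to recover an approximate activation.
        return np.where(t < T_MAX, 1.0 - t / (T_MAX - 1), 0.0)

    a = np.array([0.0, 0.25, 0.5, 1.0])
    print(ttfs_encode(a), ttfs_decode(ttfs_encode(a)))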

In the field of natural language processing, the rapid development of large language models (LLMs) has attracted more and more attention. LLMs have shown a high level of creativity in various tasks, but the methods for assessing such creativity are inadequate. The assessment of LLM creativity needs to consider differences from humans, requiring multi-dimensional measurement while balancing accuracy and efficiency. This paper aims to establish an efficient framework for assessing the creativity of LLMs. By adapting the modified Torrance Tests of Creative Thinking, the research evaluates the creative...

10.48550/arxiv.2401.12491 preprint EN other-oa arXiv (Cornell University) 2024-01-01

The graph convolutional network (GCN) emerges as a promising direction to learn the inductive representation in graph data, commonly used in widespread applications such as E-commerce, social networks, and knowledge graphs. However, learning from graphs is nontrivial because of its mixed computation model involving both graph analytics and neural computing. To this end, we decompose the GCN computation into two hierarchical paradigms: 1) graph-level computing and 2) node-level computing. Such a paradigm facilitates software and hardware accelerations for...

10.1109/tcad.2021.3079142 article EN IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 2021-05-11
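
The two-paradigm split is easy to see in code: one GCN layer is an irregular graph-level aggregation followed by a regular node-level dense transform. A minimal NumPy sketch using the standard GCN normalization (the layer shapes are illustrative):

    import numpy as np

    def normalize_adj(A):
        # A_hat = D^{-1/2} (A + I) D^{-1/2}, the standard GCN normalization.
        A = A + np.eye(len(A))
        d_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(1)))
        return d_inv_sqrt @ A @ d_inv_sqrt

    def gcn_layer(A_hat, X, W):
        H = A_hat @ X                    # graph-level: irregular gather/reduce
        return np.maximum(H @ W, 0.0)    # node-level: regular dense transform

    A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
    X = np.random.rand(3, 4); W = np.random.rand(4, 2)
    print(gcn_layer(normalize_adj(A), X, W))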

In recent years, inexact computing has been increasingly regarded as one of the most promising approaches for slashing energy consumption in many applications that can tolerate a certain degree of inaccuracy. Driven by the principle of trading tolerable amounts of application accuracy in return for significant resource savings - the energy consumed, the (critical path) delay, and the (silicon) area - this approach has been limited to application-specific integrated circuits (ASICs) so far. These ASIC realizations have a narrow application scope and are...

10.1109/tcad.2015.2419628 article EN IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 2015-04-03

The recently emerged quantization technique (i.e., using low bit-width fixed-point data instead of high bit-width floating-point data) has been applied to the inference of deep neural networks for fast and efficient execution. However, directly applying quantization in training can cause significant accuracy loss, thus remaining an open challenge. In this paper, we propose a novel approach, which applies layer-wise precision-adaptive quantization to training deep neural networks. The new approach leverages our key insight that the accuracy degradation is attributed...

10.1109/cvpr42600.2020.00240 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01
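
A hedged sketch of the underlying mechanism: fixed-point quantization with a per-layer choice of fractional bits. The simple error-minimizing search below illustrates what "layer-wise precision-adaptive" means, but it is not the paper's training-time algorithm:

    import numpy as np

    def quantize(x, bits, frac_bits):
        # Signed fixed-point: round to 2^-frac_bits steps, clip to `bits` range.
        scale = 2.0 ** frac_bits
        lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
        return np.clip(np.round(x * scale), lo, hi) / scale

    # Layers with different value ranges want different formats: pick, per
    # layer, the fractional bit count that minimizes quantization error.
    layers = {"conv1": np.random.randn(256) * 0.5, "fc": np.random.randn(256) * 0.01}
    for name, w in layers.items():
        best = min(range(1, 8), key=lambda f: np.abs(quantize(w, 8, f) - w).mean())
        print(name, "-> 8-bit fixed point,", best, "fractional bits")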

Deep Learning Accelerators (DLAs) are effective in improving both the performance and energy efficiency of compute-intensive deep learning algorithms. A flexible and portable means to exploit DLAs is using high-performance software libraries with well-established APIs, which are typically either manually implemented or automatically generated by exploration-based compilation approaches. Though such approaches significantly reduce programming efforts, they fail to find optimal or near-optimal programs from a large but...

10.1145/3582016.3582061 article EN 2023-03-20
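
Exploration-based generation, in miniature: enumerate candidate implementations of one operator, time each, and keep the fastest. The search space here (a handful of tile shapes for a matmul) is a toy stand-in for the far larger spaces real compilation approaches must search:

    import itertools, time
    import numpy as np

    def tiled_matmul(A, B, ti, tj):
        # One candidate program: a matmul with a particular tiling.
        n = len(A)
        C = np.zeros((n, n))
        for i in range(0, n, ti):
            for j in range(0, n, tj):
                C[i:i+ti, j:j+tj] = A[i:i+ti] @ B[:, j:j+tj]
        return C

    A = np.random.randn(128, 128); B = np.random.randn(128, 128)
    best = None
    for ti, tj in itertools.product([16, 32, 64], repeat=2):
        t0 = time.perf_counter()
        tiled_matmul(A, B, ti, tj)
        dt = time.perf_counter() - t0
        if best is None or dt < best[0]:
            best = (dt, ti, tj)
    print("fastest tiling:", best[1:])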

Recent advancements in open-source code large language models (LLMs) have been driven by fine-tuning on the data generated from powerful closed-source LLMs, which are expensive to obtain. This paper explores whether it is possible to use a fine-tuned open-source model to generate additional data to augment its instruction-tuning dataset. We make two observations: (1) A code snippet can serve as the response to different instructions. (2) Instruction-tuned code LLMs perform better at translating code into instructions than the reverse. Based...

10.1609/aaai.v39i24.34742 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2025-04-11
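
A minimal sketch of the self-augmentation loop these observations suggest; the prompt wording and the prompt-to-text model callable are hypothetical stand-ins, not the paper's exact pipeline:

    def inverse_augment(model, code_snippets):
        # model: any prompt -> text callable standing in for the code LLM.
        pairs = []
        for code in code_snippets:
            # Observation (2): code LLMs translate code into instructions
            # better than the reverse, so generate the instruction from code.
            instruction = model("Write the instruction this code answers:\n" + code)
            # Observation (1): one snippet can answer several instructions,
            # so each generated instruction yields a new training pair.
            pairs.append((instruction, code))
        return pairs  # merged back into the instruction-tuning dataset

    toy_model = lambda prompt: "Implement: " + prompt.splitlines()[-1][:40]
    print(inverse_augment(toy_model, ["def add(a, b):\n    return a + b"]))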