NFDI4DS | UHH-SEMS - Publication Details

MaPU: A novel mathematical computing architecture

OPENALEX - Publications

Donglin Wang Lei Wang Zijun Liu Tao Wang Zhonghua Pu and 21 more

As the feature size of the semiconductor process scales down to 10nm and below, it is possible to assemble systems with high-performance processors that can theoretically provide computational power of up to tens of PFLOPS. However, the power consumption of these systems is also rocketing to millions of watts, and the actual performance is only around 60% of the theoretical performance. Today, power efficiency and sustained performance have become the main foci of processor designers. Traditional computing architectures such as superscalar and GPGPU are proven to be inefficient, and there is a big gap between...

10.1109/hpca.2016.7446086 article EN 2016-03-01

Baidu Kunlun An AI processor for diversified workloads

OPENALEX - Publications

Jian Ouyang Mijung Noh Yong Wang Qi Wei Ma Yin and 11 more

This article consists only of a collection of slides from the author's conference presentation.

10.1109/hcs49909.2020.9220641 article EN 2020-08-01

3.3 Kunlun: A 14nm High-Performance AI Processor for Diversified Workloads

OPENALEX - Publications

Jian Ouyang Xueliang Du Ma Yin Jiaqiang Liu

In order to be able to handle a wide range of AI applications, such as for speech, image, language, and autonomous driving, it is necessary that an accelerator is flexible enough to handle diversified workloads. Baidu Kunlun, an AI chip designed in-house by Baidu, achieves this capability with high programmability, flexibility, and performance. Kunlun was inspired by the XPU architecture [1]. The chip is implemented in Samsung 14nm process technology. Its peak performance is 230TOPS@INT8 at 900MHz, and up to 281TOPS@INT8 at 1.1GHz boost...

10.1109/isscc42613.2021.9366056 article EN 2022 IEEE International Solid-State Circuits Conference (ISSCC) 2021-02-13

A flexible FPGA-to-FPGA communication system

OPENALEX - Publications

An Wu Xi Jin Xueliang Du Shuaizhi Guo

In high-performance computing systems, each node communicates via a high-speed serial bus to ensure sufficient data transfer bandwidth. However, it is very difficult for buses of different protocols to communicate directly, which is not conducive to the extensibility of HPC (high-performance computing) clusters. In this paper, we propose UPI, an inter-node communication interface based on FPGA, which can transmit different protocols (PCIe protocol and Ethernet protocol) simultaneously. More importantly, many different bus-supported nodes can be connected to the same...

10.1109/icact.2016.7423482 article EN 2022 24th International Conference on Advanced Communication Technology (ICACT) 2016-01-01

An efficient and effective performance estimation method for DSE

OPENALEX - Publications

Lin Chen Xueliang Du Xinwei Jiang Donglin Wang

Design space exploration (DSE) is a critical step in chip design. The tradeoffs and interactions among design parameters are traditionally evaluated by simulating or synthesizing a variety of designs, which is intractable. Predictive modeling techniques have been applied to predict design performance for DSE. For system-on-a-chip (SoC) DSE cases, however, it is difficult to achieve high accuracy with previous methods due to their limitations. In this paper, we propose a new performance estimation method based on...

10.1109/vlsi-dat.2016.7482568 article EN 2016-04-01

A flexible FPGA-to-FPGA interconnect interface design and implementation

OPENALEX - Publications

An Wu Xi Jin Shuaizhi Guo Xueliang Du

In FPGA-based SoCs, each interconnect bus, such as PCIe and Ethernet, has a separate physical layer interface. The physical layer (PHY) incurs considerable power consumption and area overhead. In this paper, we propose a flexible interconnect interface (Unified PHY Interface, UPI) based on FPGA and describe its design. More specifically, UPI can parse various packets automatically by adding a convertor between the PHY and the upper layer. Thus, architecture be realized using for each controller. We implemented UPI on two Xilinx Virtex-7 FPGAs with Synopsys...

10.1109/ccoms.2015.7562851 article EN 2015-11-01

Re-Factored Operational Support Systems for the Next Generation Platform-as-a-Service (NGPaaS)

OPENALEX - Publications

Paul Veitch Adam Broadbent Steven Van Rossem Bessem Sayadi Lionel Natarianni and 16 more

Platform-as-a-Service (PaaS) systems offer customers a rich environment in which to build, deploy, and run applications. Today's PaaS offerings are tailored mainly to the needs of web and mobile application developers, and involve a fairly rigid stack of components and features. The vision of the H2020 5GPPP Phase 2 Next Generation Platform-as-a-Service (NGPaaS) project is to enable "build-to-order" customized PaaSes for a wide range of use cases with telco-grade 5G characteristics. This paper sets out the salient and innovative...

10.1109/5gwf.2018.8516995 article EN 2018-07-01

A flexible FPGA-to-FPGA communication system

OPENALEX - Publications

An Wu Xi Jin Xueliang Du Shuaizhi Guo

In high-performance computing systems, each node communicates via a high-speed serial bus to ensure sufficient data transfer bandwidth. However, it is very difficult for buses of different protocols to communicate directly, which is not conducive to the extensibility of HPC (high-performance computing) clusters. In this paper, we propose UPI, an inter-node communication interface based on FPGA, which can transmit different protocols (PCIe protocol and Ethernet protocol) simultaneously. More importantly, many different bus-supported nodes can be connected to the same...

10.23919/icact.2017.7890234 article EN 2022 24th International Conference on Advanced Communication Technology (ICACT) 2017-01-01

Design of a Distributed Compressor for Astronomy SSD

OPENALEX - Publications

Bo Peng Xi Jin Tianqi Wang Xueliang Du

SSD (solid state device) has shown great potential in astronomy data storage. Data compression is an essential task to obtain higher storage density and bandwidth. This paper proposes a distributed compressor customized for FPGA-based SSD. Our data-driven compressor copes with data in units of bytes; two algorithms, run-length and length-limited Huffman coding, are utilized, and the encoder is further developed to reduce latency. Experimental results indicate that our proposed compressor achieves 1GB/s bandwidth with less than 2500 LUTs utilized...

10.1109/fccm.2015.29 article EN 2015-05-01

Progress in a novel architecture for high performance processing

OPENALEX - Publications

Zhiwei Zhang Meng Liu Zijun Liu Xueliang Du Shaolin Xie and 7 more

High performance processing (HPP) is an innovative architecture which targets high performance computing with excellent power efficiency and performance. It is suitable for data-intensive applications like supercomputing, machine learning, and wireless communication. An example chip with four application-specific integrated circuit (ASIC) cores, the first generation of HPP cores, has been taped out successfully under the Taiwan Semiconductor Manufacturing Company (TSMC) 40 nm low-power process. The chip shows great energy efficiency over traditional...

10.7567/jjap.57.04fa03 article EN Japanese Journal of Applied Physics 2018-03-13

Dynamic system reliability modeling using extended hybrid Petri nets

OPENALEX - Publications

Xueliang Du Shengkui Zeng Jianbin Guo

Traditional system reliability models have almost neglected the coupling between different system states (normal, failure, etc.) and the continuous variation process of performance. This paper presents a method of system reliability modeling based on hybrid Petri nets (HPN), which combines the discrete states and the continuous performance together during modeling to describe the coupling relationship. Firstly, the normal running mode and the fault modes were established using HPN to describe the logical relationship between states; Secondly, account each state, corresponding models uncertain external...

10.1109/phm.2014.6988208 article EN 2014-08-01

A Low Cost Anti-aliasing Scheme for Mobile Devices

OPENALEX - Publications

Daolu Zha Xi Jin An Wu Xiang Tian Xueliang Du

An improved anti-aliasing sampling algorithm is proposed to reduce the increasing memory consumption caused by super-sampling in mobile devices. Six-point anisotropy anti-aliasing blends two samples of a pixel, as well as samples of nearby pixels. Experimental results showed that the six-point algorithm reduces memory consumption by 50% compared with the traditional FLIPQUAD algorithm. This method achieves similar quality with only 50% of the memory consumption.

10.1109/icisce.2015.11 article EN 2015-04-01

The Development of the Automatic Pick-and-Placing Manipulator and Control System

OPENALEX - Publications

Xueliang Du Mingfu Yin

At present, most ceramic factories carry materials by hand. This paper designed the structure of the product with the automatic pick-and-placing manipulator as the research object. Furthermore, it applied 3D modeling and motion simulation to the manipulator through SolidWorks. Then it gave a design scheme of the manipulator. Finally, it obtained the feasibility of the scheme. At the same time, this paper adopted a control strategy of force ring and position fuzzy adaptive PID algorithm to ensure the precision requirement. It also made a dynamic simulation for the movement of the manipulator. The...

10.2991/icmeis-15.2015.103 article EN cc-by-nc Advances in Engineering Research 2015-01-01

A flexible FPGA-to-FPGA communication system

OPENALEX - Publications

An Wu Xi Jin Xueliang Du Shuaizhi Guo

In high-performance computing systems, each node communicates via a high-speed serial bus to ensure sufficient data transfer bandwidth. However, it is very difficult for buses of different protocols to communicate directly, which is not conducive to the extensibility of HPC (high-performance computing) clusters. In this paper, we propose UPI, an inter-node communication interface based on FPGA, which can transmit different protocols (PCIe protocol and Ethernet protocol) simultaneously. More importantly, many different bus-supported nodes can be connected to the same...

10.1109/icact.2016.7423481 article EN 2022 24th International Conference on Advanced Communication Technology (ICACT) 2016-01-01

Optimizing Memory Allocation for Multi-Subgraph Mapping on Spatial Accelerators

OPENALEX - Publications

Lei Lei Decai Pan Dajiang Liu Peng Ouyang Xueliang Du

Spatial accelerators enable the pervasive use of energy-efficient solutions for computation-intensive applications. In the mapping of spatial accelerators, a large kernel is usually partitioned into multiple subgraphs for resource constraints, leading to more memory accesses and access conflicts. To minimize the access conflicts, existing works either neglect interference or pay little attention to the data's life cycle along the execution order. To this end, this paper proposes an optimized memory allocation approach for multi-subgraph mapping on spatial accelerators by...

10.1145/3579370.3594767 article EN 2023-06-05

TxCP: A Coprocessor for LTE-A

OPENALEX - Publications

Xin Huang Liu Qing-bin Junning Wu Xueliang Du Donglin Wang

With the wide use of 4G networks, the corresponding bandwidth processing has become a critical issue. The currently recognized 4G network is LTE-A. In baseband processing for LTE-A, the physical layer algorithm is the biggest bottleneck for processors. An application-specific integrated circuit (ASIC) design is necessary. This article introduces a communication-dedicated coprocessor (TxCP), specifically for LTE-A physical uplink shared/control channel (PUSCH/PUCCH) bit-level acceleration. Its internal support PUSCH/PUCCH CRC, Turbo...

10.1051/matecconf/201712801011 article EN cc-by MATEC Web of Conferences 2017-01-01

A self-indexed register file for efficient arithmetical computing hardware

OPENALEX - Publications

Lei Yang Shaolin Xie Zijun Liu Xueliang Du Donglin Wang

This paper presents a novel register file with self-indexed features, targeting DSP/media algorithms with massive data locality. The self-indexed register file (SIRF) contains 128 high-speed registers, 4 input ports and output ports. It can be accessed in double circular window mode, or simply in immediate index mode. SIRF can eliminate write after write (WAW) dependence without register renaming in hardware or redundant register allocation in compilers, and it can also reduce address computation if accessing pattern satisfies was implemented high performance...

10.1109/ceec.2017.8101592 article EN 2017-09-01

Optimizing and Implementing the High Dynamic Range Video Algorithm

OPENALEX - Publications

An Wu Xi Jin Xueliang Du Kening Zhang Chunhe Yao and 1 more

10.7544/issn1000-1239.2017.20160122 article EN Journal of Computer Research and Development 2017-05-01