NFDI4DS | UHH-SEMS - Publication Details

Nianzheng Cao

ORCID: 0000-0003-2786-9139

About

Contact & Profiles

OpenAlex ID: A5042802099

Research Areas

Fluid Dynamics and Turbulent Flows
Wind and Air Flow Studies
Low-power high-performance VLSI design
Meteorological Phenomena and Simulations
Parallel Computing and Optimization Techniques
Advanced Neural Network Applications
Ferroelectric and Negative Capacitance Devices
Advanced Memory and Neural Computing
Electromagnetic Compatibility and Noise Suppression
Advancements in Semiconductor Devices and Circuit Design
Electrostatic Discharge in Electronics
Particle Dynamics in Fluid Flows
Plant Water Relations and Carbon Dynamics
Advancements in PLL and VCO Technologies
VLSI and Analog Circuit Testing
Photonic and Optical Devices
Electromagnetic Scattering and Analysis
Nonlinear Photonic Systems
Complex Systems and Time Series Analysis
Machine Learning and Data Classification
Domain Adaptation and Few-Shot Learning
Advanced Thermodynamic Systems and Engines
Rheology and Fluid Dynamics Studies
Fluid Dynamics and Vibration Analysis
Aerosol Filtration and Electrostatic Precipitation

IBM Research - Thomas J. Watson Research Center
1995-2024

IBM (United States)
2005-2024

W. L. Gore & Associates (United States)
2003

Los Alamos National Laboratory
1995-1999

City College of New York
1993

Peking University
1989

Physical symmetry and lattice symmetry in the lattice Boltzmann method

OPENALEX - Publications

Nianzheng Cao Shiyi Chen Shi Jin Daniel Martínez

The lattice Boltzmann method (LBM) is regarded as a specific finite difference discretization for the kinetic equation of discrete velocity distribution function. We argue that sets models, such LBM, physical symmetry necessary obtaining correct macroscopic Navier-Stokes equations. In contrast, and Lagrangian nature scheme, which often used in gas automaton existing methods directly associated with property particle dynamics, not recovering dynamics. By relaxing constraint introducing other...

10.1103/physreve.55.r21 article EN Physical review. E, Statistical physics, plasmas, fluids, and related interdisciplinary topics 1997-01-01

A Scalable Multi-TeraOPS Deep Learning Processor Core for AI Training and Inference

Bruce Fleischer Sunil Shukla Matthew M. Ziegler J. A. Silberman Jinwook Oh and 26 more

A multi-TOPS AI core is presented for acceleration of deep learning training and inference in systems from edge devices to data centers. With a programmable architecture and custom ISA, this engine achieves >90% sustained utilization across the range of neural network topologies by employing dataflow and an on-chip scratchpad hierarchy. Compute precision is optimized at 16b floating point (fp16) for high model accuracy as well as 1b/2b (binary/ternary) integer for aggressive performance. At 1.5 GHz, prototype...

10.1109/vlsic.2018.8502276 article EN 2018-06-01

9.1 A 7nm 4-Core AI Chip with 25.6TFLOPS Hybrid FP8 Training, 102.4TOPS INT4 Inference and Workload-Aware Throttling

Ankur Agrawal Sae Kyu Lee J. A. Silberman Matthew M. Ziegler Mingu Kang and 39 more

Low-precision computation is the key enabling factor to achieve high compute densities (TOPS/W and TOPS/mm²) in AI hardware accelerators across cloud edge platforms. However, robust deep learning (DL) model accuracy equivalent high-precision must be maintained. Improvements bandwidth, architecture, power management are also required harness benefit of reduced precision by feeding supporting...

10.1109/isscc42613.2021.9365791 article EN IEEE International Solid-State Circuits Conference (ISSCC) 2021-02-13

Efficient AI System Design With Cross-Layer Approximate Computing

Swagath Venkataramani Xiao Sun Naigang Wang Chia‐Yu Chen Jungwook Choi and 35 more

Advances in deep neural networks (DNNs) and the availability of massive real-world data have enabled superhuman levels accuracy on many AI tasks ushered explosive growth workloads across spectrum computing devices. However, their superior comes at a high computational cost, which necessitates approaches beyond traditional paradigms to improve operational efficiency. Leveraging application-level insight error resilience, we demonstrate how approximate (AxC) can significantly boost efficiency...

10.1109/jproc.2020.3029453 article EN Proceedings of the IEEE 2020-11-10

RaPiD: AI Accelerator for Ultra-low Precision Training and Inference

Swagath Venkataramani Vijayalakshmi Srinivasan Wei Wang Sanchari Sen Jintao Zhang and 49 more

The growing prevalence and computational demands of Artificial Intelligence (AI) workloads has led to widespread use hardware accelerators in their execution. Scaling the performance AI across generations is pivotal success commercial deployments. intrinsic error-resilient nature present a unique opportunity for performance/energy improvement through precision scaling. Motivated by recent algorithmic advances scaling inference training, we designed RaPiD...

10.1109/isca52012.2021.00021 article EN 2021-06-01

Refined Similarity Hypothesis for Transverse Structure Functions in Fluid Turbulence

Shiyi Chen Katepalli R. Sreenivasan Mark Nelkin Nianzheng Cao

We argue on the basis of empirical data that Kolmogorov's refined similarity hypothesis (RSH) needs to be modified for transverse velocity increments, and propose an alternative. In this new form, transverse velocity increments bear the same relation to locally averaged enstrophy (squared vorticity) as longitudinal velocity increments bear in RSH to locally averaged dissipation. We support this hypothesis by analyzing high-resolution numerical simulation data for isotropic turbulence. RSH and its proposed modification appear to represent two independent scaling groups.

10.1103/physrevlett.79.2253 article EN Physical Review Letters 1997-09-22

Energy flux density in a thermoacoustic couple

Nianzheng Cao J. R. Olson G. W. Swift Shiyi Chen

The hydro- and thermodynamical processes near and within a thermoacoustic couple are simulated and analyzed by numerical solution of the compressible Navier–Stokes, continuity, and energy equations for an ideal gas, concentrating on the time-averaged energy flux density in the gas. The results show details of the heat sink at one end of the plates of the couple.

10.1121/1.414992 article EN The Journal of the Acoustical Society of America 1996-06-01

Statistics and structures of pressure in isotropic turbulence

Nianzheng Cao Shiyi Chen Gary D. Doolen

Statistics and structures of pressure in three-dimensional incompressible isotropic turbulence are studied using high-resolution direct numerical simulation for Taylor microscale Reynolds numbers up to 220. It is found that the probability distribution function (PDF) has negative skewness due both kinematic dynamic effects, contrast statistics head, whose PDF almost symmetric. The statistical relations among pressure, vorticity, dissipation kinetic energy investigated conditional averaging....

10.1063/1.870085 article EN Physics of Fluids 1999-08-01

A 3.0 TFLOPS 0.62V Scalable Processor Core for High Compute Utilization AI Training and Inference

Jinwook Oh Sae Kyu Lee Mingu Kang Matthew M. Ziegler J. A. Silberman and 38 more

A processor core is presented for AI training and inference products. Leading-edge compute efficiency achieved robust fp16 via efficient heterogeneous 2-D systolic array-SIMD engines leveraging compact DLFloat16 FPUs. Architectural flexibility maintained very high utilization across neural network topologies. modular dual-corelet architecture with a shared scratchpad software-controlled network/memory interface enables scalability to many-core SoCs large-scale systems. The 14nm achieves peak...

10.1109/vlsicircuits18222.2020.9162917 article EN 2020-06-01

A 45 nm SOI Embedded DRAM Macro for the POWER™ Processor 32 MByte On-Chip L3 Cache

J. Barth Kavita Nair Nianzheng Cao Don Plass Erik Nelson and 6 more

A 1.35 ns random access and 1.7 ns-random-cycle SOI embedded-DRAM macro has been developed for the POWER7™ high-performance microprocessor. The employs a 6 transistor micro sense-amplifier architecture with extended precharge scheme to enhance sensing margin product quality. detailed study shows 67% bit-line power reduction only 1.7% area overhead, while improving read zero by more than 500ps. array voltage window is improved programmable BL generator, allowing embedded DRAM operate reliably...

10.1109/jssc.2010.2084470 article EN IEEE Journal of Solid-State Circuits 2010-11-30

Scalings and Relative Scalings in the Navier-Stokes Turbulence

Nianzheng Cao Shiyi Chen Zhen-Su She

High-resolution direct numerical simulations of 3D Navier-Stokes turbulence with normal viscosity and hyperviscosity are carried out. It is found that the inertial-range statistics, both scalings probability density functions, independent dissipation mechanism, while near-dissipation-range fluctuations show significant structural differences. Nevertheless, relative expressing dependence moments at different orders universal, unambiguous departure from Kolmogorov 1941 description, including...

10.1103/physrevlett.76.3711 article EN Physical Review Letters 1996-05-13

A Scalable Multi-TeraOPS Core for AI Training and Inference

Sunil Shukla Bruce Fleischer Matthew M. Ziegler J. A. Silberman Jinwook Oh and 26 more

This letter presents a multi-TOPS AI accelerator core for deep learning training and inference. With programmable architecture custom ISA, this engine achieves >90% sustained utilization across the range of neural network topologies by employing dataflow to provide high throughput an on-chip scratchpad hierarchy meet bandwidth demands compute units. A 16b floating point (fp16) representation with 1 sign bit, 6 exponent bits, 9 mantissa bits has also been developed model accuracy in inference...

10.1109/lssc.2019.2902738 article EN IEEE Solid-State Circuits Letters 2018-12-01

Asymmetry of Velocity Increments in Fully Developed Turbulence and the Scaling of Low-Order Moments

Katepalli R. Sreenivasan С. И. Ваинштейн Rustom B. Bhiladvala Inigo San Gil Shiyi Chen and 1 more

A study is made of the scaling positive part (PP) and negative (NP) velocity increments in turbulent pipe flow simulated homogeneous turbulence a box. For moment orders above unity, moments NP are larger than those PP for all separation distances, exponents NP. below absolute value increment NP, as well PP, possess which vary linearly with order q, though apparently greater $q/3$.

10.1103/physrevlett.77.1488 article EN Physical Review Letters 1996-08-19

Anomalous Scaling and Structure Instability in Three-Dimensional Passive Scalar Turbulence

Shiyi Chen Nianzheng Cao

The anomalous scaling phenomena of three-dimensional passive scalar turbulence are studied using high resolution direct numerical simulation. inertial range exponents the increment and dissipation obtained. connection between intermittency structure exponent is examined instability amplitude used to clarify previous experimental results for exponents.

10.1103/physrevlett.78.3459 article EN Physical Review Letters 1997-05-01

A 7-nm Four-Core Mixed-Precision AI Chip With 26.2-TFLOPS Hybrid-FP8 Training, 104.9-TOPS INT4 Inference, and Workload-Aware Throttling

Sae Kyu Lee Ankur Agrawal J. A. Silberman Matthew M. Ziegler Mingu Kang and 39 more

Reduced precision computation is a key enabling factor for energy-efficient acceleration of deep learning (DL) applications. This article presents 7-nm four-core mixed-precision artificial intelligence (AI) chip that supports four compute precisions—FP16, Hybrid-FP8 (HFP8), INT4, and INT2—to support diverse application demands training inference. The leverages cutting-edge algorithmic advances to demonstrate leading-edge power efficiency 8-bit floating-point (FP8) INT4 inference without...

10.1109/jssc.2021.3120113 article EN IEEE Journal of Solid-State Circuits 2021-11-10

A 45nm SOI embedded DRAM macro for POWER7™ 32MB on-chip L3 cache

J. Barth Don Plass Erik Nelson Charlie Hwang Gregory Fredeman and 5 more

This paper presents a 1.7 ns-random-cycle SOI embedded-DRAM macro developed for the POWER7™ high-performance microprocessor and introduces enhancements to the micro-sense-amplifier (μSA) architecture. The macro enables a 32 MB on-chip L3 cache, eliminating delay, area, and power from the off-chip interface.

10.1109/isscc.2010.5433814 article EN IEEE International Solid-State Circuits Conference (ISSCC) 2010-02-01

Inertial range scaling in turbulence

Shiyi Chen Nianzheng Cao

In this paper, we apply She and Leveque's [Z.-S. E. Leveque, Phys. Rev. Lett. 72, 336 (1994)] hierarchy model under the assumption that $\lim_{p\to\infty} \tau_p/p = -1$ with $\tau_p$ being scaling exponent for local averaged dissipation function suggested by Novikov [E. A. Novikov, E 50, R3303 (1994)]. The resulting agrees well existing theoretical experimental...

10.1103/physreve.52.r5757 article EN Physical review. E, Statistical physics, plasmas, fluids, and related interdisciplinary topics 1995-12-01

14.1 A Software-Assisted Peak Current Regulation Scheme to Improve Power-Limited Inference Performance in a 5nm AI SoC

Monodeep Kar J. A. Silberman Swagath Venkataramani V. Srinivasan Bruce Fleischer and 41 more

The rapid emergence of AI models, specifically large language models (LLMs) requiring amounts compute, drives the need for dedicated inference hardware. During deployment, compute utilization (and thus power consumption) can vary significantly across layers an model, number tokens, precision, and batch size [1]. Such wide variation, which may occur at fast time scales, poses unique challenges in optimizing performance within system-level specifications discrete accelerator cards, including...

10.1109/isscc49657.2024.10454301 article EN IEEE International Solid-State Circuits Conference (ISSCC) 2024-02-18

Properties of Velocity Circulation in Three-Dimensional Turbulence

Nianzheng Cao Shiyi Chen Katepalli R. Sreenivasan

Properties of velocity circulation in three-dimensional turbulence are studied using data from high-resolution direct numerical simulation Navier-Stokes equations. The probability density function (PDF) the depends on area closed contour for which is calculated, but not shape contour. For contours lying within inertial range, PDF has a Gaussian core with conspicuous exponential tails, indicating that intermittency plays an important role statistics. measured scaling exponents anomalous and...

10.1103/physrevlett.76.616 article EN Physical Review Letters 1996-01-22

Scaling of Low-Order Structure Functions in Homogeneous Turbulence

Nianzheng Cao Shiyi Chen Katepalli R. Sreenivasan

High-resolution direct numerical simulation data for three-dimensional Navier-Stokes turbulence in a periodic box are used to study the scaling behavior of low-order velocity structure functions with positive and negative powers. Similar to high-order statistics, the relative scaling exponents exhibit unambiguous departures from the Kolmogorov 1941 theory and agree well with existing multiscaling models. No transition from normal to anomalous scaling is observed.

10.1103/physrevlett.77.3799 article EN Physical Review Letters 1996-10-28

An intermittency model for passive-scalar turbulence

Nianzheng Cao Shiyi Chen

A phenomenological model for the inertial range scaling of passive-scalar turbulence is developed based on a bivariate log-Poisson model. An analytical formula for the scaling exponent of three-dimensional passive-scalar turbulence is deduced. The predicted exponents are compared with experimental measurements, showing good agreement.

10.1063/1.869265 article EN Physics of Fluids 1997-05-01

Power-Limited Inference Performance Optimization Using a Software-Assisted Peak Current Regulation Scheme in a 5-nm AI SoC

Monodeep Kar J. A. Silberman Swagath Venkataramani V. Srinivasan Bruce Fleischer and 41 more

10.1109/jssc.2024.3472023 article EN IEEE Journal of Solid-State Circuits 2024-01-01

A low-voltage swing latch for reduced power dissipation in high-frequency microprocessors

Pong-Fei Lu L. Sigal Nianzheng Cao P. J. M. Wöltgens R. P. Robertazzi and 1 more

We report a new low-swing latch (LSL) for low-power applications. Unlike the conventional transmission gate latch, LSL allows reduced voltage on clock inputs. Therefore local buffer (LCB) can use swing to save power while all other circuits are running at nominal voltage. have implemented an accumulator loop experiment in early version of IBM's 90 nm SOI technology a testchip. The consists adder and decrementer surrounded by latches mimic logic between pipeline stages. Side-by-side...

10.1109/soi.2004.1391601 article EN 2005-03-07

Across the Stack Opportunities for Deep Learning Acceleration

Vijayalakshmi Srinivasan Bruce Fleischer Sunil Shukla Matthew M. Ziegler J. A. Silberman and 26 more

The combination of growth in compute capabilities and availability of large datasets has led to a re-birth of deep learning. Deep Neural Networks (DNNs) have become state-of-the-art in a variety of machine learning tasks spanning domains across vision, speech, and translation. Deep Learning (DL) achieves high accuracy in these tasks at the expense of 100s of ExaOps of computation; posing significant challenges to efficient large-scale deployment in both resource-constrained environments and data centers.

10.1145/3218603.3241339 article EN Proceedings of the International Symposium on Low Power Electronics and Design 2018-07-23
