Nianzheng Cao

ORCID: 0000-0003-2786-9139
Research Areas
  • Fluid Dynamics and Turbulent Flows
  • Wind and Air Flow Studies
  • Low-power high-performance VLSI design
  • Meteorological Phenomena and Simulations
  • Parallel Computing and Optimization Techniques
  • Advanced Neural Network Applications
  • Ferroelectric and Negative Capacitance Devices
  • Advanced Memory and Neural Computing
  • Electromagnetic Compatibility and Noise Suppression
  • Advancements in Semiconductor Devices and Circuit Design
  • Electrostatic Discharge in Electronics
  • Particle Dynamics in Fluid Flows
  • Plant Water Relations and Carbon Dynamics
  • Advancements in PLL and VCO Technologies
  • VLSI and Analog Circuit Testing
  • Photonic and Optical Devices
  • Electromagnetic Scattering and Analysis
  • Nonlinear Photonic Systems
  • Complex Systems and Time Series Analysis
  • Machine Learning and Data Classification
  • Domain Adaptation and Few-Shot Learning
  • Advanced Thermodynamic Systems and Engines
  • Rheology and Fluid Dynamics Studies
  • Fluid Dynamics and Vibration Analysis
  • Aerosol Filtration and Electrostatic Precipitation

IBM Research - Thomas J. Watson Research Center
1995-2024

IBM (United States)
2005-2024

W. L. Gore & Associates (United States)
2003

Los Alamos National Laboratory
1995-1999

City College of New York
1993

Peking University
1989

The lattice Boltzmann method (LBM) is regarded as a specific finite difference discretization for the kinetic equation of the discrete velocity distribution function. We argue that for finite sets of discrete velocity models, such as LBM, the physical symmetry is necessary for obtaining the correct macroscopic Navier-Stokes equations. In contrast, the lattice symmetry and the Lagrangian nature of the scheme, which is often used in the lattice gas automaton method and the existing lattice Boltzmann methods and is directly associated with the property of particle dynamics, is not necessary for recovering the correct macroscopic dynamics. By relaxing the lattice symmetry constraint and introducing other...

10.1103/physreve.55.r21 article EN Physical review. E, Statistical physics, plasmas, fluids, and related interdisciplinary topics 1997-01-01
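
The argument above turns on the symmetry (moment) conditions that a discrete velocity set must satisfy. As a minimal sketch, assuming the widely used D2Q9 lattice and the second-order BGK equilibrium (neither is specified in the abstract), the code below checks that the equilibrium's zeroth, first, and second velocity moments reproduce the density, the momentum, and the isotropic Euler momentum flux $\rho c_s^2\delta_{ab} + \rho u_a u_b$ needed in the Navier-Stokes limit.

```python
import numpy as np

# Assumed D2Q9 velocity set c_i and weights w_i (a common choice, not taken from the paper)
C = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]], dtype=float)
W = np.array([4 / 9] + [1 / 9] * 4 + [1 / 36] * 4)
CS2 = 1.0 / 3.0  # squared lattice sound speed for D2Q9


def equilibrium(rho, u):
    """Second-order BGK equilibrium f_i^eq for density rho and 2-D velocity u."""
    u = np.asarray(u, dtype=float)
    cu = C @ u          # c_i . u for each of the 9 directions
    uu = u @ u          # |u|^2
    return W * rho * (1 + cu / CS2 + cu**2 / (2 * CS2**2) - uu / (2 * CS2))


def moments(f):
    """Density, momentum vector and momentum-flux tensor of a distribution f."""
    rho = f.sum()
    momentum = f @ C
    flux = np.einsum("i,ia,ib->ab", f, C, C)
    return rho, momentum, flux


rho, u = 1.2, np.array([0.05, -0.02])
r, m, pi = moments(equilibrium(rho, u))
expected_pi = rho * CS2 * np.eye(2) + rho * np.outer(u, u)
print(np.isclose(r, rho), np.allclose(m, rho * u), np.allclose(pi, expected_pi))
```

The $\rho u_a u_b$ term comes out exactly only because the D2Q9 weights have isotropic fourth moments; a velocity set lacking that symmetry would leave anisotropic errors in the momentum flux.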

A multi-TOPS AI core is presented for acceleration of deep learning training and inference in systems from edge devices to data centers. With a programmable architecture and custom ISA, this engine achieves >90% sustained utilization across the range of neural network topologies by employing dataflow and an on-chip scratchpad hierarchy. Compute precision is optimized at 16b floating point (fp16) for high model accuracy, as well as at 1b/2b (binary/ternary) integer for aggressive performance. At 1.5 GHz, the prototype...

10.1109/vlsic.2018.8502276 article EN 2018-06-01

Low-precision computation is the key enabling factor to achieve high compute densities (TOPS/W and TOPS/mm²) in AI hardware accelerators across cloud and edge platforms. However, robust deep learning (DL) model accuracy equivalent to high-precision computation must be maintained. Improvements in bandwidth, architecture, and power management are also required to harness the benefit of reduced precision by feeding and supporting...

10.1109/isscc42613.2021.9365791 article EN IEEE International Solid-State Circuits Conference (ISSCC) 2021-02-13

Advances in deep neural networks (DNNs) and the availability of massive real-world data have enabled superhuman levels of accuracy on many AI tasks and ushered in explosive growth of AI workloads across the spectrum of computing devices. However, their superior accuracy comes at a high computational cost, which necessitates approaches beyond traditional computing paradigms to improve operational efficiency. Leveraging application-level insight into error resilience, we demonstrate how approximate computing (AxC) can significantly boost efficiency...

10.1109/jproc.2020.3029453 article EN Proceedings of the IEEE 2020-11-10

The growing prevalence and computational demands of Artificial Intelligence (AI) workloads have led to widespread use of hardware accelerators in their execution. Scaling the performance of AI accelerators across generations is pivotal to their success in commercial deployments. The intrinsic error-resilient nature of AI workloads presents a unique opportunity for performance/energy improvement through precision scaling. Motivated by the recent algorithmic advances in precision scaling for inference and training, we designed RaPiD...

10.1109/isca52012.2021.00021 article EN 2021-06-01

We argue on the basis of empirical data that Kolmogorov's refined similarity hypothesis (RSH) needs to be modified for transverse velocity increments, and propose an alternative. In this new form, transverse increments bear the same relation to locally averaged enstrophy (squared vorticity) as longitudinal increments in RSH bear to locally averaged dissipation. We support this by analyzing high-resolution numerical simulation data of isotropic turbulence. RSH and its proposed modification appear to represent two independent scaling groups.

10.1103/physrevlett.79.2253 article EN Physical Review Letters 1997-09-22
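
In symbols, and assuming the usual form of Kolmogorov's RSH with $\varepsilon_r$ the dissipation rate averaged over a region of size $r$, the proposal can be paraphrased as below. Here $\Omega_r$ is the enstrophy averaged over the same region and $\nu$ the kinematic viscosity, so $\nu\Omega_r$ has the dimensions of a dissipation rate. This is a reading of the abstract, not the paper's own equations.

```latex
\begin{aligned}
\text{RSH (longitudinal):}\quad &\langle |\delta u_r|^p \rangle \propto \langle \varepsilon_r^{p/3} \rangle\, r^{p/3},\\
\text{modified (transverse):}\quad &\langle |\delta v_r|^p \rangle \propto \langle (\nu\,\Omega_r)^{p/3} \rangle\, r^{p/3}.
\end{aligned}
```

In homogeneous turbulence the global means coincide, $\langle\varepsilon\rangle = \nu\langle\omega^2\rangle$, but the locally averaged fields $\varepsilon_r$ and $\nu\Omega_r$ differ, which is how the two relations can define distinct scaling groups.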

The hydro- and thermodynamical processes near and within a thermoacoustic couple are simulated and analyzed by numerical solution of the compressible Navier–Stokes, continuity, and energy equations for an ideal gas, concentrating on the time-averaged energy flux density in the gas. The results show details of the heat sink at one end of the plates of the couple.

10.1121/1.414992 article EN The Journal of the Acoustical Society of America 1996-06-01

Statistics and structures of pressure in three-dimensional incompressible isotropic turbulence are studied using high-resolution direct numerical simulation for Taylor microscale Reynolds numbers up to 220. It is found that the probability distribution function (PDF) of pressure has negative skewness due to both kinematic and dynamic effects, in contrast to the statistics of the pressure head, whose PDF is almost symmetric. The statistical relations among pressure, vorticity, dissipation, and kinetic energy are investigated using conditional averaging....

10.1063/1.870085 article EN Physics of Fluids 1999-08-01
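
The two diagnostics named in the abstract, the skewness of a PDF and conditional averaging, can be sketched as follows. The function names and the binning convention (samples equal to the last bin edge are dropped) are illustrative assumptions; with simulation data, `p` and `w2` would be flattened pressure and squared-vorticity fields. The example uses synthetic stand-ins only.

```python
import numpy as np


def skewness(a):
    """Third standardized moment <(a - <a>)^3> / <(a - <a>)^2>^{3/2}."""
    d = np.ravel(a).astype(float)
    d = d - d.mean()
    return np.mean(d**3) / np.mean(d**2) ** 1.5


def conditional_mean(target, condition, bins):
    """Mean of `target` over samples whose `condition` falls in each bin.

    Bin b covers bins[b] <= condition < bins[b + 1]; empty bins give NaN.
    """
    target = np.ravel(target).astype(float)
    condition = np.ravel(condition)
    idx = np.digitize(condition, bins) - 1   # bin index of every sample
    out = np.full(len(bins) - 1, np.nan)
    for b in range(len(bins) - 1):
        mask = idx == b
        if mask.any():
            out[b] = target[mask].mean()
    return out


# Synthetic stand-ins (not simulation data): "pressure" made lower where "w2" is large
rng = np.random.default_rng(0)
w2 = rng.exponential(size=10_000)
p = -0.5 * w2 + rng.normal(size=w2.size)
print(skewness(p))
print(conditional_mean(p, w2, np.linspace(0.0, 4.0, 5)))
```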

A processor core is presented for AI training and inference products. Leading-edge compute efficiency is achieved for robust fp16 training via efficient heterogeneous 2-D systolic array-SIMD compute engines leveraging compact DLFloat16 FPUs. Architectural flexibility is maintained for very high compute utilization across neural network topologies. A modular dual-corelet architecture with a shared scratchpad and a software-controlled network/memory interface enables scalability to many-core SoCs and large-scale systems. The 14nm processor core achieves peak...

10.1109/vlsicircuits18222.2020.9162917 article EN 2020-06-01
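
To illustrate what a 2-D systolic array does, here is a cycle-level sketch of a generic output-stationary array computing a matrix product. The abstract does not say which dataflow the core's arrays use, so output-stationary operation, the skewed input feeding, and the function name are all assumptions made for illustration.

```python
import numpy as np


def systolic_matmul(A, B):
    """Cycle-by-cycle sketch of an output-stationary 2-D systolic array computing A @ B.

    Processing element (PE) (i, j) holds the partial sum for C[i, j]. Each cycle,
    A operands move one PE to the right and B operands one PE down. Row i of A
    and column j of B are injected with a skew of i and j cycles, so A[i, k] and
    B[k, j] meet in PE (i, j) on cycle i + j + k.
    """
    A = np.asarray(A)
    B = np.asarray(B)
    M, K = A.shape
    K2, N = B.shape
    if K != K2:
        raise ValueError("inner dimensions must match")
    acc = np.zeros((M, N), dtype=np.result_type(A, B))
    a_reg = np.zeros((M, N), dtype=A.dtype)  # A operand currently in each PE
    b_reg = np.zeros((M, N), dtype=B.dtype)  # B operand currently in each PE
    for t in range(M + N + K - 2):  # the last meeting happens on cycle M+N+K-3
        # move last cycle's operands one PE to the right / one PE down
        a_reg[:, 1:] = a_reg[:, :-1].copy()
        b_reg[1:, :] = b_reg[:-1, :].copy()
        # feed the skewed inputs into the left column and top row
        for i in range(M):
            k = t - i
            a_reg[i, 0] = A[i, k] if 0 <= k < K else 0
        for j in range(N):
            k = t - j
            b_reg[0, j] = B[k, j] if 0 <= k < K else 0
        # every PE performs one multiply-accumulate per cycle
        acc += a_reg * b_reg
    return acc


A = np.arange(6).reshape(2, 3)
B = np.arange(12).reshape(3, 4)
print(np.array_equal(systolic_matmul(A, B), A @ B))
```

One tile takes $M + N + K - 2$ cycles in this sketch, and each PE does one multiply-accumulate per cycle once the pipeline fills. That fill and drain overhead is one reason utilization across topologies is a design concern.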

A 1.35 ns random-access and 1.7 ns random-cycle SOI embedded-DRAM macro has been developed for the POWER7™ high-performance microprocessor. The macro employs a 6-transistor micro sense-amplifier architecture with an extended precharge scheme to enhance sensing margin and product quality. A detailed study shows a 67% bit-line power reduction with only 1.7% area overhead, while improving the read zero margin by more than 500ps. The array voltage window is improved by a programmable BL voltage generator, allowing the embedded DRAM to operate reliably...

10.1109/jssc.2010.2084470 article EN IEEE Journal of Solid-State Circuits 2010-11-30

High-resolution direct numerical simulations of 3D Navier-Stokes turbulence with normal viscosity and with hyperviscosity are carried out. It is found that the inertial-range statistics, both scalings and probability density functions, are independent of the dissipation mechanism, while near-dissipation-range fluctuations show significant structural differences. Nevertheless, the relative scalings expressing the dependence of moments at different orders are universal and show an unambiguous departure from the Kolmogorov 1941 description, including...

10.1103/physrevlett.76.3711 article EN Physical Review Letters 1996-05-13

This letter presents a multi-TOPS AI accelerator core for deep learning training and inference. With a programmable architecture and custom ISA, this engine achieves >90% sustained utilization across the range of neural network topologies by employing dataflow to provide high throughput and an on-chip scratchpad hierarchy to meet the bandwidth demands of the compute units. A 16b floating point (fp16) representation with 1 sign bit, 6 exponent bits, and 9 mantissa bits has also been developed for high model accuracy in training and inference...

10.1109/lssc.2019.2902738 article EN IEEE Solid-State Circuits Letters 2018-12-01
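
The abstract gives only the bit layout of the 16b format (1 sign, 6 exponent, 9 mantissa bits). The sketch below rounds a Python float to such a layout under explicit assumptions: an IEEE-style exponent bias of $2^5 - 1 = 31$, round-half-to-even, no subnormals (tiny values flush to zero), and saturation at the largest value whose exponent field is not all ones. These rules are assumed for illustration, and the format's actual special-value and rounding behavior may differ. The function name is hypothetical.

```python
import math


def round_to_minifloat(x, exp_bits=6, man_bits=9):
    """Round x to a sign/exponent/mantissa format (assumed rules, see lead-in).

    Defaults model a 1/6/9 layout. Assumptions: bias 2**(exp_bits-1) - 1,
    round half to even, no subnormals, saturation instead of infinity.
    """
    if x == 0.0 or math.isnan(x):
        return x
    bias = 2 ** (exp_bits - 1) - 1
    e_min = 1 - bias                        # smallest normal exponent
    e_max = (2 ** exp_bits - 2) - bias      # largest normal exponent (field not all ones)
    max_val = (2.0 - 2.0 ** -man_bits) * 2.0 ** e_max
    sign = -1.0 if x < 0 else 1.0
    a = abs(x)
    if math.isinf(a) or a >= max_val:
        return sign * max_val               # saturate (assumed behavior)
    m, e = math.frexp(a)                    # a = m * 2**e with 0.5 <= m < 1
    e -= 1                                  # rewrite as 1.f * 2**e
    if e < e_min:
        return sign * 0.0                   # flush to zero: no subnormals assumed
    step = 2.0 ** (e - man_bits)            # spacing of representable values in this binade
    q = round(a / step) * step              # Python's round() ties to even
    return sign * min(q, max_val)


print(round_to_minifloat(1 + 3 * 2**-10))  # 1.00390625: the tie rounds to the even mantissa
print(round_to_minifloat(1e12))            # 4290772992.0: saturates at the assumed maximum
print(round_to_minifloat(2**-40))          # 0.0: below the smallest normal, flushed
```

Compared with IEEE fp16 (5 exponent bits, 10 mantissa bits), the extra exponent bit trades one bit of precision for a much wider dynamic range. The same function models fp16 with `exp_bits=5, man_bits=10`.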

A study is made of the scaling of the positive part (PP) and negative part (NP) of velocity increments in turbulent pipe flow and in simulated homogeneous turbulence in a box. For moment orders above unity, the moments of NP are larger than those of PP for all separation distances, and so are the scaling exponents of NP. For moment orders below unity, the absolute values of the increments of NP, as well as of PP, possess scaling exponents which vary linearly with order $q$, though they are apparently greater than $q/3$.

10.1103/physrevlett.77.1488 article EN Physical Review Letters 1996-08-19
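
A minimal sketch of the PP/NP split for a periodic 1-D velocity record is given below. It assumes conditional means over samples of each sign. For $q > 0$, averaging $\max(\delta u, 0)^q$ over all samples instead would only rescale each moment by the fraction of samples with that sign. Exponents are then fitted as log-log slopes. The function names and the synthetic record in the usage example are illustrative.

```python
import numpy as np


def signed_increment_moments(u, separations, orders):
    """Moments of the positive part (PP) and negative part (NP) of increments.

    For a periodic 1-D record u and separation r (in grid points), the increment
    is du = u(x + r) - u(x). Returns results[r][q] = (<du^q | du > 0>,
    <|du|^q | du < 0>), i.e. conditional means over samples of each sign.
    """
    u = np.asarray(u, dtype=float)
    results = {}
    for r in separations:
        du = np.roll(u, -r) - u          # u(x + r) - u(x) with periodic wrap-around
        pos = du[du > 0]                  # positive part
        neg = -du[du < 0]                 # magnitudes of the negative part
        results[r] = {q: (np.mean(pos**q), np.mean(neg**q)) for q in orders}
    return results


def scaling_exponent(rs, values):
    """Least-squares slope of log(values) against log(rs)."""
    slope, _ = np.polyfit(np.log(rs), np.log(values), 1)
    return slope


# Usage on a synthetic record (a stand-in for measured or simulated velocity)
rng = np.random.default_rng(0)
u = np.cumsum(rng.standard_normal(2**14))
rs = [1, 2, 4, 8, 16]
res = signed_increment_moments(u, rs, [0.5, 2.0])
for q in (0.5, 2.0):
    zeta_pp = scaling_exponent(rs, [res[r][q][0] for r in rs])
    zeta_np = scaling_exponent(rs, [res[r][q][1] for r in rs])
    print(q, zeta_pp, zeta_np)
```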

The anomalous scaling phenomena of three-dimensional passive scalar turbulence are studied using high-resolution direct numerical simulation. The inertial range scaling exponents of the scalar increment and the scalar dissipation are obtained. The connection between the intermittency of the scalar dissipation and the scalar structure exponent is examined, and the instability amplitude is used to clarify previous experimental results for the scaling exponents.

10.1103/physrevlett.78.3459 article EN Physical Review Letters 1997-05-01

Reduced precision computation is a key enabling factor for energy-efficient acceleration of deep learning (DL) applications. This article presents a 7-nm four-core mixed-precision artificial intelligence (AI) chip that supports four compute precisions (FP16, Hybrid-FP8 (HFP8), INT4, and INT2) to support diverse application demands for training and inference. The chip leverages cutting-edge algorithmic advances to demonstrate leading-edge power efficiency for 8-bit floating-point (FP8) training and INT4 inference without...

10.1109/jssc.2021.3120113 article EN IEEE Journal of Solid-State Circuits 2021-11-10
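
The abstract does not describe how tensors are mapped to INT4 or INT2. As a generic illustration only, here is symmetric per-tensor uniform quantization to signed $n$-bit codes. The max-abs scale, round-half-to-even, the symmetric clipping range, and the function names are all assumptions, not the chip's documented method. With 2 bits and a symmetric range, the codes are ternary.

```python
import numpy as np


def quantize_symmetric(x, bits):
    """Symmetric per-tensor uniform quantization to signed `bits`-bit integer codes.

    Illustrative scheme (an assumption, not the chip's documented method):
    scale = max|x| / qmax with qmax = 2**(bits - 1) - 1, round half to even,
    clip to [-qmax, qmax]. With bits=2 the codes are ternary {-1, 0, 1}.
    """
    if not 2 <= bits <= 8:
        raise ValueError("bits must be between 2 and 8 for int8 storage")
    x = np.asarray(x, dtype=np.float64)
    qmax = 2 ** (bits - 1) - 1
    amax = float(np.max(np.abs(x)))
    scale = amax / qmax if amax > 0 else 1.0
    codes = np.clip(np.rint(x / scale), -qmax, qmax).astype(np.int8)
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to real values."""
    return codes.astype(np.float64) * scale


w = np.array([-0.9, -0.2, 0.05, 0.4, 0.7])
for bits in (4, 2):
    q, s = quantize_symmetric(w, bits)
    err = np.max(np.abs(dequantize(q, s) - w))
    print(bits, q.tolist(), round(err, 4))
```

The max-abs scale is the simplest choice. Practical low-bit schemes often use clipping thresholds or per-channel scales to reduce error, which is part of why the algorithmic advances mentioned above matter at INT4 and below.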

This paper presents a 1.7 ns random-cycle SOI embedded-DRAM macro developed for the POWER7™ high-performance microprocessor and introduces enhancements to the micro-sense-amplifier (µSA) architecture. The macro enables a 32 MB on-chip L3 cache, eliminating the delay, area, and power of an off-chip interface.

10.1109/isscc.2010.5433814 article EN IEEE International Solid-State Circuits Conference (ISSCC) 2010-02-01

In this paper, we apply She and Leveque's [Z.-S. She and E. Leveque, Phys. Rev. Lett. 72, 336 (1994)] hierarchy model under the assumption that $\lim_{p\to\infty} \tau_p/p = -1$, with $\tau_p$ being the scaling exponent for the locally averaged dissipation function, as suggested by Novikov [E. A. Novikov, Phys. Rev. E 50, R3303 (1994)]. The resulting model agrees well with existing theoretical and experimental...

10.1103/physreve.52.r5757 article EN Physical review. E, Statistical physics, plasmas, fluids, and related interdisciplinary topics 1995-12-01
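
For reference, the hierarchy model can be written in the general log-Poisson form associated with She and Leveque. The constraint below follows from that form alone. How the paper then fixes the remaining parameter is not stated in the abstract.

```latex
\begin{aligned}
\tau_p &= -\Delta\,p + C\,(1-\beta^{p}), \qquad 0<\beta<1,\\
\tau_1 &= 0 \;\Rightarrow\; C\,(1-\beta) = \Delta,\\
\lim_{p\to\infty}\frac{\tau_p}{p} &= -\Delta .
\end{aligned}
```

Here $\tau_1 = 0$ because the mean dissipation $\langle\varepsilon_r\rangle$ does not depend on $r$. She and Leveque's choice $\Delta = 2/3$, $C = 2$, $\beta = 2/3$ gives $\tau_p = -2p/3 + 2[1-(2/3)^p]$. Novikov's condition instead requires $\Delta = 1$, so $C = 1/(1-\beta)$. Velocity exponents then follow from the refined similarity hypothesis as $\zeta_p = p/3 + \tau_{p/3}$.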

The rapid emergence of AI models, specifically large language models (LLMs) requiring large amounts of compute, drives the need for dedicated AI inference hardware. During deployment, compute utilization (and thus power consumption) can vary significantly across the layers of an AI model, the number of tokens, precision, and batch size [1]. Such wide variation, which may occur at fast time scales, poses unique challenges in optimizing performance within the system-level specifications of discrete accelerator cards, including...

10.1109/isscc49657.2024.10454301 article EN IEEE International Solid-State Circuits Conference (ISSCC) 2024-02-18

Properties of velocity circulation in three-dimensional turbulence are studied using data from high-resolution direct numerical simulation of the Navier-Stokes equations. The probability density function (PDF) of the circulation depends on the area of the closed contour for which the circulation is calculated, but not on the shape of the contour. For contours lying within the inertial range, the PDF has a Gaussian core with conspicuous exponential tails, indicating that intermittency plays an important role in circulation statistics. The measured scaling exponents are anomalous and...

10.1103/physrevlett.76.616 article EN Physical Review Letters 1996-01-22
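
A circulation PDF is built from many samples of $\Gamma = \oint \mathbf{u}\cdot d\mathbf{l}$ taken over contours of a given area. The sketch below computes one such sample on a 2-D periodic slice of a velocity field. It assumes grid-aligned square contours and the trapezoidal rule along each edge, and the function names are hypothetical. The check uses solid-body rotation, whose vorticity is uniform.

```python
import numpy as np


def _trapezoid(f, h):
    """Trapezoidal-rule integral of equally spaced samples f with spacing h."""
    return h * (np.sum(f) - 0.5 * (f[0] + f[-1]))


def square_circulation(u, v, i0, j0, side, h=1.0):
    """Counter-clockwise circulation of (u, v) around a grid-aligned square.

    u and v are 2-D arrays indexed [i, j], with i along x and j along y, on a
    periodic grid of spacing h. The square has its lower-left corner at grid
    point (i0, j0) and an edge length of `side` grid cells.
    """
    nx, ny = u.shape
    xs = np.arange(i0, i0 + side + 1) % nx      # x indices along bottom/top edges
    ys = np.arange(j0, j0 + side + 1) % ny      # y indices along left/right edges
    j_bot, j_top = j0 % ny, (j0 + side) % ny
    i_left, i_right = i0 % nx, (i0 + side) % nx
    bottom = _trapezoid(u[xs, j_bot], h)        # traversed in +x
    right = _trapezoid(v[i_right, ys], h)       # traversed in +y
    top = -_trapezoid(u[xs, j_top], h)          # traversed in -x
    left = -_trapezoid(v[i_left, ys], h)        # traversed in -y
    return bottom + right + top + left


# Solid-body rotation u = -y, v = x has vorticity 2 everywhere, so by Stokes'
# theorem the circulation equals 2 * (enclosed area).
h = 0.1
x = np.arange(16) * h
X, Y = np.meshgrid(x, x, indexing="ij")
print(square_circulation(-Y, X, 2, 3, 5, h))  # close to 2 * 0.5**2 = 0.5
```

Sliding the square over every grid position and histogramming the results estimates the PDF for that contour area. The shape-independence reported above could be probed by repeating the procedure with rectangles of equal area.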

High-resolution direct numerical simulation data for three-dimensional Navier-Stokes turbulence in a periodic box are used to study the scaling behavior of low-order velocity structure functions with positive and negative powers. Similar to high-order statistics, the relative scaling exponents exhibit unambiguous departures from Kolmogorov 1941 theory and agree well with existing multiscaling models. No transition from normal to anomalous scaling is observed.

10.1103/physrevlett.77.3799 article EN Physical Review Letters 1996-10-28
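
Relative scaling exponents like those above are often estimated in the spirit of extended self-similarity, as the slope of $\log S_p$ against $\log S_3$. The sketch below does this for a periodic 1-D record. Two choices are assumptions: dropping exact zero increments so that negative orders stay finite, and using $S_3 = \langle|\delta u_r|^3\rangle$. The function names and the synthetic record are illustrative.

```python
import numpy as np


def abs_structure_function(u, r, p):
    """S_p(r) = <|u(x + r) - u(x)|^p> for a periodic 1-D record; p may be negative."""
    u = np.asarray(u, dtype=float)
    du = np.abs(np.roll(u, -r) - u)
    du = du[du > 0]              # drop exact zeros so negative p stays finite
    return np.mean(du**p)


def relative_exponent(u, separations, p, ref=3):
    """Slope of log S_p against log S_ref over the given separations."""
    sp = [abs_structure_function(u, r, p) for r in separations]
    sref = [abs_structure_function(u, r, ref) for r in separations]
    slope, _ = np.polyfit(np.log(sref), np.log(sp), 1)
    return slope


# Usage on a synthetic record (a stand-in for a velocity line from a simulation)
rng = np.random.default_rng(0)
u = np.cumsum(rng.standard_normal(4096))
rs = [1, 2, 4, 8, 16]
for p in (-0.5, 0.5, 1.0, 2.0):
    print(p, relative_exponent(u, rs, p))
```

Under Kolmogorov 1941 scaling the relative exponent would be $p/3$ for every order, so systematic deviations from $p/3$ are the departures the abstract refers to.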

A phenomenological model for the inertial range scaling of passive-scalar turbulence is developed based on a bivariate log-Poisson model. An analytical formula for the scaling exponent of three-dimensional passive-scalar turbulence is deduced. The predicted scaling exponents are compared with experimental measurements, showing good agreement.

10.1063/1.869265 article EN Physics of Fluids 1997-05-01

We report a new low-swing latch (LSL) for low-power applications. Unlike the conventional transmission gate latch, the LSL allows reduced voltage on clock inputs. Therefore, the local clock buffer (LCB) can use a reduced swing to save power while all other circuits run at nominal voltage. We have implemented an accumulator loop experiment in an early version of IBM's 90 nm SOI technology on a testchip. The loop consists of an adder and a decrementer surrounded by latches to mimic the logic between pipeline stages. Side-by-side...

10.1109/soi.2004.1391601 article EN 2005-03-07
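
The power argument can be made concrete with the standard dynamic-power expression. Assume, since the abstract does not say, that the clock net has capacitance $C_{\text{clk}}$, toggles with activity $\alpha$ at frequency $f$, and swings by $V_{\text{swing}}$ while drawing its charge from a supply at $V_{\text{supply}}$:

```latex
\begin{aligned}
P_{\text{clk}} &= \alpha\, C_{\text{clk}}\, V_{\text{swing}}\, V_{\text{supply}}\, f,\\
\text{full swing from } V_{DD}: \quad P_{\text{clk}} &= \alpha\, C_{\text{clk}}\, V_{DD}^{2}\, f,\\
\text{swing from a separate supply } V_L: \quad \frac{P_{\text{clk}}(V_L)}{P_{\text{clk}}(V_{DD})} &= \left(\frac{V_L}{V_{DD}}\right)^{2}.
\end{aligned}
```

For example, $V_L = 0.8\,V_{DD}$ gives $0.8^2 = 0.64$, a 36% saving on that net. If the reduced swing were instead derived from $V_{DD}$ itself, the saving would scale only linearly with $V_{\text{swing}}$.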

The combination of growth in compute capabilities and the availability of large datasets has led to a re-birth of deep learning. Deep Neural Networks (DNNs) have become state-of-the-art in a variety of machine learning tasks spanning domains across vision, speech, and machine translation. Deep Learning (DL) achieves high accuracy in these tasks at the expense of 100s of ExaOps of computation, posing significant challenges to efficient large-scale deployment in both resource-constrained environments and data centers.

10.1145/3218603.3241339 article EN Proceedings of the International Symposium on Low Power Electronics and Design 2018-07-23