- Fluid Dynamics and Turbulent Flows
- Wind and Air Flow Studies
- Low-power high-performance VLSI design
- Meteorological Phenomena and Simulations
- Parallel Computing and Optimization Techniques
- Advanced Neural Network Applications
- Ferroelectric and Negative Capacitance Devices
- Advanced Memory and Neural Computing
- Electromagnetic Compatibility and Noise Suppression
- Advancements in Semiconductor Devices and Circuit Design
- Electrostatic Discharge in Electronics
- Particle Dynamics in Fluid Flows
- Plant Water Relations and Carbon Dynamics
- Advancements in PLL and VCO Technologies
- VLSI and Analog Circuit Testing
- Photonic and Optical Devices
- Electromagnetic Scattering and Analysis
- Nonlinear Photonic Systems
- Complex Systems and Time Series Analysis
- Machine Learning and Data Classification
- Domain Adaptation and Few-Shot Learning
- Advanced Thermodynamic Systems and Engines
- Rheology and Fluid Dynamics Studies
- Fluid Dynamics and Vibration Analysis
- Aerosol Filtration and Electrostatic Precipitation
IBM Research - Thomas J. Watson Research Center
1995-2024
IBM (United States)
2005-2024
W. L. Gore & Associates (United States)
2003
Los Alamos National Laboratory
1995-1999
City College of New York
1993
Peking University
1989
The lattice Boltzmann method (LBM) is regarded as a specific finite difference discretization for the kinetic equation of discrete velocity distribution function. We argue that sets models, such LBM, physical symmetry necessary obtaining correct macroscopic Navier-Stokes equations. In contrast, and Lagrangian nature scheme, which often used in gas automaton existing methods directly associated with property particle dynamics, not recovering dynamics. By relaxing constraint introducing other...
A multi-TOPS AI core is presented for acceleration of deep learning training and inference in systems from edge devices to data centers. With a programmable architecture and a custom ISA, this engine achieves >90% sustained utilization across the range of neural network topologies by employing dataflow and an on-chip scratchpad hierarchy. Compute precision is optimized at 16b floating point (fp16) for high model accuracy as well as 1b/2b (binary/ternary) integer for aggressive performance. At 1.5 GHz, the prototype...
Low-precision computation is the key enabling factor to achieve high compute densities (TOPS/W and TOPS/mm²) in AI hardware accelerators across cloud and edge platforms. However, robust deep learning (DL) model accuracy equivalent to high-precision computation must be maintained. Improvements in bandwidth, architecture, and power management are also required to harness the benefit of reduced precision by feeding and supporting...
Advances in deep neural networks (DNNs) and the availability of massive real-world data have enabled superhuman levels of accuracy on many AI tasks and ushered in the explosive growth of AI workloads across the spectrum of computing devices. However, their superior accuracy comes at a high computational cost, which necessitates approaches beyond traditional computing paradigms to improve their operational efficiency. Leveraging the application-level insight of error resilience, we demonstrate how approximate computing (AxC) can significantly boost the efficiency...
The growing prevalence and computational demands of Artificial Intelligence (AI) workloads have led to the widespread use of hardware accelerators in their execution. Scaling the performance of AI accelerators across generations is pivotal to their success in commercial deployments. The intrinsic error-resilient nature of AI workloads presents a unique opportunity for performance/energy improvement through precision scaling. Motivated by the recent algorithmic advances in precision scaling for inference and training, we designed RaPiD...
We argue on the basis of empirical data that Kolmogorov's refined similarity hypothesis (RSH) needs to be modified for transverse velocity increments, and propose an alternative. In this new form, transverse velocity increments bear the same relation to locally averaged enstrophy (squared vorticity) as longitudinal velocity increments bear in RSH to locally averaged dissipation. We support this hypothesis by analyzing high-resolution numerical simulation data for isotropic turbulence. RSH and its proposed modification appear to represent two independent scaling groups.
The hydro- and thermodynamical processes near and within a thermoacoustic couple are simulated and analyzed by numerical solution of the compressible Navier–Stokes, continuity, and energy equations for an ideal gas, concentrating on the time-averaged flux density in the gas. The results show details of the heat sink at one end of the plates in the couple.
Statistics and structures of pressure in three-dimensional incompressible isotropic turbulence are studied using high-resolution direct numerical simulation for Taylor microscale Reynolds numbers up to 220. It is found that the probability distribution function (PDF) of pressure has negative skewness due to both kinematic and dynamic effects, in contrast to the statistics of the pressure head, whose PDF is almost symmetric. The statistical relations among pressure, vorticity, dissipation, and kinetic energy are investigated using conditional averaging....
A processor core is presented for AI training and inference products. Leading-edge compute efficiency is achieved for robust fp16 training via efficient heterogeneous 2-D systolic array-SIMD compute engines leveraging compact DLFloat16 FPUs. Architectural flexibility is maintained for very high compute utilization across neural network topologies. A modular dual-corelet architecture with a shared scratchpad and a software-controlled network/memory interface enables scalability to many-core SoCs and large-scale systems. The 14nm AI core achieves peak...
A 1.35 ns random access and 1.7 ns-random-cycle SOI embedded-DRAM macro has been developed for the POWER7™ high-performance microprocessor. The macro employs a 6 transistor micro sense-amplifier architecture with an extended precharge scheme to enhance the sensing margin for product quality. The detailed study shows a 67% bit-line power reduction with only 1.7% area overhead, while improving the read zero margin by more than 500ps. The array voltage window is improved by the programmable BL voltage generator, allowing the embedded DRAM to operate reliably...
High-resolution direct numerical simulations of 3D Navier-Stokes turbulence with normal viscosity and hyperviscosity are carried out. It is found that the inertial-range statistics, both the scalings and the probability density functions, are independent of the dissipation mechanism, while the near-dissipation-range fluctuations show significant structural differences. Nevertheless, the relative scalings expressing the dependence of the moments at different orders are universal, and show unambiguous departure from the Kolmogorov 1941 description, including...
This letter presents a multi-TOPS AI accelerator core for deep learning training and inference. With a programmable architecture and a custom ISA, this engine achieves >90% sustained utilization across the range of neural network topologies by employing dataflow to provide high throughput and an on-chip scratchpad hierarchy to meet the bandwidth demands of the compute units. A 16b floating point (fp16) representation with 1 sign bit, 6 exponent bits, and 9 mantissa bits has also been developed for high model accuracy in training and inference...
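To make the 1/6/9 bit split mentioned in this abstract concrete, the following is a minimal Python sketch of how a 16-bit word with that layout could be unpacked into a value. Only the field widths come from the abstract. The function name `decode_fp16_1_6_9`, the exponent bias of 31, and the choice to treat every exponent code as a normal number (no zero, infinity, NaN, or subnormal handling) are illustrative assumptions, not details confirmed by the source.

```python
def decode_fp16_1_6_9(bits: int, bias: int = 31) -> float:
    """Decode a 16-bit word laid out as 1 sign, 6 exponent, 9 mantissa bits.

    Assumptions (not stated in the abstract): the exponent bias is 31 and
    every exponent code is a normal number with an implicit leading 1.
    Special values (zero, infinity, NaN) are deliberately not modeled.
    """
    if not 0 <= bits <= 0xFFFF:
        raise ValueError("expected a 16-bit unsigned integer")

    sign = -1.0 if (bits >> 15) & 0x1 else 1.0   # bit 15
    exponent = (bits >> 9) & 0x3F                # bits 14..9 (6 bits)
    mantissa = bits & 0x1FF                      # bits 8..0  (9 bits)

    # Implicit leading 1 plus the 9-bit fraction, scaled by the
    # unbiased exponent.
    significand = 1.0 + mantissa / 512.0
    return sign * significand * 2.0 ** (exponent - bias)


if __name__ == "__main__":
    one = 31 << 9                # exponent field equals the assumed bias
    three = (32 << 9) | 0x100    # 1.5 * 2**1
    print(decode_fp16_1_6_9(one), decode_fp16_1_6_9(three))
```

Compared with IEEE 754 binary16 (1/5/10), the extra exponent bit trades one bit of fraction for a wider dynamic range, which is the design choice the field widths suggest.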
A study is made of the scaling positive part (PP) and negative (NP) velocity increments in turbulent pipe flow simulated homogeneous turbulence a box. For moment orders above unity, moments NP are larger than those PP for all separation distances, exponents NP. below absolute value increment NP, as well PP, possess which vary linearly with order q, though apparently greater $q/3$.
The anomalous scaling phenomena of three-dimensional passive scalar turbulence are studied using high resolution direct numerical simulation. inertial range exponents the increment and dissipation obtained. connection between intermittency structure exponent is examined instability amplitude used to clarify previous experimental results for exponents.
Reduced precision computation is a key enabling factor for energy-efficient acceleration of deep learning (DL) applications. This article presents a 7-nm four-core mixed-precision artificial intelligence (AI) chip that supports four compute precisions, namely FP16, Hybrid-FP8 (HFP8), INT4, and INT2, to support diverse application demands for training and inference. The chip leverages cutting-edge algorithmic advances to demonstrate leading-edge power efficiency for 8-bit floating-point (FP8) training and INT4 inference without...
This paper presents a 1.7 ns-random-cycle SOI embedded-DRAM macro developed for the POWER7™ high-performance microprocessor and introduces enhancements to the micro-sense-amplifier (μSA) architecture. The macro enables a 32 MB on-chip L3 cache, eliminating delay, area, and power from the off-chip interface.
In this paper, we apply She and Leveque's [Z.-S. She and E. Leveque, Phys. Rev. Lett. 72, 336 (1994)] hierarchy model under the assumption that $\lim_{p\to\infty} \tau_p/p = -1$, with $\tau_p$ being the scaling exponent for the local averaged dissipation function, as suggested by Novikov [E. A. Novikov, Phys. Rev. E 50, R3303 (1994)]. The result agrees well with existing theoretical and experimental...
The rapid emergence of AI models, specifically large language models (LLMs) requiring large amounts of compute, drives the need for dedicated AI inference hardware. During deployment, compute utilization (and thus power consumption) can vary significantly across the layers of an AI model, number of tokens, precision, and batch size [1]. Such wide variation, which may occur at fast time scales, poses unique challenges in optimizing performance within the system-level specifications of discrete accelerator cards, including...
Properties of velocity circulation in three-dimensional turbulence are studied using data from high-resolution direct numerical simulation of the Navier-Stokes equations. The probability density function (PDF) of the circulation depends on the area of the closed contour for which the circulation is calculated, but not on the shape of the contour. For contours lying within the inertial range, the PDF has a Gaussian core with conspicuous exponential tails, indicating that intermittency plays an important role in the statistics. The measured scaling exponents are anomalous and...
High-resolution direct numerical simulation data for three-dimensional Navier-Stokes turbulence in a periodic box are used to study the scaling behavior of low-order velocity structure functions with positive and negative powers. Similar to high-order statistics, the relative scaling exponents exhibit unambiguous departures from the Kolmogorov 1941 theory and agree well with existing multiscaling models. No transition from normal scaling to anomalous scaling is observed.
A phenomenological model for the inertial range scaling of passive-scalar turbulence is developed based on a bivariate log-Poisson model. An analytical formula for the scaling exponent of three-dimensional passive-scalar turbulence is deduced. The predicted exponents are compared with experimental measurements, showing good agreement.
We report a new low-swing latch (LSL) for low-power applications. Unlike the conventional transmission gate latch, the LSL allows reduced voltage swing on the clock inputs. Therefore the local clock buffer (LCB) can use reduced voltage swing to save power while all other circuits are running at nominal voltage. We have implemented an accumulator loop experiment in an early version of IBM's 90 nm SOI technology on a testchip. The experiment consists of an adder and a decrementer surrounded by latches to mimic the logic between pipeline stages. Side-by-side...
The combination of growth in compute capabilities and the availability of large datasets has led to a re-birth of deep learning. Deep Neural Networks (DNNs) have become state-of-the-art in a variety of machine learning tasks spanning domains across vision, speech, and machine translation. Deep Learning (DL) achieves high accuracy in these tasks at the expense of 100s of ExaOps of computation, posing significant challenges to efficient large-scale deployment in both resource-constrained environments and data centers.