Grace Dinh

ORCID: 0000-0001-9626-098X
Research Areas
  • Parallel Computing and Optimization Techniques
  • Advanced Neural Network Applications
  • Stochastic Gradient Optimization Techniques
  • Matrix Theory and Algorithms
  • Ferroelectric and Negative Capacitance Devices
  • Advanced Memory and Neural Computing
  • Advanced Data Storage Technologies
  • Brain Tumor Detection and Classification
  • Logic, programming, and type systems
  • Scheduling and Optimization Algorithms
  • Embedded Systems Design Techniques
  • Machine Learning in Materials Science
  • Tensor decomposition and applications
  • Low-power high-performance VLSI design
  • Interconnection Networks and Systems
  • VLSI and FPGA Design Techniques
  • Sparse and Compressive Sensing Techniques
  • Formal Methods in Verification
  • Numerical Methods and Algorithms
  • Quantum Computing Algorithms and Architecture

Cornell University
2025

University of California, Berkeley
2020-2024

Berkeley College
2021-2024

University of California System
2022

Recent advances in Deep Neural Networks (DNNs) have led to active development of specialized DNN accelerators, many of which feature a large number of processing elements laid out spatially, together with a multi-level memory hierarchy and flexible interconnect. While these accelerators can take advantage of data reuse to achieve high peak throughput, they also expose runtime parameters to the programmers, who need to explicitly manage how computation is scheduled both spatially and temporally. In fact, different...

10.1109/isca52012.2021.00050 article EN 2021-06-01
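A minimal sketch, assuming a matmul-like layer and a two-level memory hierarchy, of what one such scheduling choice (a "mapping" of tile sizes and loop order per level) might look like; the class, dimension names, and numbers below are illustrative, not the paper's tool or notation.

```python
# Hypothetical sketch: a "mapping" for a spatial accelerator is a choice of
# tile sizes and a loop order at each memory level. Values are illustrative.
from dataclasses import dataclass

@dataclass
class Mapping:
    loop_order: tuple   # order of the M, N, K loops at this level
    tile: dict          # tile size per problem dimension

# One candidate schedule for a 512x512x512 matmul on a two-level hierarchy.
problem = {"M": 512, "N": 512, "K": 512}
dram_level = Mapping(loop_order=("K", "M", "N"), tile={"M": 128, "N": 128, "K": 64})
sram_level = Mapping(loop_order=("M", "N", "K"), tile={"M": 16, "N": 16, "K": 64})

def trip_counts(problem, mapping):
    """Number of iterations of each loop at this level."""
    return {d: problem[d] // mapping.tile[d] for d in mapping.loop_order}

print(trip_counts(problem, dram_level))   # {'K': 8, 'M': 4, 'N': 4}
```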

Recent advances in state-of-the-art DNN architecture design have been moving toward Transformer models. These models achieve superior accuracy across a wide range of applications. This trend has been consistent over the past several years since Transformer models were originally introduced. However, the amount of compute and bandwidth required for inference of recent Transformer models is growing at a significant rate, and this has made their deployment in latency-sensitive applications challenging. As such, there has been an increased focus on making Transformer models more...

10.48550/arxiv.2302.14017 preprint EN other-oa arXiv (Cornell University) 2023-01-01

The optimization of matrix multiplication (or GEMM) has been a need during the last decades. This operation is considered the flagship of current linear algebra libraries such as BLIS, OpenBLAS, or Intel OneAPI because of its widespread use in a large variety of scientific applications. GEMM is usually implemented following the GotoBLAS philosophy, which tiles the operands and uses a series of nested loops for performance improvement. These approaches extract the maximum computational power of the architectures through small pieces...

10.1109/cgo57630.2024.10444883 article EN 2024-02-28
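A minimal sketch of the GotoBLAS-style blocking idea under simplifying assumptions (no packing, no hand-tuned micro-kernel, block sizes chosen arbitrarily rather than per architecture):

```python
# Tile the three GEMM loops so the active blocks of A, B, and C stay resident
# in cache. Real libraries (BLIS, OpenBLAS) additionally pack the tiles into
# contiguous buffers and replace the innermost update with an
# architecture-specific micro-kernel; block sizes here are arbitrary.
import numpy as np

def blocked_gemm(A, B, mc=64, nc=64, kc=64):
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    C = np.zeros((M, N), dtype=A.dtype)
    for jc in range(0, N, nc):          # panels of B / columns of C
        for pc in range(0, K, kc):      # shared (reduction) dimension
            for ic in range(0, M, mc):  # panels of A / rows of C
                C[ic:ic+mc, jc:jc+nc] += (
                    A[ic:ic+mc, pc:pc+kc] @ B[pc:pc+kc, jc:jc+nc]
                )
    return C

A = np.random.rand(256, 192)
B = np.random.rand(192, 128)
assert np.allclose(blocked_gemm(A, B), A @ B)
```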

In the hardware design space exploration process, it is critical to optimize both hardware parameters and algorithm-to-hardware mappings. Previous work has largely approached this simultaneous optimization problem by exploring the hardware design space and the mapspace, both individually large and highly nonconvex spaces, independently of one another. The resulting combinatorial explosion has created significant difficulties for optimizers.

10.1145/3613424.3623797 article EN cc-by 2023-10-28
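For illustration only (not the paper's method or numbers), a toy count of how quickly the joint space grows when even a few hardware knobs and mapping choices are enumerated together:

```python
# All knobs and candidate values below are hypothetical.
pe_counts    = [64, 128, 256, 512]          # number of processing elements
buffer_sizes = [32, 64, 128, 256]           # on-chip buffer size in KiB
tile_sizes   = [4, 8, 16, 32, 64, 128]      # candidate tile size per loop dimension
loop_orders  = 6                            # permutations of 3 loops

hw_points  = len(pe_counts) * len(buffer_sizes)      # 16 hardware configurations
map_points = len(tile_sizes) ** 3 * loop_orders      # 1296 mappings per layer
print(hw_points, map_points, hw_points * map_points) # 16 1296 20736
# ...and this is before adding more memory levels, loop dimensions, or
# dataflow choices, each of which multiplies the joint space further.
```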

Reducing communication, either between levels of a memory hierarchy or between processors over a network, is a key component of performance optimization (in both time and energy) for many nested-loop problems, including dense linear algebra, particle interactions, and machine learning. Previous tiling-based approaches to these problems have been used to find both lower bounds on the communication required to execute them and optimal rearrangements, or blockings, that attain such bounds. However, such general approaches have typically assumed problem sizes...

10.1145/3350755.3400275 article EN 2020-07-06
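As a concrete instance of the kind of result referred to, the classical bound for dense n-by-n matrix multiplication with a fast memory of M words (not the paper's more general statement): the number of words W moved between slow and fast memory satisfies

```latex
W \;=\; \Omega\!\left(\frac{n^{3}}{\sqrt{M}}\right),
\qquad \text{attained by tiling all three loops with blocks of side } b = \Theta\!\left(\sqrt{M}\right).
```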

Efficiently executing convolutional neural nets (CNNs) is important in many machine-learning tasks. Since the cost of moving a word of data, either between levels of a memory hierarchy or between processors over a network, is much higher than that of an arithmetic operation, minimizing data movement is critical to performance optimization. In this paper, we present both new lower bounds on the data movement needed for CNNs, and optimal sequential algorithms that attain these bounds. In the most common cases, our algorithms can attain significantly more data reuse than matrix...

10.48550/arxiv.1802.06905 preprint EN other-oa arXiv (Cornell University) 2018-01-01
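For reference, a plain sketch of the direct-convolution loop nest whose data reuse such bounds analyze (simplified to a single image, unit stride, and no padding; dimension names are illustrative):

```python
import numpy as np

def conv2d_direct(inp, wts):
    C, H, W = inp.shape          # input channels, height, width
    K, C2, R, S = wts.shape      # output channels, input channels, filter height/width
    assert C == C2
    out = np.zeros((K, H - R + 1, W - S + 1))
    for k in range(K):                      # output channel
        for c in range(C):                  # input channel (inputs reused across k)
            for h in range(H - R + 1):      # output row
                for w in range(W - S + 1):  # output column
                    for r in range(R):
                        for s in range(S):
                            out[k, h, w] += inp[c, h + r, w + s] * wts[k, c, r, s]
    return out
```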

The standardization of an interface for dense linear algebra operations in the BLAS standard has enabled interoperability between different libraries, thereby boosting the success of scientific computing, in particular HPC. Despite numerous efforts in the past, the community has not yet agreed on a sparse counterpart due to several reasons. One is the fact that sparse objects allow many storage formats, and different hardware may favor different formats. This makes the definition of a FORTRAN-style all-circumventing interface extremely challenging. Another reason is that, as opposed...

10.48550/arxiv.2411.13259 preprint EN arXiv (Cornell University) 2024-11-20
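A small illustration of the storage-format issue: the same sparse matrix in two common formats, COO and CSR. An interface pinned to one layout (as the dense BLAS is to column-major arrays) cannot serve both without conversion; the values below are arbitrary.

```python
dense = [[5, 0, 0],
         [0, 0, 3],
         [7, 2, 0]]

# COO: a list of (row, col, value) triples for the nonzeros.
coo = [(i, j, v) for i, row in enumerate(dense) for j, v in enumerate(row) if v]
# -> [(0, 0, 5), (1, 2, 3), (2, 0, 7), (2, 1, 2)]

# CSR: values and column indices stored in row order, plus row-start offsets.
vals   = [v for _, _, v in coo]   # [5, 3, 7, 2]
cols   = [j for _, j, _ in coo]   # [0, 2, 0, 1]
rowptr = [0, 1, 2, 4]             # row i occupies vals[rowptr[i]:rowptr[i+1]]
```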

Convolutional neural networks (CNNs) are important in a wide variety of machine learning tasks and applications, so optimizing their performance is essential. Moving words of data between levels of a memory hierarchy or between processors on a network is much more expensive than the cost of arithmetic, so minimizing communication is critical to performance. In this paper, we present new lower bounds on data movement for mixed precision convolutions in both single-processor and parallel distributed-memory models, as well as algorithms that...

10.1145/3539781.3539784 preprint EN 2022-06-27
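One way to see why mixed precision changes the accounting (an illustrative formulation, not the paper's exact bound): when operands have different word widths, communication is naturally measured in bytes,

```latex
\text{bytes moved} \;=\; w_{\mathrm{in}}\,Q_{\mathrm{in}} + w_{\mathrm{wt}}\,Q_{\mathrm{wt}} + w_{\mathrm{out}}\,Q_{\mathrm{out}},
```

where Q_x is the number of words of operand x moved and w_x its width in bytes, so the best tilings can weight operands differently than in the uniform-precision case.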

Recent advances in Deep Neural Networks (DNNs) have led to active development of specialized DNN accelerators, many of which feature a large number of processing elements laid out spatially, together with a multi-level memory hierarchy and flexible interconnect. While these accelerators can take advantage of data reuse to achieve high peak throughput, they also expose runtime parameters to the programmers, who need to explicitly manage how computation is scheduled both spatially and temporally. In fact, different...

10.48550/arxiv.2105.01898 preprint EN other-oa arXiv (Cornell University) 2021-01-01

The optimization of matrix multiplication (or GEMM) has been a need during the last decades. This operation is considered the flagship of current linear algebra libraries such as BLIS, OpenBLAS, or Intel OneAPI because of its widespread use in a large variety of scientific applications. GEMM is usually implemented following the GotoBLAS philosophy, which tiles the operands and uses a series of nested loops for performance improvement. These approaches extract the maximum computational power of the architectures through small pieces...

10.48550/arxiv.2310.17408 preprint EN cc-by arXiv (Cornell University) 2023-01-01

Reducing communication, either between levels of a memory hierarchy or between processors over a network, is a key component of performance optimization (in both time and energy) for many problems, including dense linear algebra, particle interactions, and machine learning. For these problems, which can be represented as nested-loop computations, previous tiling-based approaches have been used to find both lower bounds on the communication required to execute them and optimal rearrangements, or blockings, that attain such bounds. However, such general...

10.48550/arxiv.2003.00119 preprint EN other-oa arXiv (Cornell University) 2020-01-01