Srikant Bharadwaj

ORCID: 0000-0002-0422-5210
Research Areas
  • Parallel Computing and Optimization Techniques
  • Interconnection Networks and Systems
  • Superconducting Materials and Applications
  • Low-power high-performance VLSI design
  • Advanced Data Storage Technologies
  • 3D IC and TSV technologies
  • Embedded Systems Design Techniques
  • VLSI and FPGA Design Techniques
  • Quantum Computing Algorithms and Architecture
  • VLSI and Analog Circuit Testing
  • Advancements in Semiconductor Devices and Circuit Design
  • Manufacturing Process and Optimization
  • Iterative Learning Control Systems
  • Stochastic Gradient Optimization Techniques
  • Cloud Computing and Resource Management
  • Radiation Effects in Electronics
  • Graph theory and applications
  • Quantum Information and Cryptography
  • Network Packet Processing and Optimization
  • Advanced Surface Polishing Techniques
  • Semiconductor materials and devices
  • Ferroelectric and Negative Capacitance Devices
  • Quantum-Dot Cellular Automata
  • Advanced Memory and Neural Computing
  • Industrial Vision Systems and Defect Detection

Microsoft (United States)
2023-2024

Bellevue Hospital Center
2021-2023

Birla Institute of Technology and Science, Pilani
2022

Advanced Micro Devices (Canada)
2020-2021

Georgia Institute of Technology
2018-2021

Recent advances in die-stacking and 2.5D chip integration technologies introduce in-package network heterogeneities that can complicate interconnect design. Integrating chiplets over a silicon interposer offers new opportunities for optimizing topologies. However, limited by the capabilities of existing network-on-chip (NoC) simulators, the full potential of interposer-based NoCs has not been exploited. In this paper, we address the shortfalls of prior NoC designs and present a family of chiplet topologies called Kite....

10.1109/dac18072.2020.9218539 article EN 2020-07-01
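
For context, a minimal sketch of how an interposer-based chiplet NoC can be modeled as a graph and its average hop count measured. The topology below is a generic chiplets-over-mesh example, not the Kite topologies from the paper, and the networkx-based model is purely illustrative.

```python
# Illustrative sketch (not the paper's Kite topology): model an
# interposer-based chiplet NoC as a graph and compare average hop count.
import networkx as nx

def build_mesh(rows, cols, prefix):
    g = nx.Graph()
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                g.add_edge((prefix, r, c), (prefix, r, c + 1))
            if r + 1 < rows:
                g.add_edge((prefix, r, c), (prefix, r + 1, c))
    return g

# Four 2x2 chiplets stacked over a 4x4 interposer mesh (assumed layout).
system = build_mesh(4, 4, "interposer")
for i in range(4):
    chiplet = build_mesh(2, 2, f"chiplet{i}")
    system = nx.compose(system, chiplet)
    # Each chiplet router connects vertically (via micro-bumps) to the
    # interposer router directly beneath it -- an assumed mapping.
    base_r, base_c = 2 * (i // 2), 2 * (i % 2)
    for r in range(2):
        for c in range(2):
            system.add_edge((f"chiplet{i}", r, c),
                            ("interposer", base_r + r, base_c + c))

print("average hop count:", nx.average_shortest_path_length(system))
```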

The open-source and community-supported gem5 simulator is one of the most popular tools for computer architecture research. This simulation infrastructure allows researchers to model modern hardware at the cycle level; it has enough fidelity to boot unmodified Linux-based operating systems and run full applications for multiple architectures including x86, Arm, and RISC-V. The simulator has been under active development over the last nine years since the original release. In this time, there have been 7500 commits to the codebase from 250 unique...

10.48550/arxiv.2007.03152 preprint EN cc-by arXiv (Cornell University) 2020-01-01
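
A minimal gem5 syscall-emulation configuration in the style of the "Learning gem5" simple.py script, to illustrate the Python-based configuration the simulator uses. Object and port names vary across gem5 versions (e.g., cpu_side_ports vs. slave), so treat this as a sketch rather than a drop-in script.

```python
# Minimal gem5 SE-mode system: one timing CPU, no caches, one DDR3 channel.
import m5
from m5.objects import *

system = System()
system.clk_domain = SrcClockDomain(clock="1GHz",
                                   voltage_domain=VoltageDomain())
system.mem_mode = "timing"
system.mem_ranges = [AddrRange("512MB")]

system.cpu = TimingSimpleCPU()
system.membus = SystemXBar()
# No caches: connect the CPU ports directly to the memory bus.
system.cpu.icache_port = system.membus.cpu_side_ports
system.cpu.dcache_port = system.membus.cpu_side_ports
system.cpu.createInterruptController()
# Interrupt port wiring needed for x86 builds.
system.cpu.interrupts[0].pio = system.membus.mem_side_ports
system.cpu.interrupts[0].int_requestor = system.membus.cpu_side_ports
system.cpu.interrupts[0].int_responder = system.membus.mem_side_ports
system.system_port = system.membus.cpu_side_ports

system.mem_ctrl = MemCtrl(dram=DDR3_1600_8x8(range=system.mem_ranges[0]))
system.mem_ctrl.port = system.membus.mem_side_ports

binary = "tests/test-progs/hello/bin/x86/linux/hello"  # shipped with gem5
system.workload = SEWorkload.init_compatible(binary)
system.cpu.workload = Process(cmd=[binary])
system.cpu.createThreads()

root = Root(full_system=False, system=system)
m5.instantiate()
print("Exiting @ tick %i: %s" % (m5.curTick(), m5.simulate().getCause()))
```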

Approximate computing is an evolving paradigm that aims to improve power, speed, and area in neural network applications that can tolerate errors up to a specific limit. This letter proposes a new multiplier architecture based on an algorithm that adapts an approximate compressor from an existing set of proposed compressors to reduce the error in the respective partial product columns. Further, the error due to approximation is corrected using a simple error-correcting module. Results show that the power and power–delay product (PDP) of the 8-bit multiplier are improved by 39.9%...

10.1109/les.2022.3199273 article EN IEEE Embedded Systems Letters 2022-08-15
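
As an illustration of the compressor-based approach, a small error model comparing an exact 4:2 column compressor against a saturating approximation. The specific approximation and error metric here are stand-ins, not the design proposed in the letter.

```python
# Illustrative error model for an approximate 4:2 compressor.
from itertools import product

def exact_compressor(x1, x2, x3, x4):
    return x1 + x2 + x3 + x4          # true column sum, 0..4

def approx_compressor(x1, x2, x3, x4):
    # Drops the rarely occurring "all ones" carry: a sum of 4 is
    # reported as 3. This is a stand-in approximation for illustration.
    return min(x1 + x2 + x3 + x4, 3)

total_err = 0
for bits in product((0, 1), repeat=4):
    total_err += abs(exact_compressor(*bits) - approx_compressor(*bits))

# Mean error distance across the 16 input patterns.
print("mean error distance:", total_err / 16)   # -> 0.0625
```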

Recent studies have shown the potential of last-level TLBs shared by multiple cores in tackling the memory translation performance challenges posed by "big data" workloads. A key stumbling block hindering their effectiveness, however, is their high access time. We present a design methodology to reduce these access times so as to realize high-performance and scalable shared L2 TLBs. As a first step, we study the benefits of replacing a monolithic shared TLB with a distributed set of small TLB slices. While this approach does reduce lookup latency, it...

10.1109/micro.2018.00030 article EN 2018-10-01
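
A toy model of the distributed-slice idea: virtual page numbers are hashed across several small set-associative TLB slices with LRU replacement. Slice counts, sizes, and the hash function are assumptions for illustration only.

```python
# Sketch of a distributed last-level TLB built from small slices.
from collections import OrderedDict

class TLBSlice:
    def __init__(self, sets=16, ways=4):
        self.sets = [OrderedDict() for _ in range(sets)]
        self.ways = ways

    def lookup(self, vpn):
        s = self.sets[vpn % len(self.sets)]
        if vpn in s:
            s.move_to_end(vpn)           # LRU update on hit
            return True
        if len(s) >= self.ways:
            s.popitem(last=False)        # evict the LRU entry
        s[vpn] = True                    # fill on miss
        return False

class DistributedTLB:
    def __init__(self, num_slices=8):
        self.slices = [TLBSlice() for _ in range(num_slices)]

    def lookup(self, vpn):
        # Simple modulo hash spreads pages across slices; a real design
        # would choose the hash to balance load and avoid conflicts.
        return self.slices[vpn % len(self.slices)].lookup(vpn)

tlb = DistributedTLB()
hits = sum(tlb.lookup(vpn) for vpn in [3, 7, 3, 11, 3, 7])
print("hits:", hits)   # repeat touches of pages 3 and 7 hit
```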

Variational quantum algorithms (VQAs) provide a promising approach to achieving quantum advantage in the noisy intermediate-scale quantum (NISQ) era. In this era, quantum computers experience high error rates, and error detection and correction are not feasible. VQAs can utilize noisy qubits in tandem with classical optimization to solve hard problems. However, VQAs are still slow relative to their classical counterparts. Hence, improving the performance of VQAs will be necessary to make them competitive. While VQAs are expected to perform better as problem sizes increase, increasing...

10.48550/arxiv.2109.01714 preprint EN other-oa arXiv (Cornell University) 2021-01-01
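
A skeleton of the hybrid quantum-classical loop that VQAs rely on, with the quantum expectation value replaced by a cheap classical stand-in so the structure is runnable anywhere. The cost landscape, optimizer, and hyperparameters are illustrative.

```python
# Hybrid VQA-style loop: a classical optimizer repeatedly updates circuit
# parameters based on a noisy cost estimate.
import numpy as np

rng = np.random.default_rng(0)

def estimate_cost(theta, shots=1024):
    # Stand-in for executing the parameterized circuit and averaging
    # measurement outcomes; shot noise is modeled as Gaussian noise.
    ideal = np.sum(np.cos(theta))            # hypothetical cost landscape
    return ideal + rng.normal(0, 1 / np.sqrt(shots))

def finite_diff_grad(theta, eps=0.1):
    grad = np.zeros_like(theta)
    for i in range(len(theta)):
        shift = np.zeros_like(theta)
        shift[i] = eps
        grad[i] = (estimate_cost(theta + shift) -
                   estimate_cost(theta - shift)) / (2 * eps)
    return grad

theta = rng.uniform(0, 2 * np.pi, size=4)
for step in range(200):
    theta -= 0.1 * finite_diff_grad(theta)   # gradient descent update

print("final cost estimate:", estimate_cost(theta, shots=8192))
```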

With the continuous improvement of on-chip integrated voltage regulators (IVRs) and fast, adaptive frequency control, dynamic voltage-frequency scaling (DVFS) transition times have shrunk from the microsecond to the nanosecond regime, providing an immense opportunity to improve energy efficiency. The key to unlocking continued improvements in V/f circuit technology is the creation of new, smarter DVFS mechanisms that better adapt to rapid fluctuations in workload demand.

10.1145/3623278.3624756 article EN 2023-03-25
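
A toy reactive policy showing what a fine-grain DVFS mechanism might look like once transitions are cheap enough to invoke every few microseconds. The operating points and thresholds are invented for illustration.

```python
# Toy fine-grain DVFS policy: each short interval, pick the lowest V/f
# point that keeps the core busy. Operating points are assumed values.
OPERATING_POINTS = [      # (frequency in GHz, voltage in V)
    (0.8, 0.65),
    (1.2, 0.75),
    (1.6, 0.85),
    (2.0, 1.00),
]

def select_vf(compute_utilization):
    """Map an interval's compute utilization (0..1) to a V/f point."""
    index = min(int(compute_utilization * len(OPERATING_POINTS)),
                len(OPERATING_POINTS) - 1)
    return OPERATING_POINTS[index]

# With nanosecond-scale transitions, the policy can run at microsecond
# granularity instead of the OS-level granularity of traditional DVFS.
for util in [0.15, 0.40, 0.95, 0.30]:
    freq, volt = select_vf(util)
    print(f"util={util:.2f} -> {freq} GHz @ {volt} V")
```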

In the 1990s, the tech giants Intel and ARM were oriented towards very different goals. Intel's x86 concentrated on peak performance for PCs and servers, while ARM focused on increasing energy efficiency, mainly for mobile devices and wearables. But today, after three decades and a massive advancement in technology, there is a need to work towards similar goals: Intel has started manufacturing processors for handheld devices, and ARM has begun delving into servers. Since both x86 and ARM-based processors are now competitors, it is essential to compare their performance across a variety of...

10.1109/ises54909.2022.00128 article EN 2021 IEEE International Symposium on Smart Electronic Systems (iSES) 2022-12-01

In recent years, machine intelligence (MI) applications have emerged as a major driver for the computing industry. Optimizing these workloads is important, but complicated. As memory demands grow and data movement overheads increasingly limit performance, determining the best GPU caching policy to use for a diverse range of MI workloads represents one important challenge. To study this, we evaluate 17 MI applications and characterize their behavior using different caching strategies. In our evaluations, we find that the choice of policy in GPU caches involves multiple...

10.1109/iiswc47752.2019.9041977 article EN 2019-11-01
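
A sketch of the kind of policy comparison such a study performs: replaying an address trace against a small LRU cache with and without bypassing of streaming (never-reused) lines. The trace, cache size, and reuse hints are made up for illustration.

```python
# Compare plain LRU against LRU with bypassing of streaming lines.
from collections import OrderedDict

def simulate(trace, lines=48, bypass_streaming=False):
    cache, hits = OrderedDict(), 0
    for line, reused in trace:
        if line in cache:
            hits += 1
            cache.move_to_end(line)       # LRU update on hit
            continue
        if bypass_streaming and not reused:
            continue                      # don't pollute the cache
        if len(cache) >= lines:
            cache.popitem(last=False)     # evict the LRU line
        cache[line] = True
    return hits / len(trace)

# Interleave a reuse-friendly 32-line working set with a streaming scan.
trace = []
for i in range(256):
    trace.append((i % 32, True))          # working-set access
    trace.append((1000 + i, False))       # streaming access, never reused

print("LRU hit rate:          ", simulate(trace))
print("LRU + bypass hit rate: ", simulate(trace, bypass_streaming=True))
```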

The performance of graphics processing unit (GPU) workloads can be sensitive to the various clock domains that are dynamically tunable in modern GPUs. In this work, we observe that GPU application performance is sensitive to NoC frequencies and that this sensitivity varies during the execution of kernels. We note that this heterogeneity is not adapted to well by traditional dynamic voltage-frequency scaling (DVFS) techniques. To this end, we introduce DUB, a Dynamic Underclocking and Bypassing technique, for such heterogeneous...

10.1145/3479876.3481590 article EN 2021-10-05
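
An illustrative per-kernel controller in the same spirit: kernels that rarely stall on the NoC are underclocked to save power. The sensitivity metric, thresholds, and frequency ladder are hypothetical and not the DUB heuristics from the paper.

```python
# Pick a NoC frequency per kernel from an assumed NoC-stall metric.
NOC_FREQS_GHZ = [0.5, 1.0, 1.5, 2.0]

def pick_noc_frequency(noc_stall_fraction):
    """noc_stall_fraction: fraction of cycles the kernel stalls on the NoC."""
    if noc_stall_fraction < 0.05:
        return NOC_FREQS_GHZ[0]       # insensitive kernel: underclock
    if noc_stall_fraction < 0.15:
        return NOC_FREQS_GHZ[1]
    if noc_stall_fraction < 0.30:
        return NOC_FREQS_GHZ[2]
    return NOC_FREQS_GHZ[3]           # NoC-bound kernel: full speed

for kernel, stall in [("gemm", 0.02), ("spmv", 0.22), ("reduce", 0.40)]:
    print(kernel, "->", pick_noc_frequency(stall), "GHz")
```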

Transformer-based models have emerged as one of the most widely used architectures for natural language processing, natural language generation, and image generation. The size of state-of-the-art models has increased steadily, reaching billions of parameters. These huge models are memory hungry and incur significant inference latency even on cutting-edge AI accelerators, such as GPUs. Specifically, the time complexity of the attention operation is quadratic in terms of the total context length, i.e., prompt and output tokens. Thus, several optimizations...

10.48550/arxiv.2405.10480 preprint EN arXiv (Cornell University) 2024-05-16
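
A single-head sketch of autoregressive attention with a KV cache, showing why per-token cost grows with context length. Dimensions are arbitrary, and real models add multiple heads, masking variants, and fused kernels.

```python
# Incremental decoding with a KV cache: each new token attends over all
# cached keys/values, so per-token cost grows linearly with context
# length (quadratic over the whole generation).
import numpy as np

d = 64
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

k_cache, v_cache = [], []

def decode_step(x):
    """x: hidden state of the newest token, shape (d,)."""
    q = x @ Wq
    k_cache.append(x @ Wk)            # append this token's key ...
    v_cache.append(x @ Wv)            # ... and value to the cache
    K = np.stack(k_cache)             # (t, d): all keys so far
    V = np.stack(v_cache)
    scores = K @ q / np.sqrt(d)       # (t,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V                # attention output, shape (d,)

for t in range(8):                    # generate 8 tokens
    out = decode_step(rng.standard_normal(d))
print("attended output norm:", np.linalg.norm(out))
```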

Large language model (LLM) inference demands a significant amount of computation and memory, especially in the key attention mechanism. While techniques such as quantization and acceleration algorithms like FlashAttention have improved the efficiency of overall inference, they address different aspects of the problem: quantization focuses on weight-activation operations, while FlashAttention improves execution but requires high-precision formats. Recent key-value (KV) cache quantization reduces memory bandwidth but still needs...

10.48550/arxiv.2412.08585 preprint EN arXiv (Cornell University) 2024-12-11
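
A minimal sketch of low-precision KV caching: keys are stored as int8 with per-row scales and dequantized when needed. This only illustrates the storage/accuracy trade-off; it is not the scheme proposed in the preprint.

```python
# Symmetric per-row int8 quantization of cached keys, with dequantization.
import numpy as np

def quantize(x):
    """Return (int8 codes, per-row scales) for a float32 matrix."""
    scale = np.abs(x).max(axis=-1, keepdims=True) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
K = rng.standard_normal((128, 64)).astype(np.float32)   # cached keys
Kq, Ks = quantize(K)

print("fp32 bytes:", K.nbytes, " int8 bytes:", Kq.nbytes + Ks.nbytes)
print("max abs dequantization error:", np.abs(dequantize(Kq, Ks) - K).max())
```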

With the continuous improvement of on-chip integrated voltage regulators (IVRs) and fast, adaptive frequency control, dynamic voltage-frequency scaling (DVFS) transition times have shrunk from the microsecond to the nanosecond regime, providing additional opportunities to improve energy efficiency. The key to unlocking continued improvements in circuit technology is the creation of new, smarter DVFS mechanisms that better adapt to rapid fluctuations in workload demand. It is particularly important to optimize fine-grain DVFS for graphics...

10.48550/arxiv.2205.00121 preprint EN cc-by-nc-sa arXiv (Cornell University) 2022-01-01