Devesh Tiwari

ORCID: 0000-0002-7253-2458
Research Areas
  • Cloud Computing and Resource Management
  • Parallel Computing and Optimization Techniques
  • Advanced Data Storage Technologies
  • Distributed and Parallel Computing Systems
  • Distributed systems and fault tolerance
  • Quantum Computing Algorithms and Architecture
  • Asphalt Pavement Performance Evaluation
  • Infrastructure Maintenance and Monitoring
  • Quantum Information and Cryptography
  • Scientific Computing and Data Management
  • Radiation Effects in Electronics
  • IoT and Edge/Fog Computing
  • Software System Performance and Reliability
  • Caching and Content Delivery
  • Advanced Neural Network Applications
  • Neural Networks and Reservoir Computing
  • Topic Modeling
  • Interconnection Networks and Systems
  • Blockchain Technology Applications and Security
  • Advanced Data Processing Techniques
  • Green IT and Sustainability
  • Machine Learning in Materials Science
  • Concrete and Cement Materials Research
  • Advancements in Semiconductor Devices and Circuit Design
  • Advanced Memory and Neural Computing

Northeastern University
2017-2025

Sandia National Laboratories
2024

Boston University
2024

Graphic Era University
2023-2024

Manipal Academy of Higher Education
2024

Northeastern Illinois University
2023

Moscow Institute of Thermal Technology
2023

Central Road Research Institute
2013-2022

Academy of Scientific and Innovative Research
2013-2022

Serverless computing, an emerging computing model, relies on "warming up" functions prior to their anticipated execution to provide faster and more cost-effective service to users. Unfortunately, warming up can be inaccurate and can incur a prohibitively expensive cost during the warmup period (i.e., keep-alive cost). In this paper, we introduce IceBreaker, a novel technique that reduces the service time and "keep-alive" cost by composing a system with heterogeneous nodes (costly and cheaper). IceBreaker does so by dynamically determining the node...

10.1145/3503222.3507750 article EN 2022-02-22
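IceBreaker's actual policy is not reproduced in the abstract; as a minimal sketch of the underlying idea, a function expected to be invoked soon is kept warm on a fast (costly) node to avoid cold-start latency, while an unlikely one is parked on a cheap node to keep the keep-alive cost low. The predictor and threshold below are hypothetical placeholders.

```python
def choose_node(p_invoke_soon, threshold=0.5):
    """Pick where to keep a serverless function warm.

    p_invoke_soon: estimated probability (from some hypothetical
    predictor) that the function is invoked within the next
    keep-alive window. Returns 'fast' (costly node) or 'cheap'.
    """
    # An imminent invocation justifies paying for a fast node to
    # avoid cold-start latency; an unlikely one goes to a cheap
    # node so the keep-alive cost stays low.
    return "fast" if p_invoke_soon >= threshold else "cheap"
```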

Increase in graphics hardware performance and improvements in programmability have enabled GPUs to evolve from a graphics-specific accelerator into a general-purpose computing device. Titan, the world's second fastest supercomputer for open science in 2014, consists of more than 18,000 GPUs that scientists from various domains such as astrophysics, fusion, climate, and combustion routinely use to run large-scale simulations. Unfortunately, while their performance efficiency is well understood, their resilience characteristics in a large-scale computing system have...

10.1109/hpca.2015.7056044 article EN 2015-02-01

Pavements are major assets of highway infrastructure. Maintenance and rehabilitation of these pavements to the desired level of serviceability is one of the challenging problems faced by pavement engineers and administration in the highway sector. The evaluation of pavement performance using condition indicators is a basic component of any Pavement Management System. Various indicators like the Pavement Condition Index (PCI), Present Serviceability Rating (PSR), Roughness Index (RI), etc. have been commonly used to assign a maintenance strategy for existing pavements....

10.1016/j.sbspro.2013.11.126 article EN Procedia - Social and Behavioral Sciences 2013-12-01

Modern scientific discovery is increasingly driven by large-scale supercomputing simulations, followed by data analysis tasks. These analyses are either performed offline, on smaller-scale clusters, or on the supercomputer itself. Unfortunately, these techniques suffer from performance and energy inefficiencies due to increased data movement between the compute and storage subsystems. Therefore, we propose Active Flash, an in-situ analysis approach, wherein analysis is conducted on the solid-state device (SSD), where the data already resides. Our...

10.5555/2591272.2591286 article EN File and Storage Technologies 2013-02-12

Resilience is one of the key challenges in maintaining high efficiency of future extreme-scale supercomputers. Researchers and system practitioners rely on field-data studies to understand reliability characteristics and plan for future HPC systems. In this work, we compare and contrast multiple large-scale production systems. Our study covers more than a billion compute node hours across five different systems over a period of 8 years. We confirm previous findings which continue to be valid, discover new findings, and discuss...

10.1145/3126908.3126937 article EN 2017-11-08

Large-scale data centers run latency-critical jobs with quality-of-service (QoS) requirements, and throughput-oriented background jobs, which need to achieve high performance. Previous works have proposed methods that cannot co-locate multiple background jobs while: (1) meeting the QoS requirements of all latency-critical jobs and (2) maximizing the performance of the background jobs. This paper proposes CLITE, a Bayesian Optimization-based, multi-resource partitioning technique that achieves these goals. CLITE is publicly available at...

10.1109/hpca47549.2020.00025 article EN 2020-02-01
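CLITE searches the space of multi-resource partitions with Bayesian optimization; as a simplified stand-in, the sketch below enumerates partitions of a single resource between two co-located jobs, rejecting QoS-violating configurations and maximizing background-job performance. The `perf` and `qos_ok` callbacks are hypothetical stand-ins for real measurements.

```python
import itertools

def best_partition(perf, qos_ok, units=8, jobs=2):
    """Find the resource split among `jobs` co-located jobs that
    maximizes background throughput (`perf`) subject to the
    latency-critical QoS constraint (`qos_ok`).

    CLITE uses Bayesian optimization to avoid exhaustively probing
    partitions; plain enumeration is shown here only for clarity.
    """
    best_score, best_split = None, None
    for split in itertools.product(range(1, units), repeat=jobs):
        if sum(split) != units or not qos_ok(split):
            continue  # infeasible or violates QoS
        score = perf(split)
        if best_score is None or score > best_score:
            best_score, best_split = score, split
    return best_split
```

For example, if the latency-critical job needs at least 3 units and background throughput grows with its own share, the search settles on giving the QoS job exactly its minimum.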

Large language models (LLMs) have exploded in popularity due to their new generative capabilities that go far beyond the prior state-of-the-art. These technologies are increasingly being leveraged in various domains such as law, finance, and medicine. However, these models carry significant computational challenges, especially the compute and energy costs required for inference. Inference energy costs already receive less attention than the costs of training LLMs, despite how often large models are called on to conduct inference in reality (e.g.,...

10.1109/hpec58863.2023.10363447 article EN 2023-09-25

Serverless computing has grown rapidly as a new cloud paradigm that promises ease-of-management, cost-efficiency, and auto-scaling by shipping functions via self-contained virtualized containers. Unfortunately, serverless computing suffers from severe cold-start problems: starting containers incurs non-trivial latency. Full container caching is widely applied to mitigate cold-starts, yet it has recently been outperformed by two lines of research: partial caching and sharing. However, either partial caching or sharing techniques exhibit...

10.1145/3617232.3624871 article EN other-oa 2024-04-17

Continuing increase in the computational power of supercomputers has enabled large-scale scientific applications in areas such as astrophysics, fusion, climate, and combustion to run larger and longer-running simulations, facilitating deeper insights. However, these long-running simulations are often interrupted by multiple system failures. Therefore, applications rely on "checkpointing" as a resilience mechanism to store application state on permanent storage and recover from failures. Unfortunately, checkpointing incurs excessive I/O...

10.1109/dsn.2014.101 article EN 2014-06-01
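The tension the abstract describes — checkpoint too often and I/O overhead dominates, too rarely and failures waste work — has a classic first-order answer in the Young/Daly formula, sketched below. This is standard background, not the paper's specific contribution.

```python
from math import sqrt

def young_daly_interval(ckpt_cost_s, mtbf_s):
    """Young/Daly first-order optimal checkpoint interval:
    tau ~= sqrt(2 * checkpoint_cost * MTBF).

    Balances the I/O overhead of writing checkpoints against the
    expected work lost when a failure forces a rollback.
    """
    return sqrt(2.0 * ckpt_cost_s * mtbf_s)
```

For instance, a 10-minute checkpoint on a system with a one-day MTBF suggests checkpointing roughly every 2.8 hours.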

The high computational capability of graphics processing units (GPUs) is enabling and driving the scientific discovery process at large scale. The world's second fastest supercomputer for open science, Titan, has more than 18,000 GPUs that scientists use to perform simulations and data analysis. Understanding GPU reliability characteristics, however, is still in its nascent stage since GPUs have only recently been deployed at large scale. This paper presents a detailed study of GPU errors and their impact on system operations...

10.1145/2807591.2807666 article EN 2015-10-27

Parallelism provided by the GPU architecture has enabled domain scientists to simulate physical phenomena at a much faster rate and finer granularity than what was previously possible on CPU-based large-scale clusters. Architecture researchers have been investigating the reliability characteristics of GPUs and innovating techniques to increase the reliability of these emerging computing devices. Such efforts are often guided by technology projections and simplistic scientific kernels, performed using architectural simulators...

10.1109/hpca.2016.7446091 article EN 2016-03-01

GPUs are widely deployed on large-scale HPC systems to provide powerful computational capability for scientific applications from various domains. As those applications are normally long-running, investigating the characteristics of GPU errors becomes imperative for reliability. In this paper, we first study the system conditions that trigger GPU errors using six-month trace data collected from a large-scale, operational system. Then, we use machine learning to predict the occurrence of GPU errors, by taking advantage of temporal and spatial...

10.1109/dsn.2018.00022 article EN 2018-06-01

As we approach exascale, scientific simulations are expected to experience more interruptions due to increased system failures. Designing better HPC resilience techniques requires understanding the key characteristics of failures on these systems. While the temporal properties of failures on these systems have been well-investigated, there is limited understanding about their spatial behavior and its impact on resilience mechanisms. Therefore, we examine the spatial behavior of failures. We investigate the interaction between spatial and temporal characteristics and their implications for system operations and resilience mechanisms on large-scale systems, and show that...

10.1109/dsn.2015.52 article EN 2015-06-01

Cloud platforms typically require users to provide resource requirements for applications so that resource managers can schedule containers with adequate allocations. However, the container resource requirements often depend on numerous factors such as application input parameters, optimization flags, input files, and attributes that are specified for each run. So, it is complex to estimate the requirements of a given run accurately, leading to over-estimation that negatively affects overall utilization. We have designed a Resource Utilization Based Autoscaling...

10.1109/cloud.2019.00018 article EN 2019-07-01

We present QUEST, a procedure to systematically generate approximations for quantum circuits to reduce their CNOT gate count. Our approach employs circuit partitioning for scalability, with procedures to 1) reduce circuit length using approximate synthesis, 2) improve fidelity by running circuits that represent key samples in the approximation space, and 3) reason about the approximation upper bound. Our evaluation results indicate that our approach of "dissimilar" approximations provides output close to the original circuit. Overall, QUEST can reduce CNOT gate count by 30-80% on ideal systems and decrease the impact...

10.1145/3503222.3507739 article EN 2022-02-22

This work introduces Mashup, a novel strategy to leverage the serverless computing model for executing scientific workflows in a hybrid fashion by taking advantage of both the traditional VM-based cloud platform and the emerging serverless platform. Mashup outperforms state-of-the-art workflow execution engines by an average of 34% and 43% in terms of execution time reduction and cost reduction, respectively, for widely-used HPC workflows on Amazon Cloud (EC2 and Lambda).

10.1145/3503221.3508407 article EN 2022-03-28

The energy requirements of current natural language processing models continue to grow at a rapid, unsustainable pace. Recent works highlighting this problem conclude that there is an urgent need for methods that reduce the energy needs of NLP and machine learning more broadly. In this article, we investigate techniques that can be used to reduce the energy consumption of common NLP applications. In particular, we focus on how to measure energy usage and on different hardware and datacenter-oriented settings that can be tuned for the training and inference of language models. We characterize the impact of these...

10.18653/v1/2022.findings-naacl.151 article EN cc-by Findings of the Association for Computational Linguistics: NAACL 2022 2022-01-01

The rapid growth in demand for HPC systems has led to a rise in their carbon footprint, which requires urgent intervention. In this work, we present a comprehensive analysis of the carbon footprint of high-performance computing (HPC) systems, considering the footprint during both the hardware manufacturing and system operational stages. Our work employs hardware component modeling, regional carbon intensity analysis, and experimental characterization of the life cycle to highlight the importance of quantifying the carbon footprint of HPC systems.

10.1145/3581784.3607035 preprint EN 2023-11-11
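The two-stage accounting this abstract describes — manufacturing (embodied) carbon plus operational carbon from energy drawn at a regional grid intensity — can be illustrated with a toy calculation; the paper's models are far more detailed, and the figures below are made-up inputs.

```python
def total_carbon_kg(embodied_kg, energy_kwh, grid_kg_per_kwh):
    """Life-cycle carbon of a system: embodied (manufacturing)
    carbon plus operational carbon, where operational carbon is
    the energy consumed times the regional grid carbon intensity.
    Illustrative accounting only.
    """
    return embodied_kg + energy_kwh * grid_kg_per_kwh
```

A node with 1,000 kg of embodied carbon that draws 5,000 kWh on a 0.4 kg CO2/kWh grid accounts for 3,000 kg over that period; note how a cleaner grid shrinks only the operational term, which is the paper's motivation for reporting both stages.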

GPUs have become part of the mainstream high performance computing facilities that increasingly require more computational power to simulate physical phenomena quickly and accurately. However, GPU nodes also consume significantly more power than traditional CPU nodes, and this consumption introduces new system operation challenges, including increased temperature, power/cooling cost, and lower reliability. This paper explores how GPU temperature characteristics affect reliability, and provides insights into what are...

10.1109/mascots.2017.12 article EN 2017-09-01

Current Noisy Intermediate-Scale Quantum (NISQ) computers are useful in developing the quantum computing stack, testing algorithms, and establishing the feasibility of quantum computing. However, different statistically significant errors permeate NISQ computers. To reduce the effect of these errors, recent research has focused on effectively mapping a quantum algorithm to a quantum computer in an error-and-constraints-aware manner. We propose the first work, QRAFT, to leverage the reversibility property of quantum algorithms to considerably reduce error beyond...

10.1145/3445814.3446743 article EN 2021-04-11

GPU technology has been improving at an expedited pace in terms of size and performance, empowering HPC and AI/ML researchers to advance the scientific discovery process. However, this also leads to inefficient resource usage, as most workloads, including complicated models, are not able to utilize the GPU resources to their fullest extent, encouraging support for multi-tenancy. We propose MISO, a technique to exploit the Multi-Instance GPU (MIG) capability on the latest NVIDIA datacenter GPUs (e.g., A100, H100) to dynamically...

10.1145/3542929.3563510 preprint EN 2022-11-07

This paper presents a solution to the challenge of mitigating carbon emissions from hosting large-scale machine learning (ML) inference services. ML inference is critical to modern technology products, but it is also a significant contributor to the datacenter carbon footprint. We introduce Clover, a carbon-friendly ML inference service runtime system that balances performance, accuracy, and carbon emissions through mixed-quality models and GPU resource partitioning. Our experimental results demonstrate that Clover is effective in substantially reducing carbon emissions while maintaining high...

10.1145/3581784.3607034 preprint EN 2023-11-11
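One ingredient the Clover abstract names is serving "mixed-quality models" to trade accuracy against carbon. A simplified stand-in for that idea — not Clover's actual controller — is to pick the most accurate model variant whose per-request carbon fits a budget; the variant names and numbers below are hypothetical.

```python
def select_model(variants, carbon_budget_g):
    """Pick the highest-accuracy model variant whose per-request
    carbon fits the budget. `variants` is a list of
    (name, accuracy, grams_co2_per_request) tuples.

    A toy sketch of mixed-quality serving; Clover also partitions
    GPU resources, which is not modeled here.
    """
    feasible = [v for v in variants if v[2] <= carbon_budget_g]
    if not feasible:
        # No variant fits: fall back to the lowest-carbon one.
        return min(variants, key=lambda v: v[2])[0]
    return max(feasible, key=lambda v: v[1])[0]
```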

The rapid advancement of Generative Artificial Intelligence (GenAI) across diverse sectors raises significant environmental concerns, notably the carbon emissions from its cloud and high performance computing (HPC) infrastructure. This paper presents Sprout, an innovative framework designed to address these concerns by reducing the carbon footprint of generative Large Language Model (LLM) inference services. Sprout leverages the concept of "generation directives" to guide the autoregressive generation process,...

10.48550/arxiv.2403.12900 preprint EN arXiv (Cornell University) 2024-03-19

The carbon and water footprint of large-scale computing systems poses serious environmental sustainability risks. In this study, we discover that, unfortunately, the two are at odds with each other, and optimizing one alone hurts the other. Toward that goal, we introduce WaterWise, a novel job scheduler for parallel workloads that intelligently co-optimizes carbon and water footprints to improve the sustainability of geographically distributed data centers.

10.48550/arxiv.2501.17944 preprint EN arXiv (Cornell University) 2025-01-29
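The co-optimization tension WaterWise targets can be illustrated with a toy site-selection rule: score each candidate datacenter by a weighted sum of its normalized carbon and water intensities and place a deferrable job at the cheapest site. This is a stand-in illustration, not WaterWise's scheduler, and the site data below is invented.

```python
def pick_site(sites, alpha=0.5):
    """Choose a datacenter for a deferrable parallel job.

    `sites` maps name -> (carbon_intensity, water_intensity);
    `alpha` weights carbon vs. water (1 - alpha). Intensities are
    normalized by the fleet maximum so the two units are comparable.
    """
    max_c = max(c for c, _ in sites.values()) or 1.0
    max_w = max(w for _, w in sites.values()) or 1.0

    def score(cw):
        c, w = cw
        return alpha * c / max_c + (1 - alpha) * w / max_w

    return min(sites, key=lambda name: score(sites[name]))
```

Sweeping `alpha` makes the paper's point concrete: a carbon-only objective (`alpha=1.0`) and a water-only objective (`alpha=0.0`) can pick different sites, so the two footprints genuinely trade off.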