- Cloud Computing and Resource Management
- Advanced Data Storage Technologies
- Distributed and Parallel Computing Systems
- Complex Network Analysis Techniques
- Parallel Computing and Optimization Techniques
- Graph Theory and Algorithms
- Advanced Database Systems and Queries
- Scientific Computing and Data Management
- Advanced Neural Network Applications
- Network Security and Intrusion Detection
- Data Management and Algorithms
- Data Quality and Management
- Advanced Graph Neural Networks
- Anomaly Detection Techniques and Applications
- Machine Learning in Materials Science
- Data Visualization and Analytics
- Cryptography and Data Security
- Advanced Image and Video Retrieval Techniques
- Peer-to-Peer Network Technologies
- Multimodal Machine Learning Applications
- Privacy-Preserving Technologies in Data
- Advanced Memory and Neural Computing
- Internet Traffic Analysis and Secure E-voting
- Topic Modeling
- Adversarial Robustness in Machine Learning
Massachusetts Institute of Technology
2016-2025
MIT Lincoln Laboratory
2015-2024
Moscow Institute of Thermal Technology
2015-2024
Stanford Medicine
2021
IIT@MIT
2020
The Ohio State University
2011-2013
Ohio Supercomputer Center
2007-2009
Advances in multicore processors and accelerators have opened the floodgates to greater exploration and application of machine learning techniques to a variety of applications. These advances, along with breakdowns of several trends including Moore's Law, have prompted an explosion of processors and accelerators that promise even greater computational and machine learning capabilities. These processors and accelerators are coming in many forms, from CPUs and GPUs to ASICs, FPGAs, and dataflow accelerators. This paper surveys the current state of these processors and accelerators that have been publicly announced with performance and power consumption numbers. The...
Massive scale, in terms of both data availability and computation, enables important breakthroughs in key application areas of deep learning such as natural language processing and computer vision. There is emerging evidence that scale may be a key ingredient in scientific deep learning, but the importance of physical priors in scientific domains makes the strategies and benefits of scaling uncertain. Here we investigate neural-scaling behaviour in large chemical models by varying model and dataset sizes over many orders of magnitude,...
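The scaling analysis described above amounts to fitting a power law, loss ≈ a · N^(−α), to loss measurements taken at different model or dataset sizes. As a minimal illustration of the idea (not the paper's actual methodology or data), a least-squares fit in log-log space recovers the exponent:

```python
import math

def fit_power_law(sizes, losses):
    """Least-squares fit of loss = a * size**(-alpha) in log-log space."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(l) for l in losses]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return math.exp(my - slope * mx), -slope  # (prefactor a, exponent alpha)

# Synthetic losses that follow loss = 10 * N**-0.5 exactly
sizes = [1e3, 1e4, 1e5, 1e6]
losses = [10 * n ** -0.5 for n in sizes]
a, alpha = fit_power_law(sizes, losses)
```

On data that truly follows a power law, the fit recovers the generating parameters; on real learning curves the exponent summarizes how fast loss falls with scale.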
Interactive massively parallel computations are critical for machine learning and data analysis. These computations are a staple of the MIT Lincoln Laboratory Supercomputing Center (LLSC) and have required the LLSC to develop unique interactive supercomputing capabilities. Scaling machine learning frameworks, such as TensorFlow, and data analysis environments, such as MATLAB/Octave, to tens of thousands of cores presents many technical challenges - in particular, rapidly dispatching many tasks through a scheduler, such as Slurm, and starting many instances of applications with...
We present a framework for the estimation of driver behavior at intersections, with applications to autonomous driving and vehicle safety. The framework is based on modeling the driver and vehicle dynamics as a hybrid-state system (HSS), with driver decisions being modeled as a discrete-state system and the vehicle dynamics as a continuous-state system. The proposed method uses observable parameters to track the instantaneous continuous state and estimates the most likely driver behavior given these observations. This paper describes a framework that encompasses the hybrid structure of vehicle-driver coupling and uses hidden Markov models...
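The framework above uses hidden Markov models to recover the most likely discrete driver decision from continuous observations. The standard decoding tool for that task is the Viterbi algorithm; the sketch below uses made-up states ("stop", "go") and observation symbols, not the paper's actual model or parameters:

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden-state sequence for an observation sequence."""
    # V[t][s] = (best probability of a path ending in state s at time t, that path)
    V = [{s: (start_p[s] * emit_p[s][obs[0]], [s]) for s in states}]
    for o in obs[1:]:
        row = {}
        for s in states:
            prob, path = max(
                ((V[-1][p][0] * trans_p[p][s] * emit_p[s][o], V[-1][p][1])
                 for p in states),
                key=lambda t: t[0],
            )
            row[s] = (prob, path + [s])
        V.append(row)
    prob, path = max(V[-1].values(), key=lambda t: t[0])
    return path, prob

# Hypothetical two-state driver intent model with kinematic observations
states = ("stop", "go")
start_p = {"stop": 0.5, "go": 0.5}
trans_p = {"stop": {"stop": 0.7, "go": 0.3}, "go": {"stop": 0.3, "go": 0.7}}
emit_p = {"stop": {"decel": 0.8, "accel": 0.2},
          "go": {"decel": 0.2, "accel": 0.8}}
path, prob = viterbi(["decel", "decel", "accel"], states, start_p, trans_p, emit_p)
```

Here two deceleration observations followed by an acceleration yield the decoded intent sequence stop, stop, go.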
This paper presents BigDAWG, a reference implementation of a new architecture for "Big Data" applications. Such applications not only call for large-scale analytics, but also for real-time streaming support, smaller analytics at interactive speeds, data visualization, and cross-storage-system queries. Guided by the principle that "one size does not fit all", we build on top of a variety of storage engines, each designed for a specialized use case. To illustrate the promise of this approach, we demonstrate its effectiveness...
Protected database search systems cryptographically isolate the roles of reading from, writing to, and administering the database. This separation limits unnecessary administrator access and protects data in the case of system breaches. Since protected search was introduced in 2000, the area has grown rapidly, and systems are offered by academia, start-ups, and established companies. However, there is no best protected search system or set of techniques. Designing such a system is a balancing act between security, functionality, performance, and usability. This challenge is made more...
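One of the simplest points in the design space described above is equality search via deterministic keyed tokens: the server indexes HMAC(key, keyword) instead of the keyword itself, so only key holders can form queries. The sketch below is an illustration of the concept only (all names are hypothetical), and it deliberately trades security for functionality — it leaks search and access patterns, exactly the kind of balancing act the abstract mentions:

```python
import hashlib
import hmac
import os

def index_record(key, record_id, keyword, index):
    """The writer stores HMAC(key, keyword) -> record ids; the server
    holding `index` never sees the plaintext keyword."""
    token = hmac.new(key, keyword.encode(), hashlib.sha256).hexdigest()
    index.setdefault(token, set()).add(record_id)

def search(key, keyword, index):
    """A reader with the key derives the same token to query the index."""
    token = hmac.new(key, keyword.encode(), hashlib.sha256).hexdigest()
    return index.get(token, set())

key = os.urandom(32)   # shared between authorized readers and writers
index = {}             # held by the (untrusted) server
index_record(key, "r1", "diabetes", index)
index_record(key, "r2", "diabetes", index)
index_record(key, "r3", "asthma", index)
```

Without the key, the server cannot invert tokens back to keywords, but it can still observe which records match which repeated queries.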
Modern applications often need to manage and analyze widely diverse datasets that span multiple data models [1], [2], [3], [4], [5]. Warehousing the data through Extract-Transform-Load (ETL) processes can be expensive in such scenarios. Transforming disparate data into a single model may degrade performance. Further, curating and maintaining the pipeline can prove labor intensive. As a result, an emerging trend is to shift the focus toward federating specialized stores and enabling query processing across heterogeneous data models [6]. This...
This paper updates the survey of AI accelerators and processors from the past three years. It collects and summarizes the current commercial accelerators that have been publicly announced with peak performance and power consumption numbers. The values are plotted on a scatter graph, and a number of dimensions and observations from the trends on this plot are again discussed and analyzed. Two new plots based on accelerator release dates are included in this year's paper, along with the addition of some neuromorphic, photonic, and memristor-based inference accelerators.
Large language models (LLMs) have exploded in popularity due to their new generative capabilities that go far beyond prior state-of-the-art. These technologies are increasingly being leveraged in various domains such as law, finance, and medicine. However, these models carry significant computational challenges, especially the compute and energy costs required for inference. Inference energy costs already receive less attention than the energy costs of training LLMs, despite how often these large models are called on to conduct inference in reality (e.g.,...
The growing gap between data and users calls for innovative tools that address the challenges faced by big data volume, velocity, and variety. Along with these standard three V's of big data, an emerging fourth "V" is veracity, which addresses the confidentiality, integrity, and availability of the data. Traditional cryptographic techniques that ensure data veracity can have overheads that are too large to apply to big data. This work introduces a new technique called Computing on Masked Data (CMD), which improves data veracity by allowing computations to be performed...
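A minimal sketch of the masking idea, assuming a deterministic keyed mask that preserves equality, so equality-based operations such as group-by and join still run directly on masked values. This illustrates the general concept, not the CMD scheme itself:

```python
import hashlib
import hmac
from collections import Counter

def mask(key, value):
    """Deterministic keyed one-way mask: equal plaintexts produce equal
    masks, so equality-based analytics still work on masked data."""
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()[:16]

key = b"hypothetical-masking-key"
column = ["alice", "bob", "alice"]
masked = [mask(key, v) for v in column]

# A group-by/count runs on masked values without revealing plaintexts.
counts = Counter(masked)
```

The trade-off is typical of masking schemes: equality leaks (two masked cells match exactly when the plaintexts match), which is what buys the low overhead relative to fully homomorphic approaches.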
The rise of graph analytic systems has created a need for ways to measure and compare the capabilities of these systems. Graph analytics present unique scalability difficulties. The machine learning, high performance computing, and visual analytics communities have wrestled with these difficulties for decades and have developed methodologies for creating challenges to move their fields forward. The proposed Subgraph Isomorphism Graph Challenge draws upon prior challenges from these communities to create a challenge that is reflective of many real-world graph analytics processing systems. The challenge is a holistic specification with multiple...
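Triangle counting is the simplest instance of the subgraph isomorphism problem in the GraphChallenge family. One exact formulation: for each edge (u, v), every common neighbor of u and v closes a triangle, and summing over all edges counts each triangle three times. A pure-Python sketch:

```python
def count_triangles(edges):
    """For each edge (u, v), every common neighbor closes a triangle;
    summing over edges counts each triangle exactly three times."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return sum(len(adj[u] & adj[v]) for u, v in edges) // 3

# The complete graph K4 contains exactly 4 triangles.
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
```

Scalable implementations typically recast this as sparse linear algebra (masked sparse matrix products), which is where the scalability difficulties the abstract mentions arise.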
Organizations are often faced with the challenge of providing data management solutions for large, heterogeneous datasets that may have different underlying data and programming models. For example, a medical dataset may contain unstructured text, relational data, time series waveforms, and imagery. Trying to fit such datasets in a single data management system can have adverse performance and efficiency effects. As part of the Intel Science and Technology Center on Big Data, we are developing a polystore system designed for such problems. BigDAWG (short for the Big Data Analytics Working...
The ability to collect and analyze large amounts of data is a growing problem within the scientific community. The growing gap between data and users calls for innovative tools that address the challenges faced by big data volume, velocity, and variety. Numerous tools exist that allow users to store, query, and index these massive quantities of data. Each storage or database engine comes with the promise of dealing with complex data. Scientists and engineers who wish to use these systems often quickly find that there is no single technology that offers a panacea to the complexity of information. When using...
The energy requirements of current natural language processing models continue to grow at a rapid, unsustainable pace. Recent works highlighting this problem conclude that there is an urgent need for methods that reduce the energy needs of NLP and machine learning more broadly. In this article, we investigate techniques that can be used to reduce the energy consumption of common NLP applications. In particular, we focus on techniques to measure energy usage and on different hardware and datacenter-oriented settings that can be tuned to reduce energy consumption for training and inference for language models. We characterize the impact of these...
This work introduces Mashup, a novel strategy to leverage the serverless computing model for executing scientific workflows in a hybrid fashion by taking advantage of both the traditional VM-based cloud computing platform and the emerging serverless platform. Mashup outperforms state-of-the-art workflow execution engines by an average of 34% and 43% in terms of execution time reduction and cost reduction, respectively, for widely-used HPC workflows on the Amazon Cloud (EC2 and Lambda).
The rapid growth in demand for HPC systems has led to a rise in their carbon footprint, which requires urgent intervention. In this work, we present a comprehensive analysis of the carbon footprint of high-performance computing (HPC) systems, considering the footprint during both the hardware manufacturing and system operational stages. Our work employs hardware component carbon modeling, regional carbon intensity analysis, and experimental characterization across the system life cycle to highlight the importance of quantifying the carbon footprint of HPC systems.
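At its simplest, the life-cycle accounting described above sums embodied (manufacturing) carbon with operational carbon, where operational carbon is energy consumed times the regional grid's carbon intensity. A back-of-the-envelope sketch with hypothetical numbers, not figures from the paper:

```python
def total_carbon_kg(embodied_kgco2e, avg_power_w, lifetime_hours,
                    intensity_g_per_kwh):
    """Life-cycle carbon = embodied (manufacturing) carbon plus
    operational carbon (energy * regional grid carbon intensity)."""
    energy_kwh = avg_power_w * lifetime_hours / 1000
    operational_kg = energy_kwh * intensity_g_per_kwh / 1000
    return embodied_kgco2e + operational_kg

# Hypothetical accelerator node: 150 kgCO2e embodied, 300 W average draw,
# 5-year service life, on a 400 gCO2e/kWh regional grid.
total = total_carbon_kg(150, 300, 5 * 365 * 24, 400)
```

With these illustrative numbers the operational stage (about 5256 kg) dominates the embodied stage, but in low-carbon regions the balance can flip, which is why regional intensity analysis matters.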
The growing demand for cloud computing motivates the need to study the security of data received, stored, processed, and transmitted by a cloud. In this paper, we present a framework for such a study. We introduce a cloud computing model that captures a rich class of big-data use-cases and allows reasoning about relevant threats and security goals. We then survey three cryptographic techniques - homomorphic encryption, verifiable computation, and multi-party computation - that can be used to achieve these goals. We describe the techniques in the context of our model and highlight the differences...
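Of the three techniques surveyed, multi-party computation is the easiest to illustrate compactly. With additive secret sharing, parties can compute a sum by adding their local shares, and any subset of fewer than all shares looks uniformly random. A toy sketch, not a production protocol:

```python
import random

MOD = 2**61 - 1  # prime modulus for the share arithmetic

def share(secret, n_parties):
    """Split a secret into n additive shares modulo MOD."""
    shares = [random.randrange(MOD) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % MOD)
    return shares

def reconstruct(shares):
    return sum(shares) % MOD

# Each party adds its shares of a and b locally; reconstructing the
# resulting shares yields a + b without any party having seen a or b.
a_shares = share(25, 3)
b_shares = share(17, 3)
sum_shares = [(x + y) % MOD for x, y in zip(a_shares, b_shares)]
```

Addition is "free" under this sharing; multiplication requires interaction between parties, which is one of the cost differences such a survey framework makes explicit.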
The Apache Accumulo database is an open source relaxed consistency database that is widely used for government applications. Accumulo is designed to deliver high performance on unstructured data such as graphs of network data. This paper tests the performance of Accumulo using data from the Graph500 benchmark. The Dynamic Distributed Dimensional Data Model (D4M) software is used to implement the benchmark on a 216-node cluster running the MIT SuperCloud software stack. A peak performance of over 100,000,000 database inserts per second was achieved, which is 100x larger than the highest previously published value...
For decades, the growth and volume of digital data collection has made it challenging to digest large volumes of information and extract underlying structure. Coined "Big Data", massive amounts of information have quite often been gathered inconsistently (e.g., from many sources, in various forms, at different rates, etc.). These factors impede the practices of not only processing data, but also analyzing and displaying it in an efficient manner to the user. Many efforts have been completed in the data mining and visual analytics communities to create effective ways...
The success of SQL, NoSQL, and NewSQL databases is a reflection of their ability to provide significant functionality and performance benefits for specific domains, such as financial transactions, internet search, and data analysis. The BigDAWG polystore seeks to provide a mechanism that allows applications to transparently achieve these diverse benefits while insulating applications from the details of these databases. Associative arrays provide a common approach to the mathematics found in the different classes of databases: sets (SQL), graphs (NoSQL), and matrices (NewSQL). This work...
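The unifying role of associative arrays can be sketched with a small dictionary-backed class: element-wise addition corresponds to set/graph union with weights, while multiplication over (row, column) keys is sparse matrix multiply, i.e. one step of graph traversal. A minimal illustration under those assumptions, not the D4M or BigDAWG implementation:

```python
class AssocArray:
    """Minimal associative array over explicit keys; zero values are absent."""

    def __init__(self, data=None):
        self.data = dict(data or {})

    def __add__(self, other):
        # element-wise addition ~ weighted set union / graph union
        out = dict(self.data)
        for k, v in other.data.items():
            out[k] = out.get(k, 0) + v
        return AssocArray(out)

    def matmul(self, other):
        # keys are (row, col) pairs; this is sparse matrix multiply,
        # i.e. one step of traversal on an adjacency array
        out = {}
        for (i, k1), v1 in self.data.items():
            for (k2, j), v2 in other.data.items():
                if k1 == k2:
                    out[(i, j)] = out.get((i, j), 0) + v1 * v2
        return AssocArray(out)

# Adjacency array of the path a -> b -> c; squaring it finds two-hop paths.
A = AssocArray({("a", "b"): 1, ("b", "c"): 1})
two_hop = A.matmul(A)
```

The same two operations read as relational join/aggregation in SQL terms and as matrix algebra in NewSQL terms, which is the common mathematics the abstract refers to.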
This paper presents a solution to the challenge of mitigating carbon emissions from hosting large-scale machine learning (ML) inference services. ML inference is critical to modern technology products, but it is also a significant contributor to the datacenter carbon footprint. We introduce Clover, a carbon-friendly ML inference service runtime system that balances performance, accuracy, and carbon emissions through mixed-quality models and GPU resource partitioning. Our experimental results demonstrate that Clover is effective in substantially reducing carbon emissions while maintaining high...
The rapid advancement of Generative Artificial Intelligence (GenAI) across diverse sectors raises significant environmental concerns, notably the carbon emissions from their cloud and high performance computing (HPC) infrastructure. This paper presents Sprout, an innovative framework designed to address these concerns by reducing the carbon footprint of generative Large Language Model (LLM) inference services. Sprout leverages the concept of "generation directives" to guide the autoregressive generation process,...
An important objective for analyzing real-world graphs is to achieve scalable performance on large, streaming graphs. A challenging and relevant example is the graph partition problem. As a combinatorial optimization problem, graph partition is NP-hard, but existing relaxation methods provide reasonable approximate solutions that can be scaled to large graphs. Competitive benchmarks and challenges have proven to be an effective means to advance the state-of-the-art and foster community collaboration. This paper describes a graph partition challenge with a baseline partition algorithm of...
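For intuition about the objective only, here is a naive two-way partitioner: start from an even split, then greedily swap node pairs across the cut (which preserves balance) whenever a swap lowers the edge cut. This is far simpler than a real baseline and does not scale to large or streaming graphs, but it makes the quantity being optimized concrete:

```python
def edge_cut(edges, part):
    """Number of edges whose endpoints land in different parts."""
    return sum(1 for u, v in edges if part[u] != part[v])

def greedy_partition(nodes, edges, iters=10):
    """Naive balanced 2-way partition: alternate initial assignment,
    then greedily swap cross-cut node pairs while the cut decreases."""
    part = {v: i % 2 for i, v in enumerate(nodes)}
    for _ in range(iters):
        improved = False
        for u in nodes:
            for v in nodes:
                if part[u] == part[v]:
                    continue
                before = edge_cut(edges, part)
                part[u] ^= 1
                part[v] ^= 1  # swapping sides keeps the parts balanced
                if edge_cut(edges, part) < before:
                    improved = True
                else:
                    part[u] ^= 1  # revert the swap
                    part[v] ^= 1
        if not improved:
            break
    return part

# Two triangles joined by a single bridge edge (2, 3): optimal cut is 1.
nodes = [0, 1, 2, 3, 4, 5]
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
part = greedy_partition(nodes, edges)
```

The O(n²) swap loop is exactly what the relaxation methods mentioned in the abstract avoid while still approximating the same minimum-cut, balanced objective.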
The MIT/IEEE/Amazon GraphChallenge.org encourages community approaches to developing new solutions for analyzing graphs and sparse data. Sparse AI analytics present unique scalability difficulties. The proposed Sparse Deep Neural Network (DNN) Challenge draws upon prior challenges from machine learning, high performance computing, and visual analytics to create a challenge that is reflective of emerging sparse AI systems. The Sparse DNN Challenge is based on a mathematically well-defined DNN inference computation that can be implemented in any programming...
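The inference computation behind such a sparse DNN challenge is, per layer, a sparse matrix product followed by a bias and ReLU. A pure-Python sketch with dict-of-(row, col) sparse storage and toy weights (the actual challenge specifies its own data, weights, and bias values):

```python
def sparse_relu_layer(Y, W, bias):
    """One sparse inference layer: Z = Y * W, then ReLU(Z + bias).
    Y and W are sparse {(row, col): value} dicts; zeros are never stored."""
    # Group W's entries by row so the product touches only nonzeros.
    W_rows = {}
    for (k, j), w in W.items():
        W_rows.setdefault(k, []).append((j, w))
    Z = {}
    for (i, k), y in Y.items():
        for j, w in W_rows.get(k, []):
            Z[(i, j)] = Z.get((i, j), 0.0) + y * w
    # Bias + ReLU, dropping non-positive entries to keep the output sparse.
    return {ij: v + bias for ij, v in Z.items() if v + bias > 0}

# Toy example: one input row, two features, a 2x2 sparse weight matrix.
Y = {(0, 0): 1.0, (0, 1): 2.0}
W = {(0, 0): 0.5, (1, 0): -2.0, (1, 1): 1.0}
out = sparse_relu_layer(Y, W, bias=0.0)
```

Because ReLU zeroes out negative activations and zeros are never stored, the layer output stays sparse, which is the property that makes deep sparse inference a meaningful scalability benchmark.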