- Parallel Computing and Optimization Techniques
- Embedded Systems Design Techniques
- Advanced Data Storage Technologies
- Cloud Computing and Resource Management
- Teaching and Learning Programming
- Advanced Neural Network Applications
- Advanced Memory and Neural Computing
- Real-Time Simulation and Control Systems
- CCD and CMOS Imaging Sensors
- Distributed and Parallel Computing Systems
- Access Control and Trust
- Marxism and Critical Theory
- Machine Learning in Materials Science
- Security and Verification in Computing
- Computational Physics and Python Applications
- Political Conflict and Governance
- Meteorological Phenomena and Simulations
- Computational Drug Discovery Methods
- Politics and Society in Latin America
- Graph Theory and Algorithms
- Precipitation Measurement and Analysis
- Particle Detector Development and Performance
- Scientific Computing and Data Management
- Distributed Systems and Fault Tolerance
- Cryptography and Data Security
MIT Lincoln Laboratory
2017-2024
Massachusetts Institute of Technology
2021-2024
Ohio Supercomputer Center
2012
The Ohio State University
2012
Indian Institute of Technology Kanpur
2010
Over the past several years, new machine learning accelerators have been announced and released every month for a variety of applications, from speech recognition and video object detection to assisted driving and many data center applications. This paper updates the survey of AI accelerators and processors from the past two years. It collects and summarizes the current commercial accelerators that have been publicly announced, along with their peak performance and power consumption numbers. These values are plotted on a scatter graph, and a number of dimensions, observations, and trends from this plot are again...
GPU technology has been improving at an expedited pace in terms of size and performance, empowering HPC and AI/ML researchers to advance the scientific discovery process. However, this also leads to inefficient resource usage, as most workloads, including complicated AI/ML models, are not able to utilize the GPU resources to their fullest extent - encouraging support for multi-tenancy. We propose MISO, a technique to exploit the Multi-Instance GPU (MIG) capability on the latest NVIDIA datacenter GPUs (e.g., A100, H100) to dynamically...
As parallel applications become more complex, auto-tuning becomes more desirable, more challenging, and more time-consuming. We propose Bliss, a novel solution for auto-tuning parallel applications without requiring apriori information about the applications, domain-specific knowledge, or instrumentation. Bliss demonstrates how to leverage a pool of Bayesian Optimization models to find a near-optimal parameter setting 1.64× faster than state-of-the-art approaches.
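The idea of tuning an application's parameters by maintaining a pool of models with different search behaviors can be sketched in a few lines. The code below is a toy illustration only, not Bliss's actual algorithm: it replaces Bayesian Optimization surrogates with a hypothetical pool of coarse-to-fine local-search "models" that each propose candidates near the best setting seen so far.

```python
import random

def tune(objective, bounds, budget=30, pool_size=3, seed=0):
    """Toy auto-tuner in the spirit of a model pool (a hypothetical
    stand-in for a pool of Bayesian Optimization models): each 'model'
    proposes candidates near the best-known setting at its own search
    radius, and every proposal is evaluated against the objective."""
    rng = random.Random(seed)
    history = []  # list of (params, cost) observations

    def rand_point():
        return tuple(rng.uniform(lo, hi) for lo, hi in bounds)

    def propose(radius):
        if not history:
            return rand_point()
        best, _ = min(history, key=lambda h: h[1])
        # perturb the incumbent, clamped back into the search bounds
        return tuple(
            min(hi, max(lo, x + rng.gauss(0, radius * (hi - lo))))
            for x, (lo, hi) in zip(best, bounds)
        )

    radii = [0.5 / (i + 1) for i in range(pool_size)]  # coarse-to-fine pool
    for step in range(budget):
        cand = propose(radii[step % pool_size])
        history.append((cand, objective(cand)))
    return min(history, key=lambda h: h[1])

# usage: minimize a 2-D quadratic standing in for "application runtime"
best_params, best_cost = tune(lambda p: (p[0] - 3) ** 2 + (p[1] - 1) ** 2,
                              bounds=[(0, 10), (0, 10)], budget=200)
```

A real implementation would replace the perturbation rule with a fitted surrogate and an acquisition function; the pool structure, however, is the part this sketch means to convey.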
MATLAB® is a popular choice for algorithm development in signal and image processing. While traditionally this has been done using sequential MATLAB running on desktop systems, recent years have seen a surge of interest in running MATLAB in parallel to take advantage of multi-processor and multi-core systems. In this paper, we discuss three variations of parallel MATLAB, two of which are available as commercial, supported products. We also consider the case in which key computations are speeded up using multi-threading and GPGPUs. Two processing kernels (FFT and convolution) and a full...
Knights Landing (KNL) is the code name for the second-generation Intel Xeon Phi product family. KNL has generated significant interest in the data analysis and machine learning communities because its new many-core architecture targets both of these workloads. The vector processor design enables it to exploit much higher levels of parallelism. At the Lincoln Laboratory Supercomputing Center (LLSC), the majority of users are running applications such as MATLAB and Octave. More recently, applications such as UC Berkeley's Caffe...
This paper presents a vision and description for query control, which is a paradigm for database access control. In this model, individual queries are examined before being executed and are either allowed or denied by a pre-defined policy. Traditional view-based access control requires the enforcer to view the query, the records, or both; that may present a difficulty when the enforcer is not permitted to view the contents itself. Our discussion of query control arises from our experience with privacy-preserving encrypted databases, in which no single entity learns both the query and the contents. Query...
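The core mechanic described above, examining each query against a pre-defined policy before execution, without the enforcer inspecting any record contents, can be illustrated with a minimal sketch. The policy rules and column names below are hypothetical, invented purely for illustration; they are not the paper's policy language.

```python
import re

# Hypothetical pre-defined policy: a query is matched against rules in
# order, and the first matching rule's verdict applies. The enforcer
# only ever sees the query text, never any stored records.
POLICY = [
    (re.compile(r"\b(drop|delete|update)\b", re.I), "deny"),   # no mutations
    (re.compile(r"\bssn\b", re.I),                  "deny"),   # no sensitive column
    (re.compile(r"^\s*select\b", re.I),             "allow"),  # plain reads OK
]

def check(query: str) -> str:
    """Return 'allow' or 'deny' for a query before it is executed."""
    for pattern, verdict in POLICY:
        if pattern.search(query):
            return verdict
    return "deny"  # default-deny for anything the policy does not cover

print(check("SELECT name FROM patients"))  # allow
print(check("SELECT ssn FROM patients"))   # deny
print(check("DELETE FROM patients"))       # deny
```

A production query-control enforcer would parse the query properly rather than pattern-match text, but the allow/deny-before-execution shape is the same.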
As research and deployment of AI grows, the computational burden required to support and sustain its progress inevitably does too. To train or fine-tune state-of-the-art models in NLP, computer vision, and other domains, some form of hardware acceleration is virtually a requirement. Recent large language models require considerable resources to train and deploy, resulting in significant energy usage, potential carbon emissions, and massive demand for GPUs and other hardware accelerators. However, this surge carries large implications for sustainability at...
The Intel Xeon Phi manycore processor is designed to provide high performance matrix computations of the type often performed in data analysis. Common data analysis environments include Matlab, GNU Octave, Julia, Python, and R. Achieving optimal matrix operations within these environments requires tuning the OpenMP settings, process pinning, and memory modes. This paper describes matrix multiplication results for Matlab and Octave over a variety of combinations of process counts and thread counts. These results indicate that using KMP_AFFINITY=granularity=fine, taskset...
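For readers unfamiliar with the settings named in the abstract, a representative invocation might look like the config fragment below. Only KMP_AFFINITY=granularity=fine comes from the text above; the thread count, core range, and benchmark command are illustrative placeholders, not the paper's measured configuration.

```shell
# Pin OpenMP threads at hardware-thread granularity (from the abstract);
# the remaining values are illustrative, not the paper's tuned settings.
export KMP_AFFINITY=granularity=fine
export OMP_NUM_THREADS=64

# Restrict the process to a fixed set of cores with taskset, then run a
# placeholder Octave matrix-multiplication benchmark.
taskset -c 0-63 octave --eval 'n=4096; A=rand(n); B=rand(n); tic; C=A*B; toc'
```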
This paper is an update of the survey of AI accelerators and processors from the past four years, which is now called the Lincoln AI Computing Survey - LAICS (pronounced "lace"). As in past years, this survey collects and summarizes the current commercial accelerators that have been publicly announced, along with their peak performance and power consumption numbers. These values are plotted on a scatter graph, and a number of dimensions, observations, and trends from this plot are again discussed and analyzed. Market segments are highlighted on the plot, and zoomed plots of each segment are also included. Finally, a brief...
The BigDAWG polystore database system aims to address workloads dealing with large, heterogeneous datasets. The need for such a system is motivated by an increase in Big Data applications involving disparate types of data, from large-scale analytics to real-time data streams to text-based records, each suited to different storage engines. These applications often perform cross-engine queries on correlated data, resulting in complex query planning, data migration, and execution. One such application for medical data was built with the Intel Science and Technology Center...
In this paper, we present a novel file-based communication architecture that uses the local filesystem for large-scale parallelization. This approach eliminates issues with overload and resource contention when using a central filesystem for parallel jobs. The approach incurs additional overhead due to inter-node message file transfers when the sending and receiving processes are not on the same node. However, even with this cost, its benefits for overall cluster operation are far greater, in addition to the performance enhancement of communications. For example,...
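A minimal sketch of file-based message passing may help make the idea concrete. The mailbox layout, file-naming scheme, and function names below are invented for illustration and are not the paper's actual format; the one load-bearing detail is staging each message to a temporary file and atomically renaming it, so a receiver never observes a partially written message.

```python
import os
import tempfile

def send(mailbox, sender, seq, payload):
    """Deliver one message as a file in the mailbox directory.
    The write is staged to a temp file and atomically renamed
    (os.rename is atomic on POSIX within one filesystem)."""
    os.makedirs(mailbox, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=mailbox)
    with os.fdopen(fd, "wb") as f:
        f.write(payload)
    os.rename(tmp, os.path.join(mailbox, f"{sender}.{seq}.msg"))

def recv(mailbox, sender, seq):
    """Poll for a message; returns None if not yet delivered."""
    path = os.path.join(mailbox, f"{sender}.{seq}.msg")
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return f.read()

# usage: a sender and receiver exchanging one message via the filesystem
mbox = tempfile.mkdtemp(prefix="mbox_demo_")
send(mbox, "rank0", 1, b"hello")
print(recv(mbox, "rank0", 1))  # b'hello'
```

In a cluster setting the mailbox would live on node-local storage, with inter-node file transfer handled separately when sender and receiver are on different nodes, as the abstract notes.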
One of the more complex tasks for researchers using HPC systems is performance monitoring and tuning of their applications. Developing a practice of continuous improvement, both for speed-up and for efficient use of resources, is essential to the long-term success of the practitioner and the research project. Profiling tools provide a nice view of an application but often have a steep learning curve and are rarely easy to use for interpreting resource utilization. Lower-level tools such as top and htop offer a view of resource utilization to those familiar and comfortable with Linux, but present a barrier to newer...
Cyber Physical Systems (CPS) are the conjoining of an entity's physical and computational elements. The development of a typical CPS system follows a sequence from conceptual modeling, testing in simulated (virtual) worlds, and testing in controlled (possibly laboratory) environments to, finally, deployment. Throughout each (repeatable) stage, the behavior of the entities, their sensing and situation assessment, and their computation of control options have to be understood and carefully represented through abstraction. The group at Ohio State University,...
Deep learning in the molecular and materials sciences is limited by the lack of integration between applied science, artificial intelligence, and high-performance computing. Bottlenecks with respect to the amount of training data, the size and complexity of model architectures, and the scale of the compute infrastructure are all key factors limiting the scaling of deep learning for molecules and materials. Here, we present $\textit{LitMatter}$, a lightweight framework for scaling molecular deep learning methods. We train four graph neural network architectures on over 400 GPUs...
Deep Learning has seen a dramatically increasing demand for compute resources and a corresponding increase in the energy required to develop, explore, and test model architectures for various applications. Parameter tuning of networks customarily involves training multiple models to search over a grid of parameter choices, either randomly or exhaustively; strategies that apply complex methods to identify candidate architectures require significant computation for each possible architecture sampled from such spaces. However, these approaches...
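The two customary search strategies named in the abstract, exhaustive grid search and random sampling from the same grid, can be contrasted in a short sketch. Here `train()` is a hypothetical stand-in for a full model-training run; in practice each call is what consumes the compute and energy the abstract is concerned with.

```python
import itertools
import random

def train(lr, width):
    """Hypothetical stand-in for training a model and returning its
    validation loss; the real cost of each call is a full training run."""
    return (lr - 0.01) ** 2 + (width - 64) ** 2 / 1e4

grid = {"lr": [0.001, 0.01, 0.1], "width": [16, 64, 256]}

# Exhaustive grid search: trains len(lr) * len(width) = 9 models.
exhaustive = min(itertools.product(grid["lr"], grid["width"]),
                 key=lambda p: train(*p))

# Random search: trains only a fixed budget of models sampled from the grid.
rng = random.Random(0)
sampled = min((tuple(rng.choice(v) for v in grid.values()) for _ in range(5)),
              key=lambda p: train(*p))

print(exhaustive)  # (0.01, 64) minimizes the toy loss
```

The trade-off the abstract points at is visible even here: exhaustive search guarantees the grid optimum at 9 training runs, while random search spends a bounded budget (5 runs) with no such guarantee.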