NFDI4DS | UHH-SEMS - Publication Details

Gopinath Chennupati

ORCID: 0000-0002-6223-8570

About

Contact & Profiles

A5063725454

Research Areas

Parallel Computing and Optimization Techniques
Advanced Data Storage Technologies
Evolutionary Algorithms and Applications
Adversarial Robustness in Machine Learning
Machine Learning and Data Classification
Metaheuristic Optimization Algorithms Research
Cloud Computing and Resource Management
Protein Structure and Dynamics
Speech Recognition and Synthesis
Distributed and Parallel Computing Systems
Topic Modeling
Anomaly Detection Techniques and Applications
Machine Learning in Bioinformatics
Computational Drug Discovery Methods
Tensor decomposition and applications
Algorithms and Data Compression
Interconnection Networks and Systems
Embedded Systems Design Techniques
Domain Adaptation and Few-Shot Learning
Music and Audio Processing
Low-power high-performance VLSI design
Computability, Logic, AI Algorithms
Artificial Intelligence in Healthcare and Education
Explainable Artificial Intelligence (XAI)
Natural Language Processing Techniques

Amazon (United States)
2021-2024

Los Alamos National Laboratory
2017-2022

Amazon (Germany)
2021

University of Limerick
2014-2016

Quantum Algorithm Implementations for Beginners

OPENALEX - Publications

J. Abhijith Adetokunbo Adedoyin John Ambrosiano Petr M. Anisimov William Casper and 29 more

As quantum computers become available to the general public, the need has arisen to train a cohort of quantum programmers, many of whom have been developing classical computer programs for most of their careers. While currently available quantum computers have less than 100 qubits, quantum computing hardware is widely expected to grow in terms of qubit count, quality, and connectivity. This review aims to explain the principles of quantum programming, which are quite different from classical programming, with straightforward algebra that makes understanding of the underlying fascinating quantum mechanical...

10.1145/3517340 article EN ACM Transactions on Quantum Computing 2022-03-28

Combating Label Noise in Deep Learning Using Abstention

OPENALEX - Publications

Sunil Thulasidasan Tanmoy Bhattacharya Jeff Bilmes Gopinath Chennupati Jamal Mohd-Yusof

We introduce a novel method to combat label noise when training deep neural networks for classification. We propose a loss function that permits abstention during training, thereby allowing the DNN to abstain on confusing samples while continuing to learn and improve classification performance on the non-abstained samples. We show how such a deep abstaining classifier (DAC) can be used for robust learning in the presence of different types of label noise. In the case of structured or systematic label noise -- where noisy training labels or confusing examples are correlated with...

10.48550/arxiv.1905.10964 preprint EN other-oa arXiv (Cornell University) 2019-01-01

Finding the Number of Latent Topics With Semantic Non-Negative Matrix Factorization

OPENALEX - Publications

Raviteja Vangara Manish Bhattarai Erik Skau Gopinath Chennupati Hristo Djidjev and 4 more

Topic modeling, or identifying the set of topics that occur in a collection of articles, is one of the primary objectives of text mining. Typically, a text corpus is represented as a words-by-documents matrix, X, where x_ij encodes the importance score of the i-th word in the j-th document using the Term Frequency-Inverse Document Frequency (TF-IDF) representation. Non-negative Matrix Factorization (NMF) can then be used in order to extract the topics and model the corpus. NMF approximates X as a product of two low-rank non-negative factors: W, which represents...

10.1109/access.2021.3106879 article EN cc-by IEEE Access 2021-01-01

PPT-GPU: Scalable GPU Performance Modeling

OPENALEX - Publications

Yehia Arafa Abdel‐Hameed A. Badawy Gopinath Chennupati Nandakishore Santhi Stephan Eidenbenz

Performance modeling is a challenging problem due to the complexities of hardware architectures. In this paper, we present PPT-GPU, a scalable and accurate simulation framework that enables GPU code developers and architects to predict the performance of applications in a fast and accurate manner on different GPU architectures. PPT-GPU is part of the open source project, Performance Prediction Toolkit (PPT), developed at the Los Alamos National Laboratory. We extend the old GPU model in PPT, which predicts the runtimes of computational physics codes, to offer better prediction accuracy, for which we add...

10.1109/lca.2019.2904497 article EN publisher-specific-oa IEEE Computer Architecture Letters 2019-01-01

Distributed non-negative matrix factorization with determination of the number of latent features

OPENALEX - Publications

Gopinath Chennupati Raviteja Vangara Erik Skau Hristo Djidjev Boian S. Alexandrov

10.1007/s11227-020-03181-6 article EN The Journal of Supercomputing 2020-02-08

Verified instruction-level energy consumption measurement for NVIDIA GPUs

OPENALEX - Publications

Yehia Arafa Ammar ElWazir Abdelrahman Elkanishy Youssef Aly Ayatelrahman Elsayed and 4 more

GPUs are prevalent in modern computing systems at all scales. They consume a significant fraction of the energy in these systems. However, vendors do not publish the actual cost of the power/energy overhead of their internal microarchitecture. In this paper, we accurately measure the energy consumption of various PTX instructions found in modern NVIDIA GPUs. We provide an exhaustive comparison of more than 40 instructions for four high-end NVIDIA GPUs from four different generations (Maxwell, Pascal, Volta, and Turing). Furthermore, we show the effect of the CUDA compiler...

10.1145/3387902.3392613 preprint EN 2020-05-11

Code Characterization With Graph Convolutions and Capsule Networks

OPENALEX - Publications

P. Haridas Gopinath Chennupati Nandakishore Santhi Phillip Romero Stephan Eidenbenz

We propose SiCaGCN, a learning system to predict the similarity of a given software code to a set of codes that are permitted to run on a computational resource, such as a supercomputer or a cloud server. This code characterization allows us to detect abusive codes. Our system relies on a structural analysis of the control-flow graph of the software codes and two different graph similarity measures: Graph Edit Distance (GED) and a singular values based metric. SiCaGCN combines elements of Graph Convolutional Neural Networks (GCN), Capsule networks, attention mechanism, and neural tensor...

10.1109/access.2020.3011909 article EN cc-by IEEE Access 2020-01-01

Low Overhead Instruction Latency Characterization for NVIDIA GPGPUs

OPENALEX - Publications

Yehia Arafa Abdel‐Hameed A. Badawy Gopinath Chennupati Nandakishore Santhi Stephan Eidenbenz

The last decade has seen a shift in the computer systems industry where heterogeneous computing has become prevalent. Graphics Processing Units (GPUs) are now present in everything from supercomputers to mobile phones and tablets. GPUs are used for graphics operations as well as general-purpose computing (GPGPUs) to boost the performance of compute-intensive applications. However, the percentage of undisclosed characteristics beyond what vendors provide is not small. In this paper, we introduce a very low overhead and portable analysis for exposing...

10.1109/hpec.2019.8916466 article EN 2019-09-01

Fast, accurate, and scalable memory modeling of GPGPUs using reuse profiles

OPENALEX - Publications

Yehia Arafa Abdel‐Hameed A. Badawy Gopinath Chennupati Atanu Barai Nandakishore Santhi and 1 more

In this paper, we introduce an accurate and scalable memory modeling framework for General Purpose Graphics Processor units (GPGPUs), PPT-GPU-Mem. That is Performance Prediction Tool-Kit for GPUs Cache Memories. PPT-GPU-Mem predicts the performance of different GPUs' cache memory hierarchy (L1 & L2) based on reuse profiles. We extract a memory trace for each GPU kernel once in its lifetime using the recently released binary instrumentation tool, NVBIT. The memory trace extraction is architecture-independent and can be done on any available...

10.1145/3392717.3392761 article EN 2020-06-29

Hybrid, scalable, trace-driven performance modeling of GPGPUs

OPENALEX - Publications

Yehia Arafa Abdel‐Hameed A. Badawy Ammar ElWazir Atanu Barai Ali Eker and 3 more

In this paper, we present PPT-GPU, a scalable performance prediction toolkit for GPUs. PPT-GPU achieves scalability through a hybrid high-level modeling approach where some computations are extrapolated and multiple parts of the model are parallelized. The tool's primary prediction models use pre-collected memory and instruction traces of the workloads to accurately capture the dynamic behavior of the kernels.

10.1145/3458817.3476221 article EN 2021-10-21

Parallel Application Performance Prediction Using Analysis Based Models and HPC Simulations

OPENALEX - Publications

Mohammad Abu Obaida Jason Liu Gopinath Chennupati Nandakishore Santhi Stephan Eidenbenz

Parallel application performance models provide valuable insight about the performance in real systems. Capable tools providing fast, accurate, and comprehensive prediction and evaluation of high-performance computing (HPC) applications and system architectures have important value. This paper presents PyPassT, an analysis based modeling framework built on static program analysis and integrated simulation of target HPC architectures. More specifically, the framework analyzes application source code written in C with OpenACC directives and transforms it into...

10.1145/3200921.3200937 article EN 2018-05-14

Performance Optimization of Multi-Core Grammatical Evolution Generated Parallel Recursive Programs

OPENALEX - Publications

Gopinath Chennupati R. Muhammad Atif Azad Conor Ryan

Although Evolutionary Computation (EC) has been used with considerable success to evolve computer programs, the majority of this work has targeted the production of serial code. Recent work with Grammatical Evolution (GE) produced Multi-core GE (MCGE-II), a system that natively produces parallel code, including the ability to execute recursive calls in parallel.

10.1145/2739480.2754746 article EN 2015-07-07

An Effective Baseline for Robustness to Distributional Shift

OPENALEX - Publications

Sunil Thulasidasan Sushil Thapa Sayera Dhaubhadel Gopinath Chennupati Tanmoy Bhattacharya and 1 more

Refraining from confidently predicting when faced with categories of inputs different from those seen during training is an important requirement for the safe deployment of deep learning systems. While simple to state, this has been a particularly challenging problem in deep learning, where models often end up making overconfident predictions in such situations. In this work, we present a simple, but highly effective approach to deal with out-of-distribution detection that uses the principle of abstention: when encountering a sample...

10.1109/icmla52953.2021.00050 article EN 2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA) 2021-12-01

General-purpose Unsupervised Cyber Anomaly Detection via Non-negative Tensor Factorization

OPENALEX - Publications

Maksim E. Eren Juston Moore Erik Skau Elisabeth Moore Manish Bhattarai and 2 more

Distinguishing malicious anomalous activities from unusual but benign activities is a fundamental challenge for cyber defenders. Prior studies have shown that statistical user behavior analysis yields accurate detections by learning behavior profiles from observed user activity. These unsupervised models are able to generalize to unseen types of attacks by detecting deviations from normal behavior without knowledge of specific attack signatures. However, approaches proposed to date based on probabilistic matrix factorization are limited by the...

10.1145/3519602 article EN Digital Threats Research and Practice 2022-04-12

Scalable Performance Prediction of Codes with Memory Hierarchy and Pipelines

OPENALEX - Publications

Gopinath Chennupati Nandakishore Santhi Stephan Eidenbenz

We present the Analytical Memory Model with Pipelines (AMMP) of the Performance Prediction Toolkit (PPT). PPT-AMMP takes high-level source code and hardware architecture parameters as input, and predicts the runtime of that code on the target hardware platform, which is defined in the input parameters. PPT-AMMP transforms the code to an (architecture-independent) intermediate representation, then (i) analyzes the basic block structure of the code, (ii) processes architecture-independent virtual memory access patterns that it uses to build memory reuse distance...

10.1145/3316480.3325518 article EN public-domain 2019-05-29

Semantic Nonnegative Matrix Factorization with Automatic Model Determination for Topic Modeling

OPENALEX - Publications

Raviteja Vangara Erik Skau Gopinath Chennupati Hristo Djidjev T. E. Tierney and 4 more

Non-negative Matrix Factorization (NMF) models the topics of a text corpus by decomposing the matrix of term frequency-inverse document frequency (TF-IDF) representation, X, into two low-rank non-negative matrices: W, representing the topics, and H, mapping the documents onto the space of topics. One challenge, common to all topic models, is the determination of the number of latent topics (aka model determination). Determining the correct number of topics is important: underestimating the number of topics results in poor topic separation, under-fitting, while overestimating leads to noisy...

10.1109/icmla51294.2020.00060 article EN 2020 19th IEEE International Conference on Machine Learning and Applications (ICMLA) 2020-12-01

An analytical memory hierarchy model for performance prediction

OPENALEX - Publications

Gopinath Chennupati Nandakishore Santhi Stephan Eidenbenz Sunil Thulasidasan

As the US Department of Energy (DOE) invests in exascale computing, performance modeling of physics codes on CPUs remains a challenge in computational co-design due to the complex design of processors that include memory hierarchies, instruction pipelining, and speculative execution. We present the Analytical Memory Model (AMM), a model of the cache memory hierarchy, embedded in the Performance Prediction Toolkit (PPT), a suite of discrete-event-simulation-based co-design hardware and software models. AMM enables PPT to significantly improve...

10.5555/3242181.3242251 article EN Winter Simulation Conference 2017-12-03

Federated Self-Learning with Weak Supervision for Speech Recognition

OPENALEX - Publications

Milind Rao Gopinath Chennupati Gautam Tiwari Anit Kumar Sahu Anirudh Raju and 2 more

Automatic speech recognition (ASR) models with low-footprint are increasingly being deployed on edge devices for conversational agents, which enhances privacy. We study the problem of federated continual incremental learning for recurrent neural network-transducer (RNN-T) ASR models in the privacy-enhancing scheme of learning on-device, without access to ground truth human transcripts or machine transcriptions from a stronger ASR model. In particular, we study the performance of a self-learning based scheme, with a paired teacher model updated...

10.1109/icassp49357.2023.10096983 article EN ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023-05-05

Significant ASR Error Detection for Conversational Voice Assistants

OPENALEX - Publications

John Harvill Rinat Khaziev Scarlett Li Randy Cogill Lidan Wang and 2 more

Modern Automatic Speech Recognition (ASR) systems are evaluated with respect to Word Error Rate (WER). While WER is a useful metric for training and evaluation of speech models, it does not fully reflect the difference in semantics between predicted and ground truth transcriptions. In conversational voice assistants, the ability to sufficiently understand the semantic meaning of the user request is often more important than recognizing all words correctly. In this work, we propose a system that can determine, to a high degree...

10.1109/icassp48485.2024.10448230 article EN ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024-03-18

Multi-core GE

OPENALEX - Publications

Gopinath Chennupati R. Muhammad Atif Azad Conor Ryan

We describe the utilization of on-chip multiple CPU architectures to automatically evolve parallel computer programs. These programs have the capability of exploiting the computational efficiency of modern multi-core machines.

10.1145/2598394.2605670 article EN 2014-07-11

Distributed Non-Negative Tensor Train Decomposition

OPENALEX - Publications

Manish Bhattarai Gopinath Chennupati Erik Skau Raviteja Vangara Hristo Djidjev and 1 more

The era of exascale computing opens new venues for innovations and discoveries in many scientific, engineering, and commercial fields. However, with the exaflops also come the extra-large high-dimensional data generated by high-performance computing. High-dimensional data is presented as multidimensional arrays, aka tensors. The presence of latent (not directly observable) structures in the tensor allows a unique representation and compression of the data by classical tensor factorization techniques. However, the classical tensor methods are not always stable or they can be...

10.1109/hpec43674.2020.9286234 article EN 2020-09-22
