Ashutosh Pattnaik

ORCID: 0000-0003-0367-5989
Research Areas
  • Parallel Computing and Optimization Techniques
  • Advanced Data Storage Technologies
  • Cloud Computing and Resource Management
  • Distributed and Parallel Computing Systems
  • Advanced Neural Network Applications
  • Interconnection Networks and Systems
  • Low-power high-performance VLSI design
  • Stochastic Gradient Optimization Techniques
  • Advanced Memory and Neural Computing
  • Advanced Image Fusion Techniques
  • Video Coding and Compression Technologies
  • Image and Video Quality Assessment
  • Hydrogen embrittlement and corrosion behaviors in metals
  • Robotics and Automated Systems
  • Fuel Cells and Related Materials
  • Microstructure and Mechanical Properties of Steels
  • CCD and CMOS Imaging Sensors
  • Extremum Seeking Control Systems
  • Brain Tumor Detection and Classification
  • Neural Networks and Reservoir Computing
  • Image and Signal Denoising Methods
  • Manufacturing Process and Optimization
  • Peripheral Neuropathies and Disorders
  • Video Surveillance and Tracking Methods
  • Blind Source Separation Techniques

American Rock Mechanics Association
2024

Pennsylvania State University
2015-2021

All India Institute of Medical Sciences Bhubaneswar
2020

Jain University
2020

National Institute of Technology Rourkela
2012-2020

Siksha O Anusandhan University
2020

Advanced Micro Devices (Canada)
2016

Processing data in or near memory (PIM), as opposed to in conventional computational units in a processor, can greatly alleviate the performance and energy penalties of data transfers from/to main memory. Graphics Processing Unit (GPU) architectures and applications, where main memory bandwidth is a critical bottleneck, can benefit from the use of PIM. To this end, an application should be properly partitioned and scheduled to execute on either the main, powerful GPU cores that are far away from memory or the auxiliary, simple GPU cores that are close to memory (e.g., in the logic layer of 3D-stacked DRAM).

10.1145/2967938.2967940 article EN 2016-08-31
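
Below is a minimal sketch of the kind of partitioning decision the abstract alludes to: routing memory-intensive kernels to the near-memory cores and keeping compute-intensive ones on the main GPU cores. The threshold, profile fields, and intensity proxy are illustrative assumptions, not the scheduling model from the paper.

```python
# Illustrative sketch (not the paper's scheduler): place a kernel on the
# near-memory (PIM) cores when it is memory-intensive, otherwise keep it
# on the main GPU cores. Thresholds and profile fields are hypothetical.

def schedule_kernels(kernel_profiles, intensity_threshold=0.5):
    """kernel_profiles: list of dicts with 'name', 'bytes_accessed', 'instructions'."""
    placement = {}
    for k in kernel_profiles:
        # Bytes moved per executed instruction as a crude memory-intensity proxy.
        intensity = k["bytes_accessed"] / max(k["instructions"], 1)
        placement[k["name"]] = "PIM cores" if intensity >= intensity_threshold else "main GPU cores"
    return placement

if __name__ == "__main__":
    profiles = [
        {"name": "dense_gemm", "bytes_accessed": 1_000_000, "instructions": 50_000_000},
        {"name": "stream_copy", "bytes_accessed": 400_000_000, "instructions": 100_000_000},
    ]
    print(schedule_kernels(profiles))
```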

As GPUs make headway in the computing landscape spanning mobile platforms, supercomputers, cloud, and virtual desktop platforms, supporting concurrent execution of multiple applications becomes essential for unlocking their full potential. However, unlike CPUs, multi-application execution on GPUs is little explored. In this paper, we study the memory system of GPUs in a concurrently executing multi-application environment. We first present an analytical performance model for many-threaded architectures and show that the commonly used misses-per-kilo-instruction (MPKI)...

10.1145/2818950.2818979 article EN Proceedings of the International Symposium on Memory Systems 2015-10-05
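
As a quick illustration of the metric named above, the sketch below shows how MPKI is computed and why it can mislead for many-threaded architectures: two kernels with identical MPKI can generate very different amounts of concurrent miss traffic. The numbers are made up.

```python
# Minimal sketch of the misses-per-kilo-instruction (MPKI) metric. Two kernels
# with the same MPKI can stress memory very differently once many warps issue
# misses concurrently, which is why MPKI alone can mislead on GPUs.

def mpki(cache_misses, instructions):
    return 1000.0 * cache_misses / instructions

# Same MPKI, very different concurrent miss traffic.
kernel_a = {"misses": 2_000, "instructions": 1_000_000, "active_warps": 4}
kernel_b = {"misses": 2_000, "instructions": 1_000_000, "active_warps": 48}

for name, k in (("A", kernel_a), ("B", kernel_b)):
    print(name, "MPKI =", mpki(k["misses"], k["instructions"]),
          "| concurrent warps issuing misses =", k["active_warps"])
```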

Modern memory access schedulers employed in GPUs typically optimize for memory throughput. They implicitly assume that all requests from different cores are equally important. However, we show that, during the execution of a subset of CUDA applications, different cores can have different amounts of tolerance to memory latency. In particular, cores with a larger fraction of warps waiting for data to come back from DRAM are less likely to tolerate the latency of an outstanding memory request. Requests from such cores are more critical than others. Based on this observation, this paper introduces a new...

10.1145/2896377.2901468 article EN 2016-06-10
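
The following toy scheduler sketches the criticality idea: requests from cores where most warps are already stalled on DRAM are served first. The field names and scoring rule are hypothetical, not the mechanism proposed in the paper.

```python
# Illustrative sketch of request criticality: a memory request from a core
# where most warps are already stalled waiting on DRAM is treated as more
# critical and served earlier.

def request_priority(requests):
    """requests: list of dicts with 'req_id', 'stalled_warps', 'total_warps'."""
    def criticality(r):
        return r["stalled_warps"] / max(r["total_warps"], 1)
    # Serve requests from the most latency-intolerant cores first.
    return sorted(requests, key=criticality, reverse=True)

pending = [
    {"req_id": 0, "stalled_warps": 2,  "total_warps": 48},
    {"req_id": 1, "stalled_warps": 40, "total_warps": 48},
]
print([r["req_id"] for r in request_priority(pending)])  # -> [1, 0]
```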

Dynamic parallelism (DP) is a promising feature for GPUs, which allows on-demand spawning of kernels on the GPU without any CPU intervention. However, this feature has two major drawbacks. First, launching kernels from the GPU can incur significant performance penalties. Second, dynamically generated kernels are not always able to efficiently utilize the GPU cores due to hardware limits. To address these two concerns cohesively, we propose SPAWN, a runtime framework that controls the dynamic spawning of kernels, thereby directly reducing the associated launch overheads...

10.1109/hpca.2017.14 article EN 2017-02-01
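
A toy version of the launch-throttling trade-off described above might look like the sketch below, where a child kernel is spawned only when the discovered work is large enough to amortize the launch overhead. The cost constants are purely illustrative, not SPAWN's actual model.

```python
# Toy decision rule in the spirit of throttling device-side kernel launches.
# All constants are assumed values for illustration only.

LAUNCH_OVERHEAD_CYCLES = 20_000     # assumed fixed cost of a device-side launch
CYCLES_PER_ITEM_PARENT = 50         # assumed cost per item if the parent loops over it
CYCLES_PER_ITEM_CHILD = 5           # assumed cost per item in a parallel child kernel

def should_spawn(child_work_items: int) -> bool:
    parent_cost = child_work_items * CYCLES_PER_ITEM_PARENT
    child_cost = LAUNCH_OVERHEAD_CYCLES + child_work_items * CYCLES_PER_ITEM_CHILD
    return child_cost < parent_cost

for n in (100, 1_000, 10_000):
    decision = "spawn child kernel" if should_spawn(n) else "keep work in parent"
    print(f"{n} items -> {decision}")
```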

Brain-inspired cognitive computing has so far followed two major approaches - one uses multi-layered artificial neural networks (ANNs) to perform pattern-recognition-related tasks, whereas the other uses spiking neural networks (SNNs) to emulate biological neurons in an attempt to be as efficient and fault-tolerant as the brain. While there has been considerable progress in the former area due to a combination of effective training algorithms and acceleration platforms, the latter is still in its infancy due to the lack of both. SNNs have a distinct advantage over...

10.1109/isca45697.2020.00039 article EN 2020-05-01
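
To make the "spiking" half of the comparison concrete, here is a minimal leaky integrate-and-fire neuron update; the parameters are arbitrary and this is not the architecture or training method from the paper.

```python
# Minimal leaky integrate-and-fire (LIF) neuron: leak, integrate the input
# current, and emit a spike when the membrane potential crosses a threshold.

def lif_step(v, input_current, v_thresh=1.0, leak=0.9, v_reset=0.0):
    """One discrete time step: leak, integrate, fire if threshold is crossed."""
    v = leak * v + input_current
    if v >= v_thresh:
        return v_reset, 1        # emit a spike and reset the membrane potential
    return v, 0                  # no spike this step

v, spikes = 0.0, []
for current in [0.3, 0.4, 0.5, 0.1, 0.6, 0.7]:
    v, s = lif_step(v, current)
    spikes.append(s)
print(spikes)                    # -> [0, 0, 1, 0, 0, 1]
```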

To improve the performance of Graphics Processing Units (GPUs) beyond simply increasing core count, architects have recently been adopting a scale-up approach: the peak throughput and individual capabilities of GPU cores are increasing rapidly. This big-core trend in GPUs leads to various challenges, including higher static power consumption and lower and imbalanced utilization of the datapath components of a big core. As we show in this paper, two key problems ensue: (1) the big core can waste power, as an application does not always utilize all portions...

10.1145/2967938.2967941 article EN 2016-08-31

Data transfer overhead between computing cores and the memory hierarchy has been a persistent issue for von Neumann architectures, and the problem has only become more challenging with the emergence of manycore systems. A conceptually powerful approach to mitigate this overhead is to bring computation closer to the data, known as Near Data Computing (NDC). Recently, NDC has been investigated in different flavors for CPU-based multicores, while the GPU domain has received little attention. In this paper, we present a novel NDC solution for GPUs with the objective of minimizing on-chip...

10.1145/3307650.3322212 article EN 2019-06-14

In this paper, a new and efficient cascade decision based filtering algorithm for the removal of high density Salt and Pepper Noise in images is proposed. The proposed cascaded filter employs a Modified Decision Based Median Filter as its first stage of operation. The second stage involves a combination filter that calculates the mean difference of neighborhood pixels using an Unsymmetric Trimmed Mean Filter. When compared with existing non-linear filters such as the Standard Median Filter (SMF), Adaptive Median Filter (AMF), Decision Based Algorithm (DBA), and Progressive Switch...

10.1016/j.protcy.2012.10.014 article EN Procedia Technology 2012-01-01
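
A simplified, single-stage version of the decision-based median filtering idea is sketched below: only pixels at the extreme intensities (likely salt-and-pepper corrupted) are replaced by their 3x3 neighborhood median. The paper's cascaded filter adds a trimmed-mean second stage that is not shown here.

```python
# Simplified decision-based median filter: the "decision" step touches only
# pixels at 0 or 255 and leaves clean pixels untouched.
import numpy as np

def decision_based_median(img):
    out = img.copy()
    padded = np.pad(img, 1, mode="edge")
    rows, cols = img.shape
    for i in range(rows):
        for j in range(cols):
            if img[i, j] in (0, 255):                 # likely salt-and-pepper pixel
                window = padded[i:i + 3, j:j + 3]
                out[i, j] = np.median(window)
    return out

noisy = np.array([[10, 255, 12],
                  [11,   0, 13],
                  [12,  14, 15]], dtype=np.uint8)
print(decision_based_median(noisy))
```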

On-chip data movement is a major source of power consumption in modern processors, and future technology nodes will exacerbate this problem. Properly understanding the energy that applications expend on moving data is vital for inventing mitigation strategies. Previous studies combined the interconnect energy, which is required to move information across the chip, with the access energy used to read or write on-chip memories. This combination can hide the severity of the problem, as memories and interconnects scale differently across technology nodes. Thus, increasing the fidelity...

10.1109/iiswc.2016.7581263 article EN 2016-09-01
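
A small accounting sketch of the point being made: keeping interconnect energy and on-chip memory access energy as separate terms rather than a single combined number, since the two scale differently across technology nodes. The per-byte energy values below are placeholders, not measured data.

```python
# Separate bookkeeping for wire (interconnect) energy and on-chip memory
# access energy. Both per-byte costs are hypothetical placeholders.

PJ_PER_BYTE_WIRE = 0.5         # assumed interconnect energy per byte moved
PJ_PER_BYTE_SRAM_ACCESS = 0.2  # assumed energy per byte read/written in on-chip SRAM

def data_movement_energy(bytes_moved, bytes_accessed):
    wire = bytes_moved * PJ_PER_BYTE_WIRE
    access = bytes_accessed * PJ_PER_BYTE_SRAM_ACCESS
    return {"interconnect_pJ": wire, "memory_access_pJ": access, "combined_pJ": wire + access}

print(data_movement_energy(bytes_moved=1_000_000, bytes_accessed=1_200_000))
```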

Video broadcast and streaming are among the most widely used applications for edge devices. Roughly 82% of mobile internet traffic is made up of video data. This is likely to worsen with the advent of 5G, which will open new opportunities for high-resolution video and virtual/augmented reality-based applications. The raw video data produced and consumed by edge devices is considerably larger than what is transmitted in and out of them. This leads to huge memory bandwidth and energy requirements for such devices. Therefore, optimizing the memory bandwidth and energy consumption of such devices is imperative...

10.1145/3352460.3358298 article EN 2019-10-11

Modern memory access schedulers employed in GPUs typically optimize for memory throughput. They implicitly assume that all requests from different cores are equally important. However, we show that, during the execution of a subset of CUDA applications, different cores can have different amounts of tolerance to memory latency. In particular, cores with a larger fraction of warps waiting for data to come back from DRAM are less likely to tolerate the latency of an outstanding memory request. Requests from such cores are more critical than others. Based on this observation, this paper introduces a new...

10.1145/2964791.2901468 article EN ACM SIGMETRICS Performance Evaluation Review 2016-06-14

Dynamic parallelism (DP) is a new feature of emerging GPUs that allows kernels to be generated and scheduled from the device-side (GPU) without host-side (CPU) intervention. To efficiently support DP, one of the major challenges is to saturate the GPU processing elements and provide them with the required data in a timely fashion. In this paper, we first conduct a limit study on the performance improvements that can be achieved by hardware schedulers that are provided with accurate data reuse information. We next propose LASER, a Locality-Aware...

10.1145/3309697.3331473 article EN 2019-06-20
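
The sketch below illustrates locality-aware scheduling in the spirit of the abstract: among pending device-generated kernels, launch first the one whose footprint overlaps most with data already on chip, so cached data gets reused before it is evicted. The naive set-overlap reuse estimate is an assumption for illustration, not the paper's hardware scheduler.

```python
# Pick the pending child kernel with the highest estimated reuse of data
# currently resident on chip.

def pick_next_kernel(pending_kernels, cached_addresses):
    """pending_kernels: list of (name, set_of_addresses_it_will_touch)."""
    def reuse(kernel):
        _, footprint = kernel
        return len(footprint & cached_addresses) / max(len(footprint), 1)
    return max(pending_kernels, key=reuse)[0]

cached = {0x100, 0x140, 0x180, 0x1C0}
pending = [
    ("child_A", {0x100, 0x140, 0x500}),   # reuses 2 of its 3 lines
    ("child_B", {0x900, 0x940, 0x980}),   # reuses nothing
]
print(pick_next_kernel(pending, cached))   # -> child_A
```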

In this manuscript, a grid-connected Solid Oxide Fuel Cell (SOFC) system has been considered. The SOFC has numerous benefits in comparison to other available Fuel Cells (FCs), as it possesses longer stability, flexibility of fuel use, negligible harmful emissions, excellent dynamic characteristics, and comparatively very low cost. A conventional PID controller fails to respond adequately to the nonlinearities of the power network. Intending to dynamically tune the controller parameters, a robust Crow Search (CS) based...

10.1109/icces48766.2020.9138069 article EN 2020 International Conference on Communication and Electronics Systems (ICCES) 2020-06-01
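
For context, a textbook discrete PID update is sketched below; the Crow Search based tuning of the gains described in the paper is not reproduced, and the gains and toy plant here are arbitrary.

```python
# Discrete PID controller: the (kp, ki, kd) gains are what a metaheuristic
# tuner would search over; the values here are arbitrary.

def make_pid(kp, ki, kd, dt):
    state = {"integral": 0.0, "prev_error": 0.0}
    def step(setpoint, measurement):
        error = setpoint - measurement
        state["integral"] += error * dt
        derivative = (error - state["prev_error"]) / dt
        state["prev_error"] = error
        return kp * error + ki * state["integral"] + kd * derivative
    return step

pid = make_pid(kp=2.0, ki=0.5, kd=0.1, dt=0.01)
measurement = 0.0
for _ in range(5):
    control = pid(setpoint=1.0, measurement=measurement)
    measurement += 0.01 * control       # toy first-order plant response
    print(round(measurement, 4))
```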

The advent of machine learning (ML) and deep learning (DL) applications has led to the development of a multitude of hardware accelerators and architectural optimization techniques for parallel architectures. This is due in part to the regularity and parallelism exhibited by ML workloads, especially convolutional neural networks (CNNs). However, CPUs continue to be one of the dominant compute fabrics in data-centers today, thereby also being widely deployed for inference tasks. As CNNs grow larger, the inherent limitations of a CPU-based system become...

10.1145/3357526.3357536 article EN Proceedings of the International Symposium on Memory Systems 2019-09-30

Dynamic parallelism (DP) is a new feature of emerging GPUs that allows kernels to be generated and scheduled from the device-side (GPU) without host-side (CPU) intervention. To efficiently support DP, one of the major challenges is to saturate the GPU processing elements and provide them with the required data in a timely fashion. In this paper, we first conduct a limit study on the performance improvements that can be achieved by hardware schedulers that are provided with accurate data reuse information. We next propose LASER, a Locality-Aware...

10.1145/3376930.3376947 article EN ACM SIGMETRICS Performance Evaluation Review 2019-12-17

GPUs are becoming prevalent in various domains of computing and are widely used for streaming (regular) applications. However, they are highly inefficient when executing irregular applications with unstructured inputs due to load imbalance. Dynamic parallelism (DP) is a new feature of emerging GPUs that allows new kernels to be generated and scheduled from the device-side (GPU) without host-side (CPU) intervention in order to increase parallelism. To efficiently support DP, one of the major challenges is to saturate the GPU processing elements...

10.1145/3287318 article EN Proceedings of the ACM on Measurement and Analysis of Computing Systems 2018-12-21

Machine/deep-learning (ML/DL) based techniques are emerging as a driving force behind many cutting-edge technologies, achieving high accuracy on computer vision workloads such as image classification and object detection. However, training these models, which involve a large number of parameters, is both time-consuming and energy-hogging. In this regard, several prior works have advocated for sparsity to speed up the training of DL models and, even more so, the inference phase. This work begins with the observation that during training, sparsity in the forward...

10.48550/arxiv.2109.07710 preprint EN cc-by arXiv (Cornell University) 2021-01-01
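
A minimal NumPy illustration of exploiting sparsity in the forward pass is given below: small-magnitude weights are pruned to zero before the matrix-vector product. This only demonstrates the general concept, not the training scheme proposed in the work.

```python
# Magnitude-based pruning: zero out the smallest |w| entries so that a chosen
# fraction of the weights are zero, then run the (now sparse) forward product.
import numpy as np

def magnitude_prune(weights, sparsity=0.7):
    """Zero out the smallest-magnitude entries so `sparsity` fraction are zero."""
    threshold = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) >= threshold, weights, 0.0)

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8))
x = rng.normal(size=(8,))
w_sparse = magnitude_prune(w, sparsity=0.7)
print("nonzeros kept:", int(np.count_nonzero(w_sparse)), "of", w.size)
print("dense output :", w @ x)
print("sparse output:", w_sparse @ x)
```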

Training Deep Neural Network (DNN) models is a time-consuming process that requires an immense amount of data and computation. To this end, GPUs are widely adopted to accelerate the training process. However, the delivered performance rarely scales with the increase in the number of GPUs. The major reason behind this is the large data movement that prevents the system from providing the required data in a timely fashion. In this paper, we propose ScaleDNN, a framework that systematically and comprehensively investigates and optimizes data-parallel training on two types...

10.1109/iccad51958.2021.9643503 article EN 2021 IEEE/ACM International Conference on Computer-Aided Design (ICCAD) 2021-11-01
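
The sketch below shows plain data-parallel training with simulated workers: each "GPU" computes gradients on its own mini-batch shard, and the gradients are averaged, which is the all-reduce step whose data movement limits scaling. This is a pure NumPy stand-in; none of ScaleDNN's optimizations are shown.

```python
# Data-parallel gradient descent for a linear model: split the batch across
# simulated workers, compute per-shard gradients, then average them.
import numpy as np

def local_gradient(w, x_shard, y_shard):
    """Gradient of mean squared error for a linear model y ~ x @ w."""
    pred = x_shard @ w
    return 2.0 * x_shard.T @ (pred - y_shard) / len(y_shard)

rng = np.random.default_rng(1)
w = np.zeros(3)
x = rng.normal(size=(64, 3))
y = x @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.normal(size=64)

num_gpus = 4
for step in range(100):
    grads = [local_gradient(w, xs, ys)
             for xs, ys in zip(np.array_split(x, num_gpus), np.array_split(y, num_gpus))]
    w -= 0.05 * np.mean(grads, axis=0)      # "all-reduce": average shard gradients
print(np.round(w, 2))                        # approaches [ 1. -2.  0.5]
```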

The present paper investigates the effect of raster angle and layer thickness on the tensile strength of Acrylonitrile butadiene styrene (ABS) specimens developed via fused deposition modelling. Three levels of raster angle (0°, 30°, 60°) and layer thickness (0.127, 0.178, 0.20 mm) were chosen while keeping the raster width constant at 0.40 mm. Tensile tests reveal that the load bearing capacity of specimens increases when there is a finite angle between the filaments and the loading direction. Scanning Electron Micrographs (SEM) show that strong bonding between layers is necessary for good...

10.1063/1.5141582 article EN AIP conference proceedings 2020-01-01