- Parallel Computing and Optimization Techniques
- Embedded Systems Design Techniques
- Low-power high-performance VLSI design
- Advanced Data Storage Technologies
- Interconnection Networks and Systems
- Topic Modeling
- Cloud Computing and Resource Management
- AI in cancer detection
- Caching and Content Delivery
- Cell Image Analysis Techniques
- Image Processing Techniques and Applications
- Advanced Neural Network Applications
- Magnetic properties of thin films
- Advancements in Semiconductor Devices and Circuit Design
- Quantum-Dot Cellular Automata
- Distributed and Parallel Computing Systems
- Advanced MEMS and NEMS Technologies
- Traffic Prediction and Management Techniques
- Machine Learning and Data Classification
- Big Data and Business Intelligence
- Nanowire Synthesis and Applications
- Natural Language Processing Techniques
- VLSI and FPGA Design Techniques
- Data Quality and Management
- Metaheuristic Optimization Algorithms Research
Intel (United States)
2011-2019
Intel (United Kingdom)
2016-2018
University of North Carolina at Charlotte
2006-2012
North Carolina State University
2007
We present a novel storage manager for multi-dimensional arrays that arise in scientific applications, which is part of a larger data management system called TileDB. In contrast to existing solutions, TileDB is optimized for both dense and sparse arrays. Its key idea is to organize array elements into ordered collections called fragments. Each fragment is dense or sparse, and groups contiguous array elements into tiles of fixed capacity. The organization into fragments turns random writes into sequential writes and, coupled with a novel read algorithm, leads to very...
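As a rough illustration of the fragment/tile idea described above, the sketch below (not TileDB's actual API or on-disk format; the names SparseFragment and TILE_CAPACITY are invented for illustration) buffers random-order cell writes in memory and flushes them as sorted, fixed-capacity tiles that are appended sequentially:

```python
# Minimal sketch, assuming a sparse fragment that batches writes in memory.
from dataclasses import dataclass, field

TILE_CAPACITY = 4  # illustrative tile capacity (cells per tile)

@dataclass
class SparseFragment:
    """An append-only batch of writes; cells are sorted into tiles on flush."""
    cells: list = field(default_factory=list)   # (coords, value) pairs
    tiles: list = field(default_factory=list)   # flushed, fixed-capacity tiles

    def write(self, coords, value):
        # Random-order writes are simply buffered in memory.
        self.cells.append((coords, value))

    def flush(self):
        # Sort cells in a global order (row-major here), then cut them into
        # fixed-capacity tiles that are appended sequentially.
        self.cells.sort(key=lambda c: c[0])
        for i in range(0, len(self.cells), TILE_CAPACITY):
            self.tiles.append(self.cells[i:i + TILE_CAPACITY])
        self.cells.clear()

frag = SparseFragment()
for coords, val in [((3, 7), 1.0), ((0, 1), 2.0), ((2, 2), 3.0), ((0, 0), 4.0), ((5, 5), 5.0)]:
    frag.write(coords, val)   # arrives in arbitrary order
frag.flush()
print(frag.tiles)             # tiles hold cells in sorted order, written sequentially
```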
In this work, we quantize a trained Transformer machine language translation model, leveraging the INT8/VNNI instructions in the latest Intel® Xeon® Cascade Lake processors, to improve inference performance while maintaining less than a 0.5% drop in accuracy. To the best of our knowledge, this is the first attempt in the industry to quantize the Transformer model. This has high impact as it clearly demonstrates the various complexities of quantizing the Transformer model. We present novel quantization techniques directly in TensorFlow to opportunistically...
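The abstract describes replacing FP32 computations with INT8 inside TensorFlow. As a library-agnostic illustration of the underlying arithmetic only (a toy sketch, not the paper's TensorFlow implementation or calibration scheme), the NumPy example below performs symmetric per-tensor INT8 quantization of weights and activations and a dequantized matrix multiply:

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor quantization: map FP32 values to INT8 with a scale."""
    scale = np.abs(x).max() / 127.0 if np.abs(x).max() > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64)).astype(np.float32)   # FP32 weights
A = rng.normal(size=(8, 64)).astype(np.float32)    # FP32 activations

qW, sW = quantize_int8(W)
qA, sA = quantize_int8(A)

# Integer matmul accumulated in int32 (as INT8/VNNI hardware does), then rescaled.
y_int8 = (qA.astype(np.int32) @ qW.astype(np.int32).T) * (sA * sW)
y_fp32 = A @ W.T

print("max abs error:", np.abs(y_int8 - y_fp32).max())
```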
Energy efficiency has become the center of attention in emerging data infrastructures as increasing energy costs continue to outgrow all other operating expenditures. In this work we investigate energy-aware scheduling heuristics to increase the energy efficiency of MapReduce workloads on heterogeneous Hadoop clusters comprising both low-power (wimpy) and high-performance (brawny) nodes. We first make a case for heterogeneity by showing that Intel Atom and Sandy Bridge processors are more energy efficient for I/O-bound and CPU-bound workloads, respectively,...
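As a toy illustration of the kind of heterogeneity-aware placement the abstract alludes to (the paper's actual heuristics and energy models are not reproduced here; the task and node definitions below are hypothetical), this sketch routes I/O-bound tasks to wimpy nodes and CPU-bound tasks to brawny nodes:

```python
# Hypothetical task/node model for illustration only.
TASKS = [
    {"id": "map-1", "kind": "io"},
    {"id": "map-2", "kind": "cpu"},
    {"id": "reduce-1", "kind": "cpu"},
    {"id": "map-3", "kind": "io"},
]
NODES = {
    "wimpy": ["atom-0", "atom-1"],   # low-power nodes, efficient for I/O-bound work
    "brawny": ["snb-0"],             # high-performance nodes, efficient for CPU-bound work
}

def schedule(tasks, nodes):
    """Greedy heterogeneity-aware placement: match task type to node class."""
    placement, rr = {}, {"wimpy": 0, "brawny": 0}
    for t in tasks:
        pool = "wimpy" if t["kind"] == "io" else "brawny"
        node = nodes[pool][rr[pool] % len(nodes[pool])]   # round-robin within a class
        rr[pool] += 1
        placement[t["id"]] = node
    return placement

print(schedule(TASKS, NODES))
```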
Traditional FPGA education involves either a physical laboratory room with workstations connected to individual experimenter boards, or simulation platforms. Physical labs are expensive to maintain and require substantial floor space. In addition, students need to be physically present in the laboratories to access the boards. On the other hand, it is often the case that simulation platforms do not provide an adequate, in-depth understanding of concepts (such as synthesis on FPGAs). In this short paper, a third option - remote...
The new generation of shared-memory multi-core processors with multiple parallel execution paths provides a promising hardware platform for applications with a high degree of task-level parallelism (TLP). The Genetic Algorithm (GA), a widely-used evolutionary meta-heuristic optimization method, is a unique candidate in this class and demonstrates a significant amount of explicit and implicit parallelism. In this paper, we present the performance characteristics of a GA optimizing the placement problem on a Sun UltraSPARC T1...
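A minimal sketch of the task-level parallelism a GA exposes, here using Python's multiprocessing to evaluate fitness in parallel for a toy one-dimensional placement problem (this is not the paper's implementation, which targeted the multithreaded UltraSPARC T1; the chain-netlist cost function is invented for illustration):

```python
import random
from multiprocessing import Pool

N = 16  # number of cells to place in a row

def fitness(perm):
    # Toy placement cost: total wirelength of a chain netlist (lower is better).
    return sum(abs(perm[i] - perm[i + 1]) for i in range(len(perm) - 1))

def mutate(perm):
    # Swap two positions to produce a child placement.
    a, b = random.sample(range(len(perm)), 2)
    child = list(perm)
    child[a], child[b] = child[b], child[a]
    return child

if __name__ == "__main__":
    random.seed(1)
    population = [random.sample(range(N), N) for _ in range(64)]
    with Pool() as pool:
        for gen in range(20):
            # Explicit parallelism: fitness evaluations are independent tasks.
            scores = pool.map(fitness, population)
            ranked = [p for _, p in sorted(zip(scores, population))]
            elite = ranked[: len(ranked) // 4]
            population = elite + [mutate(random.choice(elite)) for _ in range(len(ranked) - len(elite))]
    print("best wirelength:", min(map(fitness, population)))
```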
Exploring the vast microarchitectural design space of chip multiprocessors (CMPs) through the traditional approach of exhaustive simulations is impractical due to long simulation times and their super-linear increase with core scaling. Kernel-based statistical machine learning algorithms can potentially help predict multiple performance metrics with non-linear dependence on CMP parameters. In this paper, we describe and evaluate a framework that uses Kernel Canonical Correlation Analysis (KCCA) to predict performance and power dissipation...
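The idea of correlating a design-parameter view with a metrics view can be illustrated with plain linear CCA from scikit-learn; the kernelized variant (KCCA) used in the paper would replace the linear projections with kernel-induced ones. The data below is synthetic and the parameter/metric names are placeholders, not the paper's experimental setup:

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)

# Synthetic stand-in data: each row is one simulated CMP configuration.
# X: design parameters (e.g. core count, cache size, issue width), made up here.
# Y: observed metrics (e.g. IPC, power) with a nonlinear dependence on X.
X = rng.uniform(size=(200, 3))
Y = np.column_stack([
    np.sin(2 * X[:, 0]) + 0.5 * X[:, 1],               # "performance"
    X[:, 0] * X[:, 2] + 0.1 * rng.normal(size=200),    # "power"
])

cca = CCA(n_components=2)
Xc, Yc = cca.fit_transform(X, Y)

# Correlation of the paired canonical variates indicates how well the
# parameter view tracks the metric view.
for i in range(2):
    r = np.corrcoef(Xc[:, i], Yc[:, i])[0, 1]
    print(f"canonical correlation {i}: {r:.2f}")
```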
Despite the promising performance improvement observed in emerging many-core architectures and high-end processors, power consumption prohibitively affects their use and marketability in low-energy sectors, such as embedded network processors and application-specific instruction-set processors (ASIPs). While most chip architects design power-efficient chips by finding an optimal power-performance balance at design time, some employ sophisticated on-chip autonomous power management units, which dynamically reduce the voltage or frequencies of idle...
From the time and money lost sitting in congestion waiting for traffic signals to change, to the many people injured or killed in crashes each year, to the emissions and energy consumption from our vehicles, the effects of transportation on our daily lives are immense. A wealth of data is available to help address these problems: sensors installed to monitor and operate roadways, cell phone apps, and -- just over the horizon -- connected vehicles and infrastructure. However, this data has yet to be effectively leveraged, thus providing opportunities in areas such...
Existing approaches to train neural networks that use large images require either cropping or down-sampling the data during pre-processing, using small batch sizes, or splitting the model across devices, mainly due to the prohibitively limited memory capacity available on GPUs and emerging accelerators. These techniques often lead to longer time to train (TTT) and, in some cases, lower accuracy. CPUs, on the other hand, can leverage significant amounts of memory. While much work has been done on parallelizing neural network training across multiple...
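To make the memory argument concrete, a back-of-the-envelope estimate (assuming FP32 activations and a single convolution layer's output of 64 channels at full input resolution; the image size is a hypothetical example) shows why even small batches of large images overflow a GPU's memory while fitting in CPU DRAM:

```python
def activation_gib(batch, height, width, channels, bytes_per_elem=4):
    """Rough FP32 activation footprint of one feature map, in GiB."""
    return batch * height * width * channels * bytes_per_elem / 2**30

# Hypothetical numbers: an 8192x8192 pathology/satellite image, 64 feature channels.
for batch in (1, 4, 16):
    print(batch, f"{activation_gib(batch, 8192, 8192, 64):.1f} GiB")
# batch 1  -> 16.0 GiB   (already at a 16 GB GPU's limit for a single layer)
# batch 4  -> 64.0 GiB
# batch 16 -> 256.0 GiB  (feasible only with large-memory CPU nodes or the workarounds above)
```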
We present an automated design flow for minimizing the use of diodes and switches (active devices) in implementations on a nanofabric based on chemically self-assembled electronic nanotechnology, as proposed by Goldstein and Budiu [2001]. Connectivity and logic are realized using the switch and diode behaviors of molecular devices, unlike very large scale integrated (VLSI) circuits where complementary metal-oxide semiconductor (CMOS) gates are used. Similar to the optimization goal of reducing the device count in VLSI designs to minimize area,...
The focus of this work is to identify data partitioning strategies and their performance models for memory-intensive two-dimensional Magneto-Static Wave (MSW) calculations on a shared-memory architecture. We have constructed computing, communication, and synchronization time models for the different partitioning schemes. We identified that improved performance for any scheme can be achieved through reduced boundary sharing and decreased stride penalties, even when this requires increased sharing. A maximum speed-up of 3.9 for the largest problem size was observed with one-dimensional partitioning.
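As a small illustration of the boundary-sharing trade-off above (a sketch, not the paper's performance model), the following code counts shared-boundary cells for 1-D strip versus 2-D block partitioning of an N x N grid across P workers; fewer shared cells means less synchronization, while strips keep unit-stride access along rows:

```python
import math

def strip_boundary_cells(n, p):
    """1-D (row-strip) partitioning: p-1 internal boundaries, each n cells wide."""
    return (p - 1) * n

def block_boundary_cells(n, p):
    """2-D block partitioning on a sqrt(p) x sqrt(p) grid of blocks (p assumed square)."""
    q = int(math.isqrt(p))
    return 2 * (q - 1) * n   # q-1 internal cut lines per dimension, each n cells long

n, p = 1024, 16
print("strip :", strip_boundary_cells(n, p))   # 15360 shared cells, unit-stride rows
print("block :", block_boundary_cells(n, p))   # 6144 shared cells, but strided column accesses
```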
Arindam Mukherjee, Arun Ravindran, Bharat Kumar Joshi, Kushal Datta and Yue Liu
Electrical and Computer Engineering Department, University of North Carolina at Charlotte, NC, USA
{amukherj, aravindr, bsjoshi, kdatta, yliu42}@uncc.edu
10.1 Introduction
10.1.1 Why Is Autonomous Power Management Necessary?
10.1.1.1 Sporadic Processing Requirements
10.1.1.2 Run-time Monitoring of System Parameters
10.1.1.3 Temperature
10.1.1.4 Power/Ground Noise
10.1.1.5 Real-Time Constraints
10.2 ...