Kushal Datta

ORCID: 0000-0003-1608-6040
Research Areas
  • Parallel Computing and Optimization Techniques
  • Embedded Systems Design Techniques
  • Low-power high-performance VLSI design
  • Advanced Data Storage Technologies
  • Interconnection Networks and Systems
  • Topic Modeling
  • Cloud Computing and Resource Management
  • AI in cancer detection
  • Caching and Content Delivery
  • Cell Image Analysis Techniques
  • Image Processing Techniques and Applications
  • Advanced Neural Network Applications
  • Magnetic properties of thin films
  • Advancements in Semiconductor Devices and Circuit Design
  • Quantum-Dot Cellular Automata
  • Distributed and Parallel Computing Systems
  • Advanced MEMS and NEMS Technologies
  • Traffic Prediction and Management Techniques
  • Machine Learning and Data Classification
  • Big Data and Business Intelligence
  • Nanowire Synthesis and Applications
  • Natural Language Processing Techniques
  • VLSI and FPGA Design Techniques
  • Data Quality and Management
  • Metaheuristic Optimization Algorithms Research

Intel (United States)
2011-2019

Intel (United Kingdom)
2016-2018

University of North Carolina at Charlotte
2006-2012

North Carolina State University
2007

We present a novel storage manager for multi-dimensional arrays that arise in scientific applications, which is part of a larger scientific data management system called TileDB. In contrast to existing solutions, TileDB is optimized for both dense and sparse arrays. Its key idea is to organize array elements into ordered collections called fragments. Each fragment is dense or sparse, and groups contiguous array elements into data tiles of fixed capacity. The organization into fragments turns random writes into sequential writes, and, coupled with a novel read algorithm, leads to very...

10.14778/3025111.3025117 article EN Proceedings of the VLDB Endowment 2016-11-01
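
The fragment idea described in the TileDB abstract above can be illustrated with a minimal sketch. This is a toy model under stated assumptions, not TileDB's API or its actual read algorithm: the class name `FragmentedArray`, its methods, and the newest-first lookup are all hypothetical, chosen only to show how a batch of random cell updates can be stored as one sorted, append-only fragment split into fixed-capacity tiles.

```python
from typing import Dict, List, Optional, Tuple

Coord = Tuple[int, int]
Cell = Tuple[Coord, float]


class FragmentedArray:
    """Toy sparse 2-D array stored as append-only fragments (hypothetical names)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity  # maximum number of cells per tile
        # Each fragment is a list of tiles; each tile is a sorted run of cells.
        self.fragments: List[List[List[Cell]]] = []

    def write(self, cells: Dict[Coord, float]) -> None:
        # Sort the batch by coordinate, then cut it into fixed-capacity tiles.
        ordered = sorted(cells.items())
        tiles = [
            ordered[i:i + self.capacity]
            for i in range(0, len(ordered), self.capacity)
        ]
        # The whole batch is appended as one new fragment, so random updates
        # never modify previously written data in place.
        self.fragments.append(tiles)

    def read(self, coord: Coord) -> Optional[float]:
        # Assumption for this sketch: scan newest fragment first so the most
        # recent write to a cell wins.
        for fragment in reversed(self.fragments):
            for tile in fragment:
                for c, value in tile:
                    if c == coord:
                        return value
        return None


arr = FragmentedArray(capacity=2)
arr.write({(3, 1): 1.0, (0, 0): 2.0, (1, 5): 3.0})  # first fragment
arr.write({(0, 0): 9.0})                            # later update, second fragment
print(arr.read((0, 0)))
```

The linear scan in `read` keeps the sketch short; a real system would index tiles rather than scan them.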

In this work, we quantize a trained Transformer machine language translation model leveraging INT8/VNNI instructions in the latest Intel® Xeon® Cascade Lake processors to improve inference performance while maintaining less than 0.5% drop in accuracy. To the best of our knowledge, this is the first attempt in the industry to quantize the Transformer model. This has high impact as it clearly demonstrates the various complexities of quantizing the language translation model. We present novel quantization techniques directly in TensorFlow to opportunistically...

10.48550/arxiv.1906.00532 preprint EN other-oa arXiv (Cornell University) 2019-01-01
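
As background for the quantization abstract above, the following is a minimal sketch of generic symmetric per-tensor INT8 quantization. It is an assumption-laden illustration, not the paper's TensorFlow technique (which the truncated abstract does not spell out): the function names `quantize_int8` and `dequantize` are hypothetical, and the scheme simply maps the largest absolute value in a tensor to 127.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # Guard against an all-zero (or empty) tensor to avoid dividing by zero.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float values from the integer representation."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.0, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
print(q, scale, restored)
```

Each restored value differs from the original by at most half the scale factor, which is the rounding error that an accuracy budget such as the 0.5% drop quoted in the abstract has to absorb.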

Energy efficiency has become the center of attention in emerging data center infrastructures as increasing energy costs continue to outgrow all other operating expenditures. In this work we investigate energy aware scheduling heuristics to increase the energy efficiency of MapReduce workloads on heterogeneous Hadoop clusters comprising both low power (wimpy) and high performance (brawny) nodes. We first make a case for heterogeneity by showing that low power Intel Atom processors and high performance Intel Sandy Bridge processors are more energy efficient for I/O bound workloads and CPU bound workloads,...

10.1145/2088996.2088997 article EN 2011-12-12

Traditional FPGA education either involves a physical laboratory room with workstations connected to individual experimenter boards or simulation platforms. Physical labs are expensive to maintain and require substantial floor space. In addition, students need to be physically present in the laboratories to access the boards. On the other hand, it is often the case that simulation platforms do not provide an adequate, in-depth understanding of the concepts (such as synthesis on FPGAs). In this short paper, a third option - remote...

10.1109/fccm.2007.53 article EN 2007-04-01

The new generation of shared memory multi-core processors with multiple parallel execution paths provides a promising hardware platform for applications with a high degree of task-level parallelism (TLP). Genetic Algorithm (GA), a widely-used evolutionary meta-heuristic optimization method, is a unique candidate in this class and demonstrates a significant amount of explicit and implicit parallelism. In this paper, we present the performance characteristics of GA optimizing the placement problem on the Sun UltraSPARC T1...

10.1109/secon.2009.5174094 article EN 2009-03-01

Exploring the vast microarchitectural design space of chip multiprocessors (CMPs) through the traditional approach of exhaustive simulations is impractical due to the long simulation times and its super-linear increase with core scaling. Kernel based statistical machine learning algorithms can potentially help predict multiple performance metrics with non-linear dependence on the CMP design parameters. In this paper, we describe and evaluate a framework that uses Kernel Canonical Correlation Analysis (KCCA) to predict the power dissipation...

10.1109/iccd.2011.6081374 article EN 2011 IEEE 29th International Conference on Computer Design (ICCD) 2011-10-01

Despite the promising performance improvement observed in emerging many-core architectures in high performance processors, high power consumption prohibitively affects their use and marketability in the low-energy sectors, such as embedded processors, network processors and application specific instruction processors (ASIPs). While most chip architects design power-efficient processors by finding an optimal power-performance balance in their design, some use sophisticated on-chip autonomous power management units, which dynamically reduce the voltage or frequencies of idle...

10.3390/jlpea2010030 article EN cc-by Journal of Low Power Electronics and Applications 2012-02-01

From the time and money lost sitting in congestion and waiting for traffic signals to change, to the many people injured and killed in traffic crashes each year, to the emissions and energy consumption from our vehicles, the effects of transportation on our daily lives are immense. A wealth of data is available to help address these problems, from sensors installed to monitor and operate our roadways, to cell phone apps, to (just over the horizon) connected vehicles and infrastructure. However, this wealth of data has yet to be effectively leveraged, thus providing opportunities in areas such...

10.1145/3236461.3241971 article EN 2018-06-20

Existing approaches to train neural networks that use large images require either to crop or down-sample the data during pre-processing, use small batch sizes, or split the model across devices, mainly due to the prohibitively limited memory capacity available on GPUs and emerging accelerators. These techniques often lead to longer time to convergence or time to train (TTT), and in some cases, lower model accuracy. CPUs, on the other hand, can leverage significant amounts of memory. While much work has been done on parallelizing neural network training on multiple...

10.48550/arxiv.1910.04852 preprint EN other-oa arXiv (Cornell University) 2019-01-01

We present an automated design flow for minimizing the use of diodes and switches (active devices) in logic implementations on a nanofabric based on chemically self-assembled electronic nanotechnology as proposed by Goldstein and Budiu [2001]. Connectivity and logic are realized using the switch and diode behaviors of molecular devices, unlike in very large scale integrated (VLSI) circuits where complementary metal-oxide semiconductor (CMOS) gates are used. Similar to the optimization goal of reducing the number of gates in VLSI designs to minimize area,...

10.1145/1167943.1167946 article EN ACM Journal on Emerging Technologies in Computing Systems 2006-07-01

The focus of this work is to identify data partitioning strategies and their performance models for memory intensive two dimensional Magneto-Static Wave (MSW) calculations on a shared memory architecture. We have constructed computing, communication and synchronization time models for the different data partitioning schemes. We identified that improved performance for any partitioning scheme can be achieved by reduced boundary sharing, decreasing stride penalties, requirement increased sharing. A maximum speed-up of 3.9 for the largest data size is observed for one-dimensional partitioning.

10.1109/secon.2010.5453902 article EN 2010-03-01

Arindam Mukherjee, Arun Ravindran, Bharat Kumar Joshi, Kushal Datta and Yue Liu
Electrical and Computer Engineering Department, University of North Carolina at Charlotte, NC, USA
{amukherj, aravindr, bsjoshi, kdatta, yliu42}@uncc.edu
10.1 Introduction
10.1.1 Why Is Autonomous Power Management Necessary?
10.1.1.1 Sporadic Processing Requirements
10.1.1.2 Run-time Monitoring of System Parameters
10.1.1.3 Temperature
10.1.1.4 Power/Ground Noise
10.1.1.5 Real-Time Constraints
10.2...

10.1201/9781315218199-17 article EN 2018-10-08