- Parallel Computing and Optimization Techniques
- Embedded Systems Design Techniques
- Low-power high-performance VLSI design
- Advanced Data Storage Technologies
- Interconnection Networks and Systems
- Topic Modeling
- Cloud Computing and Resource Management
- AI in cancer detection
- Caching and Content Delivery
- Cell Image Analysis Techniques
- Image Processing Techniques and Applications
- Advanced Neural Network Applications
- Magnetic properties of thin films
- Advancements in Semiconductor Devices and Circuit Design
- Quantum-Dot Cellular Automata
- Distributed and Parallel Computing Systems
- Advanced MEMS and NEMS Technologies
- Traffic Prediction and Management Techniques
- Machine Learning and Data Classification
- Big Data and Business Intelligence
- Nanowire Synthesis and Applications
- Natural Language Processing Techniques
- VLSI and FPGA Design Techniques
- Data Quality and Management
- Metaheuristic Optimization Algorithms Research
Intel (United States)
2011-2019
Intel (United Kingdom)
2016-2018
University of North Carolina at Charlotte
2006-2012
North Carolina State University
2007
We present a novel storage manager for multi-dimensional arrays that arise in scientific applications, which is part of a larger data management system called TileDB. In contrast to existing solutions, TileDB is optimized for both dense and sparse arrays. Its key idea is to organize array elements into ordered collections called fragments. Each fragment is dense or sparse, and groups contiguous array elements into tiles of fixed capacity. The organization into fragments turns random writes into sequential writes and, coupled with a novel read algorithm, leads to very...
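As a rough illustration of the fragment/tile idea described above, the sketch below (not TileDB's actual API or on-disk format; the names SparseFragment and TILE_CAPACITY are invented for illustration) buffers random-order cell writes in memory and flushes them as sorted, fixed-capacity tiles that are appended sequentially:

```python
# Minimal sketch, assuming a sparse fragment that batches writes in memory.
from dataclasses import dataclass, field

TILE_CAPACITY = 4  # illustrative tile capacity (cells per tile)

@dataclass
class SparseFragment:
    """An append-only batch of writes; cells are sorted into tiles on flush."""
    cells: list = field(default_factory=list)   # (coords, value) pairs
    tiles: list = field(default_factory=list)   # flushed, fixed-capacity tiles

    def write(self, coords, value):
        # Random-order writes are simply buffered in memory.
        self.cells.append((coords, value))

    def flush(self):
        # Sort cells in a global order (row-major here), then cut them into
        # fixed-capacity tiles that are appended sequentially.
        self.cells.sort(key=lambda c: c[0])
        for i in range(0, len(self.cells), TILE_CAPACITY):
            self.tiles.append(self.cells[i:i + TILE_CAPACITY])
        self.cells.clear()

frag = SparseFragment()
for coords, val in [((3, 7), 1.0), ((0, 1), 2.0), ((2, 2), 3.0), ((0, 0), 4.0), ((5, 5), 5.0)]:
    frag.write(coords, val)   # arrives in arbitrary order
frag.flush()
print(frag.tiles)             # tiles hold cells in sorted order, written sequentially
```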
In this work, we quantize a trained Transformer machine language translation model, leveraging the INT8/VNNI instructions in the latest Intel® Xeon® Cascade Lake processors, to improve inference performance while maintaining less than a 0.5% drop in accuracy. To the best of our knowledge, this is the first attempt in the industry to quantize the Transformer model. This has high impact as it clearly demonstrates the various complexities of quantizing the Transformer model. We present novel quantization techniques directly in TensorFlow to opportunistically...
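The abstract describes replacing FP32 computations with INT8 inside TensorFlow. As a library-agnostic illustration of the underlying arithmetic only (a toy sketch, not the paper's TensorFlow implementation or calibration scheme), the NumPy example below performs symmetric per-tensor INT8 quantization of weights and activations and a dequantized matrix multiply:

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor quantization: map FP32 values to INT8 with a scale."""
    scale = np.abs(x).max() / 127.0 if np.abs(x).max() > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64)).astype(np.float32)   # FP32 weights
A = rng.normal(size=(8, 64)).astype(np.float32)    # FP32 activations

qW, sW = quantize_int8(W)
qA, sA = quantize_int8(A)

# Integer matmul accumulated in int32 (as INT8/VNNI hardware does), then rescaled.
y_int8 = (qA.astype(np.int32) @ qW.astype(np.int32).T) * (sA * sW)
y_fp32 = A @ W.T

print("max abs error:", np.abs(y_int8 - y_fp32).max())
```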
Energy efficiency has become the center of attention in emerging data infrastructures as increasing energy costs continue to outgrow all other operating expenditures. In this work we investigate energy-aware scheduling heuristics to increase the energy efficiency of MapReduce workloads on heterogeneous Hadoop clusters comprising both low-power (wimpy) and high-performance (brawny) nodes. We first make a case for heterogeneity by showing that Intel Atom and Sandy Bridge processors are more energy efficient for I/O-bound and CPU-bound workloads, respectively,...
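As a toy illustration of the kind of heterogeneity-aware placement the abstract alludes to (the paper's actual heuristics and energy models are not reproduced here; the task and node definitions below are hypothetical), this sketch routes I/O-bound tasks to wimpy nodes and CPU-bound tasks to brawny nodes:

```python
# Hypothetical task/node model for illustration only.
TASKS = [
    {"id": "map-1", "kind": "io"},
    {"id": "map-2", "kind": "cpu"},
    {"id": "reduce-1", "kind": "cpu"},
    {"id": "map-3", "kind": "io"},
]
NODES = {
    "wimpy": ["atom-0", "atom-1"],   # low-power nodes, efficient for I/O-bound work
    "brawny": ["snb-0"],             # high-performance nodes, efficient for CPU-bound work
}

def schedule(tasks, nodes):
    """Greedy heterogeneity-aware placement: match task type to node class."""
    placement, rr = {}, {"wimpy": 0, "brawny": 0}
    for t in tasks:
        pool = "wimpy" if t["kind"] == "io" else "brawny"
        node = nodes[pool][rr[pool] % len(nodes[pool])]   # round-robin within a class
        rr[pool] += 1
        placement[t["id"]] = node
    return placement

print(schedule(TASKS, NODES))
```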
Traditional FPGA education involves either a physical laboratory room with workstations connected to individual experimenter boards, or simulation platforms. Physical labs are expensive to maintain and require substantial floor space. In addition, students need to be physically present in the laboratories to access the boards. On the other hand, it is often the case that simulation platforms do not provide an adequate, in-depth understanding of concepts (such as synthesis on FPGAs). In this short paper, a third option - remote...
The new generation of shared-memory multi-core processors with multiple parallel execution paths provides a promising hardware platform for applications with a high degree of task-level parallelism (TLP). The Genetic Algorithm (GA), a widely-used evolutionary meta-heuristic optimization method, is a unique candidate in this class and demonstrates a significant amount of explicit and implicit parallelism. In this paper, we present the performance characteristics of a GA optimizing the placement problem on a Sun UltraSPARC T1...
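A minimal sketch of the task-level parallelism a GA exposes, here using Python's multiprocessing to evaluate fitness in parallel for a toy one-dimensional placement problem (this is not the paper's implementation, which targeted the multithreaded UltraSPARC T1; the chain-netlist cost function is invented for illustration):

```python
import random
from multiprocessing import Pool

N = 16  # number of cells to place in a row

def fitness(perm):
    # Toy placement cost: total wirelength of a chain netlist (lower is better).
    return sum(abs(perm[i] - perm[i + 1]) for i in range(len(perm) - 1))

def mutate(perm):
    # Swap two positions to produce a child placement.
    a, b = random.sample(range(len(perm)), 2)
    child = list(perm)
    child[a], child[b] = child[b], child[a]
    return child

if __name__ == "__main__":
    random.seed(1)
    population = [random.sample(range(N), N) for _ in range(64)]
    with Pool() as pool:
        for gen in range(20):
            # Explicit parallelism: fitness evaluations are independent tasks.
            scores = pool.map(fitness, population)
            ranked = [p for _, p in sorted(zip(scores, population))]
            elite = ranked[: len(ranked) // 4]
            population = elite + [mutate(random.choice(elite)) for _ in range(len(ranked) - len(elite))]
    print("best wirelength:", min(map(fitness, population)))
```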
Exploring the vast microarchitectural design space of chip multiprocessors (CMPs) through the traditional approach of exhaustive simulations is impractical due to long simulation times and their super-linear increase with core scaling. Kernel-based statistical machine learning algorithms can potentially help predict multiple performance metrics with non-linear dependence on CMP parameters. In this paper, we describe and evaluate a framework that uses Kernel Canonical Correlation Analysis (KCCA) to predict performance and power dissipation...
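The idea of correlating a design-parameter view with a metrics view can be illustrated with plain linear CCA from scikit-learn; the kernelized variant (KCCA) used in the paper would replace the linear projections with kernel-induced ones. The data below is synthetic and the parameter/metric names are placeholders, not the paper's experimental setup:

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)

# Synthetic stand-in data: each row is one simulated CMP configuration.
# X: design parameters (e.g. core count, cache size, issue width), made up here.
# Y: observed metrics (e.g. IPC, power) with a nonlinear dependence on X.
X = rng.uniform(size=(200, 3))
Y = np.column_stack([
    np.sin(2 * X[:, 0]) + 0.5 * X[:, 1],               # "performance"
    X[:, 0] * X[:, 2] + 0.1 * rng.normal(size=200),    # "power"
])

cca = CCA(n_components=2)
Xc, Yc = cca.fit_transform(X, Y)

# Correlation of the paired canonical variates indicates how well the
# parameter view tracks the metric view.
for i in range(2):
    r = np.corrcoef(Xc[:, i], Yc[:, i])[0, 1]
    print(f"canonical correlation {i}: {r:.2f}")
```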
Despite the promising performance improvement observed in emerging many-core architectures and high-end processors, power consumption prohibitively affects their use and marketability in low-energy sectors, such as embedded network processors and application-specific instruction-set processors (ASIPs). While most chip architects design power-efficient chips by finding an optimal power-performance balance at design time, some employ sophisticated on-chip autonomous power management units, which dynamically reduce the voltage or frequencies of idle...
From the time and money lost sitting in congestion waiting for traffic signals to change, to the many people injured or killed in crashes each year, to the emissions and energy consumption from our vehicles, the effects of transportation on our daily lives are immense. A wealth of data is available to help address these problems: sensors installed to monitor and operate roadways, cell phone apps, and -- just over the horizon -- connected vehicles and infrastructure. However, this data has yet to be effectively leveraged, thus providing opportunities in areas such...
Existing approaches to train neural networks that use large images require either cropping or down-sampling the data during pre-processing, using small batch sizes, or splitting the model across devices, mainly due to the prohibitively limited memory capacity available on GPUs and emerging accelerators. These techniques often lead to longer time to train (TTT) and, in some cases, lower accuracy. CPUs, on the other hand, can leverage significant amounts of memory. While much work has been done on parallelizing neural network training across multiple...
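To make the memory argument concrete, a back-of-the-envelope estimate (assuming FP32 activations and a single convolution layer's output of 64 channels at full input resolution; the image size is a hypothetical example) shows why even small batches of large images overflow a GPU's memory while fitting in CPU DRAM:

```python
def activation_gib(batch, height, width, channels, bytes_per_elem=4):
    """Rough FP32 activation footprint of one feature map, in GiB."""
    return batch * height * width * channels * bytes_per_elem / 2**30

# Hypothetical numbers: an 8192x8192 pathology/satellite image, 64 feature channels.
for batch in (1, 4, 16):
    print(batch, f"{activation_gib(batch, 8192, 8192, 64):.1f} GiB")
# batch 1  -> 16.0 GiB   (already at a 16 GB GPU's limit for a single layer)
# batch 4  -> 64.0 GiB
# batch 16 -> 256.0 GiB  (feasible only with large-memory CPU nodes or the workarounds above)
```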
We present an automated design flow for minimizing the use of diodes and switches (active devices) in implementations on a nanofabric based on chemically self-assembled electronic nanotechnology, as proposed by Goldstein and Budiu [2001]. Connectivity and logic are realized using the switch and diode behaviors of molecular devices, unlike very large scale integrated (VLSI) circuits where complementary metal-oxide semiconductor (CMOS) gates are used. Similar to the optimization goal of reducing the device count in VLSI designs to minimize area,...
The focus of this work is to identify data partitioning strategies and their performance models for memory-intensive two-dimensional Magneto-Static Wave (MSW) calculations on a shared-memory architecture. We have constructed computing, communication, and synchronization time models for the different partitioning schemes. We identified that improved performance for any scheme can be achieved through reduced boundary sharing and decreased stride penalties, even when this requires increased sharing. A maximum speed-up of 3.9 for the largest problem size was observed with one-dimensional partitioning.
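As a small illustration of the boundary-sharing trade-off above (a sketch, not the paper's performance model), the following code counts shared-boundary cells for 1-D strip versus 2-D block partitioning of an N x N grid across P workers; fewer shared cells means less synchronization, while strips keep unit-stride access along rows:

```python
import math

def strip_boundary_cells(n, p):
    """1-D (row-strip) partitioning: p-1 internal boundaries, each n cells wide."""
    return (p - 1) * n

def block_boundary_cells(n, p):
    """2-D block partitioning on a sqrt(p) x sqrt(p) grid of blocks (p assumed square)."""
    q = int(math.isqrt(p))
    return 2 * (q - 1) * n   # q-1 internal cut lines per dimension, each n cells long

n, p = 1024, 16
print("strip :", strip_boundary_cells(n, p))   # 15360 shared cells, unit-stride rows
print("block :", block_boundary_cells(n, p))   # 6144 shared cells, but strided column accesses
```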
Arindam Mukherjee, Arun Ravindran, Bharat Kumar Joshi, Kushal Datta and Yue Liu
Electrical and Computer Engineering Department, University of North Carolina at Charlotte, NC, USA
{amukherj, aravindr, bsjoshi, kdatta, yliu42}@uncc.edu
10.1 Introduction
10.1.1 Why Is Autonomous Power Management Necessary?
10.1.1.1 Sporadic Processing Requirements
10.1.1.2 Run-time Monitoring of System Parameters
10.1.1.3 Temperature
10.1.1.4 Power/Ground Noise
10.1.1.5 Real-Time Constraints
10.2 ...