- Parallel Computing and Optimization Techniques
- Distributed and Parallel Computing Systems
- Cloud Computing and Resource Management
- Logic, programming, and type systems
- Scientific Computing and Data Management
- Advanced Data Storage Technologies
- Tensor decomposition and applications
- Security and Verification in Computing
- Data Analysis with R
- Advanced biosensing and bioanalysis techniques
- Software Testing and Debugging Techniques
- Algorithms and Data Compression
- DNA and Biological Computing
- Cellular Automata and Applications
- Distributed systems and fault tolerance
- Embedded Systems Design Techniques
- Computational Physics and Python Applications
Google (United States)
2016-2022
Purdue University West Lafayette
2011-2013
North-West University
2008-2009
This work presents MLIR, a novel approach to building reusable and extensible compiler infrastructure. MLIR addresses software fragmentation and compilation for heterogeneous hardware, significantly reducing the cost of building domain-specific compilers and connecting existing compilers together. MLIR facilitates the design and implementation of code generators, translators and optimizers at different levels of abstraction and across application domains, hardware targets and execution environments. The contribution of this work includes (1)...
This work presents MLIR, a novel approach to building reusable and extensible compiler infrastructure. MLIR aims to address software fragmentation, improve compilation for heterogeneous hardware, significantly reduce the cost of building domain-specific compilers, and aid in connecting existing compilers together. MLIR facilitates the design and implementation of code generators, translators and optimizers at different levels of abstraction and also across application domains, hardware targets and execution environments. The contribution...
Graphics Processing Units have emerged as powerful accelerators for massively parallel, numerically intensive workloads. The two dominant software models for these devices are NVIDIA's CUDA and the cross-platform OpenCL standard. Until now, there has not been a fully open-source compiler targeting the CUDA environment, hampering general compiler and architecture research and making deployment difficult in datacenter or supercomputer environments. In this paper, we present gpucc, an LLVM-based, fully open-source, CUDA compatible...
One of the major optimizations employed in deep learning frameworks is graph rewriting. Production frameworks rely on heuristics to decide if rewrite rules should be applied and in which order. Prior research has shown that one can discover more optimal tensor computation graphs if we search for a better sequence of substitutions instead of relying on heuristics. However, we observe that existing approaches for tensor graph superoptimization, both in production and research frameworks, apply substitutions in a sequential manner. Such methods are sensitive to the order in which the substitutions are applied and often only explore a small...
We present a runtime framework for the execution of workloads represented as parallel-operator directed acyclic graphs (PO-DAGs) on heterogeneous multi-core platforms. PO-DAGs combine coarse-grained parallelism at the graph level with fine-grained parallelism within each node, lending naturally to exploiting the intra- and inter-processing-element parallelism in heterogeneous platforms. We identify four important criteria - Suitability, Locality, Availability and Criticality (SLAC) - and show that all of these criteria must be considered by a heterogeneous runtime framework in order to achieve good...
JavaScript is the dominant language for implementing dynamic web pages in browsers. Even though it is standardized, many browsers implement the language and browser bindings in different and incompatible ways. As a result, a plethora of web development frameworks were developed to hide cross-browser issues and to ease the development of large web applications. An unwelcome side-effect of these frameworks is that they can introduce memory leaks, despite the fact that JavaScript is garbage collected. Memory bloat is a major issue for web applications, as it affects user-perceived latency and may even prevent...
Pipelining is a well-known approach to increasing parallelism and performance. We address the problem of software pipelining for heterogeneous parallel platforms that consist of different multi-core and many-core processing units. In this context, pipelining involves two key steps -- partitioning an application into stages, and mapping and scheduling the stages onto the processing units of the heterogeneous platform. We show that the inter-dependency between these steps is a critical challenge that must be addressed in order to achieve high performance. We propose an Automatic Heterogeneous Pipelining framework (AHP)...
Many computationally intensive algorithms benefit from the wide parallelism offered by Graphical Processing Units (GPUs). However, the search for a close-to-optimal implementation remains extremely tedious due to the specialization and complexity of GPU architectures.
Probability and statistics with R By Maria Dolores Ugarte, Ana F. Militino, Alan Arnholt, Boca Raton, Chapman & Hall/CRC Press, 2008, xxvi + 728 pp., £46.99 or US$89.95 (hardback), ISBN 1584888...
The computational power increases over the past decades have greatly enhanced the ability to simulate chemical reactions and understand ever more complex transformations. Tensor contractions are the fundamental computational building block of these simulations. These simulations have often been tied to one platform and restricted in generality by the interface provided to the user. The expanding prevalence of accelerators and researcher demands necessitate a more general approach which is not tied to specific hardware or requires contortion of algorithms...
Synchronisation errors, that is, the insertion of additional symbols or the deletion of valid symbols, are the most difficult class of errors to correct - containing additive errors as a subset. A regenerating algorithm, the core of a channel demodulator, was developed for a class of synchronisation error correcting codes. This class of codes is a generalisation of the code originally developed by R.R. Varshamov and G.M. Tenengolts. In this article two designs are provided for the regenerator. The general design has been successfully implemented and tested on a general purpose computer (GPC) and field...
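For readers unfamiliar with the code family mentioned in the last abstract, the following is a minimal sketch of the classical binary Varshamov-Tenengolts code and its standard single-deletion correction rule. It is not the generalised class or the regenerator design from the article, which the truncated abstract does not describe. The function names `vt_syndrome` and `vt_correct_deletion` and the residue parameter `a` are hypothetical names chosen for illustration.

```python
def vt_syndrome(bits, n):
    """Weighted checksum sum(i * x_i) mod (n + 1), with positions counted from 1."""
    return sum(i * b for i, b in enumerate(bits, start=1)) % (n + 1)


def vt_correct_deletion(received, n, a=0):
    """Restore a length-n codeword of the classical binary VT code (residue a)
    from a word of length n - 1 that lost exactly one bit.

    Assumption: exactly one deletion occurred and no other errors are present.
    """
    if len(received) != n - 1:
        raise ValueError("expected a word with exactly one deleted bit")
    weight = sum(received)
    # How much the checksum fell short of the target residue.
    deficiency = (a - vt_syndrome(received, n)) % (n + 1)
    if deficiency <= weight:
        # A 0 was deleted: reinsert it so that `deficiency` ones lie to its right.
        pos, ones = len(received), 0
        while ones < deficiency:
            pos -= 1
            ones += received[pos]
        return received[:pos] + [0] + received[pos:]
    # A 1 was deleted: reinsert it so that (deficiency - weight - 1) zeros lie to its left.
    zeros_needed = deficiency - weight - 1
    pos, zeros = 0, 0
    while zeros < zeros_needed:
        zeros += 1 - received[pos]
        pos += 1
    return received[:pos] + [1] + received[pos:]


if __name__ == "__main__":
    codeword = [1, 0, 0, 1]            # 1*1 + 4*1 = 5, which is 0 mod 5
    damaged = codeword[:1] + codeword[2:]  # delete the second bit
    print(vt_correct_deletion(damaged, n=4))
```

The decoder only needs the received word's weight and checksum deficiency, which is why a single deletion can be repaired without knowing where it happened.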