- Parallel Computing and Optimization Techniques
- Polynomial and algebraic computation
- Numerical Methods and Algorithms
- Cryptography and Residue Arithmetic
- Matrix Theory and Algorithms
- Tensor decomposition and applications
- Distributed and Parallel Computing Systems
- Data Mining Algorithms and Applications
- Cellular Automata and Applications
- Advanced Data Storage Technologies
- Stochastic Gradient Optimization Techniques
- Energy Harvesting in Wireless Networks
- Interconnection Networks and Systems
- Hand Gesture Recognition Systems
- Human Pose and Action Recognition
- Fluid Dynamics Simulations and Interactions
- Digital Image Processing Techniques
- Advanced Image and Video Retrieval Techniques
- Advanced Database Systems and Queries
- Advanced MIMO Systems Optimization
- Algorithms and Data Compression
- Energy Efficient Wireless Sensor Networks
- Data Management and Algorithms
- Sparse and Compressive Sensing Techniques
- Handwritten Text Recognition Techniques
Liverpool John Moores University
2024
Otterbein University
2023
Alcorn State University
2020-2022
Qassim University
2018
Geomechanica (Canada)
2018
Western University
2010-2017
University of Lethbridge
2009
Matrix–matrix multiplication is of singular importance in linear algebra operations with a multitude applications scientific and engineering computing. Data structures for storing matrix elements are designed to minimize overhead information as well optimize the operation count. In this study, we utilize notion compact diagonal storage method (CDM), which builds upon previously developed storage—an orientation-independent uniform scheme store nonzero range matrices. This study exploits both...
As for serial code on CPUs, parallel GPUs dense polynomial arithmetic relies a combination of asymptotically fast and plain algorithms. Those are employed data large small size, respectively. Parallelizing both types algorithms is required in order to achieve peak performances. In this paper, we show that the multiplication can be efficiently parallelized GPUs. Remarkably, it outperforms (highly optimized) FFT-based up degree 212 while CPU same threshold usually at 26. We also report GPU...
We report on a GPU implementation of the condensation method designed by Abdelmalek Salem and Kouachi Said for computing determinant matrix. consider two types coefficients: modular integers floating point numbers. evaluate performance our code measuring its effective bandwidth argue that it is numerical stable in number case. In addition, we compare with serial computation from well-known mathematical packages. Our results suggest has large potential improving those packages terms running...
We present a model of multithreaded computation, combining fork-join and single-instruction-multiple-data parallelisms, with an emphasis on estimating parallelism overheads programs written for modern many-core architectures. establish Graham-Brent theorem this so as to estimate execution time running given number streaming multiprocessors. evaluate the benefits our four fundamental algorithms from scientific computing. In each case, is used minimize by determining appropriate value range...
Sparse matrix-vector multiplication or SpMXV is an important kernel in scientific computing. For example, the conjugate gradient method (CG) iterative linear system solving process where of coefficient matrix A with a dense vector x main computational step accounting for as much 90% overall running time. Though total number arithmetic operations (involving nonzero entries only) to compute Ax fixed, reducing probability cache misses per operation still challenging area research. This...
We revisit ordering techniques as a preprocessing step for improving the performance of sparse matrix-vector multiplication (SpM x V) on modern hierarchical memory computers. In computing SpM V main purpose columns (or rows) is to improve by enhancing data reuse. present new technique based binary reflected gray codes and experimentally evaluate compare it with other column from literature. The results numerical experiments very large test matrices clearly demonstrates gains rendered our...
Matrix-matrix multiplication is a fundamental computation in wide range of scientific problems. In this paper, we investigate algorithmic aspects important operation whereby the underlying arithmetic calculations are ordered way that different from standard triple-loop framework. We introduce notion "compact diagonal storage" which builds upon previously developed storage, an orientation-independent uniform scheme to store nonzero elements dense as well structured matrices such symmetric,...
This paper presents a novel approach to generate bag-of-words model based trajectory descriptions for handwritten strokes. We demonstrate how multiple distinct representations can be generated the same stroke accommodate writing variations and capture local features at stroke-segment level. The proposed utilized in template matching handwriting recognition/correction, writer identification, signature verification, etc. suitability of shape is experimented number settings used build language...
Relay Node Placement (RNP) is a critical design problem in Wireless Sensor Networks (WSNs).Network performance and energy efficiency can be highly affected by the placement strategy of RNs.It an NP-hard optimization which effectively addressed with multi-objective formulation solved using metaheuristics.The main aim this research work to efficiently optimize unconstrained deployment energy-harvesting RNs pre-established stationary WSN.The focus on optimizing different conflicting objectives,...
The CUDA Modular Polynomial (CUMODP) Library implements arithmetic operations for dense matrices and polynomials, primarily with modular integer coefficients. Some are available or floating point Similar to other software libraries, like CuBLAS 1 targeting Graphics Processing Units (GPUs), CUMODP focuses on efficiency-critical routines provides them in the form of device functions kernels. Hence, these designed offer GPU support polynomial system solvers. A bivariate solver is part library,...
In this paper, we proposed a cache-friendly hybrid sorting algorithm that combined non-comparison (counting sort) with comparison sort (quick algorithm. The current study leverages the principle of locality to improve performance counting over large list integer inputs. We employed modified version original quick reduce number cache misses in sort. also empirically tested by varying both range and quantity input values. This approach not only demonstrated superior classic but is capable...
One of the critical design problems in Wireless Sensor Networks (WSNs) is Relay Node Placement (RNP) problem. Inefficient deployment RNs would have adverse effects on overall performance and energy efficiency WSNs. The RNP problem a typical example an NP-hard optimization which can be addressed using metaheuristics with multi-objective formulation. In this paper, we aimed to provide efficient approach considering unconstrained energy-harvesting into pre-established stationary WSN. was...
Sparse matrix-vector multiplication or SpMxV is an important kernel in scientific computing. For example, the conjugate gradient method, where main computational step. Though total number of arithmetic operations fixed, reducing probability cache misses per operation still a challenging area research. In this work, we present new column ordering algorithm for sparse matrices. We analyze complexity when A ordered by our technique. The numerical experiments, with very large test matrices,...
This paper describes a novel algorithm that enumerates set of Boolean variables from 3-SAT instance such for any truth assignments, it will be reduced to 2-SAT or an empty formula with one more clauses. We showed the interest can found by formulating given into covering problem.
We present multithreaded adaptations of the Euclidean plain division and GCD algorithms to many-core GPU architectures report on implementation with NVIDIA CUDA complexity analysis an enhanced version PRAM model.
This paper describes a polynomial time algorithm for solving graph isomorphism and automorphism. We introduce new tree data structure called Walk Length Tree. show that such can be both constructed compared with another in time. prove automorphism solved using Trees.