- Graph Theory and Algorithms
- Parallel Computing and Optimization Techniques
- Complex Network Analysis Techniques
- Advanced Graph Neural Networks
- Interconnection Networks and Systems
- Genomics and Phylogenetic Studies
- Single-cell and spatial transcriptomics
- Machine Learning in Healthcare
- Algorithms and Data Compression
- Stochastic Gradient Optimization Techniques
- Data Visualization and Analytics
- Distributed and Parallel Computing Systems
- Bioinformatics and Genomic Networks
- Artificial Intelligence in Healthcare
- Functional Brain Connectivity Studies
- Cloud Computing and Resource Management
- Complexity and Algorithms in Graphs
- Caching and Content Delivery
- Machine Learning in Bioinformatics
- Advanced Graph Theory Research
- Error Correcting Code Techniques
- Topic Modeling
- Topological and Geometric Data Analysis
- Human-Automation Interaction and Safety
- Sparse and Compressive Sensing Techniques
Indiana University Bloomington
2019-2025
Indiana University
2019-2024
American International University-Bangladesh
2023
The University of Texas at Austin
2020
Lawrence Berkeley National Laboratory
2015-2018
Purdue University West Lafayette
2010-2014
Bangladesh University of Engineering and Technology
2007
University of Maryland, Baltimore
2006
Metagenomes encode an enormous diversity of proteins, reflecting a multiplicity of functions and activities.
Biological networks capture structural or functional properties of relevant entities such as molecules, proteins, or genes. Characteristic examples are gene expression networks or protein-protein interaction networks, which hold information about functional affinities or structural similarities. Such networks have been expanding in size due to the increasing scale and abundance of biological data. While various clustering algorithms have been proposed to find highly connected regions, Markov Clustering (MCL) has been one of the most successful approaches to cluster...
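A toy, dense sketch of the basic MCL iteration may help make the method concrete: alternate expansion (squaring the column-stochastic matrix) and inflation (entrywise power followed by column re-normalization) until the matrix stops changing, then read clusters off the attractor rows. This is the textbook algorithm on a small dense matrix, not the distributed implementation the abstract refers to; the function names, the inflation value of 2, and the pruning threshold are illustrative assumptions.

```python
def normalize_columns(m):
    """Scale each column of a dense square matrix so that it sums to 1."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] > 0 else 0.0 for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Dense matrix product; adequate for toy inputs only."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(adj, inflation=2.0, max_iter=100, tol=1e-9, prune=1e-6):
    """Markov Clustering on a dense 0/1 adjacency matrix; returns a set of frozensets."""
    n = len(adj)
    # Add self-loops (a common MCL preprocessing step), then make columns stochastic.
    m = normalize_columns([[float(adj[i][j]) + (1.0 if i == j else 0.0)
                            for j in range(n)] for i in range(n)])
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: flow spreads along longer walks
        # Inflation: raise entries to a power and re-normalize, so strong flows
        # grow and weak ones fade; tiny entries are pruned to zero.
        inflated = normalize_columns([[v ** inflation if v > prune else 0.0
                                       for v in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Attractors have a nonzero diagonal; the columns flowing into one form a cluster.
    return {frozenset(j for j in range(n) if m[i][j] > prune)
            for i in range(n) if m[i][i] > prune}


# Usage: two 4-cliques {0..3} and {4..7} joined by the single edge 3-4.
n = 8
adj = [[0] * n for _ in range(n)]
for block in (range(0, 4), range(4, 8)):
    for a in block:
        for b in block:
            if a != b:
                adj[a][b] = 1
adj[3][4] = adj[4][3] = 1
clusters = mcl(adj)
print(sorted(sorted(c) for c in clusters))
```

The weak bridge between the two cliques loses flow at every inflation step, so the two dense groups end up in separate clusters.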
Triangle counting and enumeration are important kernels that are used to characterize graphs. They are also used to compute important statistics such as clustering coefficients. We provide a simple exact algorithm that is based on operations on sparse adjacency matrices. By parallelizing the individual sparse matrix operations, we achieve a parallel algorithm for triangle counting. The algorithm is generalizable to triangle enumeration by modifying the semiring that underlies the matrix algebra. We present a new primitive, masked matrix multiplication, that can be beneficial especially for the enumeration case. We provide results from an initial...
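The masked-multiplication idea can be illustrated serially. Assuming L is the strictly lower-triangular part of the adjacency matrix, each nonzero of (L·L) that survives the mask L marks a wedge that closes into a triangle, and every triangle i > k > j is found exactly once; the sketch computes only the masked entries. Counting corresponds to the ordinary (+, ×) semiring, while recording the middle vertex k instead of adding 1 gives enumeration. Python sets stand in for a real sparse format, and the function names are hypothetical.

```python
def lower_adjacency(n, edges):
    """Strictly lower-triangular adjacency: low[i] = {j < i : (i, j) is an edge}."""
    low = [set() for _ in range(n)]
    for a, b in edges:
        if a != b:
            low[max(a, b)].add(min(a, b))
    return low


def masked_triangles(n, edges):
    """Enumerate triangles as the nonzeros of (L @ L) restricted to the pattern of L.

    (L @ L)[i][j] counts k with edges (i, k) and (k, j) where j < k < i;
    evaluating only entries where L[i][j] != 0 closes each wedge.
    """
    low = lower_adjacency(n, edges)
    triangles = []
    for i in range(n):
        for j in low[i]:            # only entries allowed by the mask L
            for k in low[i]:        # row i of L ...
                if j in low[k]:     # ... meets column j of L at k
                    triangles.append((i, k, j))
    return triangles


def count_triangles(n, edges):
    return len(masked_triangles(n, edges))


# Usage: the complete graph K4 contains four triangles.
k4 = [(a, b) for a in range(4) for b in range(a + 1, 4)]
print(count_triangles(4, k4))
```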
The PathoSystems Resource Integration Center (PATRIC) is one of eight Bioinformatics Resource Centers (BRCs) funded by the National Institute of Allergy and Infectious Diseases (NIAID) to create a data analysis resource for selected NIAID priority pathogens, specifically proteobacteria of the genera Brucella, Rickettsia and Coxiella, and corona-, calici- and lyssaviruses and viruses associated with hepatitis A and E. The goal of the project is to provide a comprehensive bioinformatics resource for these pathogens, including consistently annotated genome, proteome and metabolic...
We propose a new integrated method of exploiting model, batch and domain parallelism for the training of deep neural networks (DNNs) on large distributed-memory computers using minibatch stochastic gradient descent (SGD). Our goal is to find an efficient parallelization strategy for a fixed batch size using P processes. Our method is inspired by the communication-avoiding algorithms in numerical linear algebra. We see the P processes as logically divided into a P_r x P_c grid where the P_r dimension is implicitly responsible for model/domain parallelism...
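A hypothetical sketch of how P processes might be laid out on a P_r x P_c grid: here the grid row selects a slice of one layer's neurons (model parallelism) and the grid column selects a slice of the minibatch (batch parallelism). The row-major layout, the contiguous block partitioning, and all function names are assumptions for illustration, not the paper's code.

```python
def block_range(total, parts, index):
    """Contiguous block [start, stop) of `total` items owned by part `index`."""
    base, extra = divmod(total, parts)
    start = index * base + min(index, extra)
    return start, start + base + (1 if index < extra else 0)


def grid_layout(P, P_r, batch_size, num_neurons):
    """Map each rank to its grid position, neuron slice, and sample slice.

    Assumed layout: ranks are placed row-major on a P_r x P_c grid; the grid
    row picks a model shard (neurons of a layer), the column a batch shard.
    """
    if P % P_r:
        raise ValueError("P_r must divide P")
    P_c = P // P_r
    layout = {}
    for rank in range(P):
        r, c = divmod(rank, P_c)
        layout[rank] = {
            "grid": (r, c),
            "neurons": block_range(num_neurons, P_r, r),  # split along P_r
            "samples": block_range(batch_size, P_c, c),   # split along P_c
        }
    return layout


def gradient_allreduce_groups(P, P_r):
    """Ranks holding the same model shard (same grid row) must sum their gradients."""
    P_c = P // P_r
    return [list(range(r * P_c, (r + 1) * P_c)) for r in range(P_r)]


# Usage: 6 processes on a 2 x 3 grid, a batch of 12 samples, a layer of 8 neurons.
for rank, info in grid_layout(6, 2, 12, 8).items():
    print(rank, info)
```

Under this layout, P_r = 1 reduces to pure batch parallelism and P_r = P to pure model parallelism; intermediate shapes trade gradient all-reduce traffic against activation exchange.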
Sparse matrix-matrix multiplication (or SpGEMM) is a key primitive for many high-performance graph algorithms as well as for some linear solvers, such as algebraic multigrid. The scaling of existing parallel implementations of SpGEMM is heavily bound by communication. Even though 3D (or 2.5D) algorithms have been proposed and theoretically analyzed in the flat MPI model on Erdős-Rényi matrices, those algorithms had not been implemented in practice and their complexities had not been analyzed for the general case. In this work, we present the first ever implementation...
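The core idea of a 3D (layered) SpGEMM can be simulated serially: split the inner dimension k into c layers, let each layer multiply only its slice of A's columns and B's rows, and sum the partial products. In a real 3D algorithm each layer would itself run a 2D distributed multiply and the final sum would be a collective reduction across layers; this sketch, with hypothetical names and dict-of-coordinates matrices, shows only the decomposition.

```python
from collections import defaultdict


def spgemm(A, B):
    """Reference product for sparse matrices stored as {(row, col): value}."""
    B_rows = defaultdict(list)
    for (k, j), v in B.items():
        B_rows[k].append((j, v))
    C = defaultdict(float)
    for (i, k), a in A.items():
        for j, b in B_rows[k]:
            C[(i, j)] += a * b
    return dict(C)


def layered_spgemm(A, B, inner_dim, layers):
    """Split the inner dimension into `layers` slices, multiply each, then reduce."""
    partials = []
    for layer in range(layers):
        lo = layer * inner_dim // layers
        hi = (layer + 1) * inner_dim // layers
        A_l = {(i, k): v for (i, k), v in A.items() if lo <= k < hi}
        B_l = {(k, j): v for (k, j), v in B.items() if lo <= k < hi}
        partials.append(spgemm(A_l, B_l))  # layers are independent of each other
    # Reduction across layers (a collective operation in a distributed code).
    C = defaultdict(float)
    for partial in partials:
        for key, v in partial.items():
            C[key] += v
    return dict(C)


A = {(0, 0): 1, (0, 2): 2, (1, 1): 3}
B = {(0, 1): 4, (1, 0): 5, (2, 1): 6}
print(layered_spgemm(A, B, inner_dim=3, layers=2))
```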
We design and develop a work-efficient multithreaded algorithm for sparse matrix-sparse vector multiplication (SpMSpV) where the matrix, the input vector, and the output vector are all sparse. SpMSpV is an important primitive in the emerging GraphBLAS standard and is the workhorse of many graph algorithms including breadth-first search, bipartite graph matching, and maximal independent set. As thread counts increase, existing multithreaded SpMSpV algorithms can spend more time accessing the sparse matrix data structure than doing arithmetic. Our shared-memory parallel work...
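A minimal sketch of column-driven SpMSpV and of how BFS reduces to it, assuming the matrix is stored column-wise so that only the columns selected by the sparse input vector are touched. The (select-second, min) semiring used to propagate parent ids is one common choice for GraphBLAS-style BFS; the function names are hypothetical, and the paper's bucketing and multithreading are not reproduced.

```python
def spmspv(columns, x, multiply, add):
    """y = A x with A given column-wise and x a sparse dict {j: x_j}.

    Only columns j with a nonzero x_j are visited, so the work is proportional
    to the nonzeros touched rather than to the matrix dimension.
    """
    y = {}
    for j, xj in x.items():
        for i, aij in columns[j]:
            prod = multiply(aij, xj)
            y[i] = add(y[i], prod) if i in y else prod
    return y


def bfs_parents(columns, source):
    """BFS as repeated SpMSpV; columns[j] lists (i, 1) for each edge j -> i."""
    parent = {source: source}
    frontier = {source: source}
    while frontier:
        # x_j = j for frontier vertices; multiply keeps the candidate parent,
        # add = min picks the smallest-id parent deterministically.
        reached = spmspv(columns, {j: j for j in frontier},
                         multiply=lambda a, xj: xj, add=min)
        # Drop already-visited vertices (a complemented mask in GraphBLAS terms).
        frontier = {i: p for i, p in reached.items() if i not in parent}
        parent.update(frontier)
    return parent


# Usage: undirected edges 0-1, 0-2, 1-3, 2-3, 3-4 stored as symmetric columns.
columns = [[(1, 1), (2, 1)], [(0, 1), (3, 1)], [(0, 1), (3, 1)],
           [(1, 1), (2, 1), (4, 1)], [(3, 1)]]
print(bfs_parents(columns, 0))
```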
Sparse matrix-matrix multiplication (SpGEMM) is a computational primitive that is widely used in areas ranging from traditional numerical applications to recent big data analysis and machine learning. Although many SpGEMM algorithms have been proposed, hardware-specific optimizations for multi- and many-core processors are lacking, and a detailed analysis of their performance under various use cases and matrices is not available. We firstly identify and mitigate multiple bottlenecks with memory management and thread scheduling...
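For reference, a serial sketch of row-wise (Gustavson-style) SpGEMM with a hash-map accumulator, plus a symbolic phase that counts output nonzeros per row so storage can be allocated before the numeric phase. Both choices are common in multicore SpGEMM but are assumptions here; the snippet does not state which accumulator or memory strategy the paper uses, and the names are hypothetical.

```python
def symbolic_row_nnz(A, B):
    """Symbolic phase: nonzeros per output row, computed before any arithmetic."""
    return [len({j for k, _ in row for j, _ in B[k]}) for row in A]


def gustavson_row(A_row, B):
    """One output row of C = A B using a hash-map (dict) accumulator."""
    acc = {}
    for k, a in A_row:
        for j, b in B[k]:
            acc[j] = acc.get(j, 0) + a * b
    return sorted(acc.items())  # keep column indices sorted, as in CSR


def spgemm_csr(A, B):
    """C = A B for matrices stored as lists of rows of (column, value) pairs.

    Each output row depends on a single row of A, which is why rows can be
    distributed across threads; this sketch processes them sequentially.
    """
    return [gustavson_row(row, B) for row in A]


A = [[(0, 1), (2, 2)], [(1, 3)]]
B = [[(1, 4)], [(0, 5)], [(1, 6)]]
print(symbolic_row_nnz(A, B), spgemm_csr(A, B))
```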
Multiplication of a sparse matrix with a dense matrix is a building block of an increasing number of applications in many areas such as machine learning and graph algorithms. However, most previous work on parallel matrix multiplication considered only both dense or both sparse matrix operands. This paper analyzes the parallel communication lower bounds and compares the communication costs of various classic parallel algorithms in the context of sparse-dense matrix-matrix multiplication. We also present new communication-avoiding algorithms based on a 1D decomposition, called 1.5D, which, while suboptimal...
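A small sketch of sparse-dense multiplication and of the communication a plain 1D block-row distribution incurs: a process owning row i of A needs every row k of the dense operand for which a_ik is nonzero, and rows owned by other processes must be fetched. It assumes a square sparse A and illustrates only the 1D baseline, not the 1.5D algorithm; the names are hypothetical.

```python
def spmm(A_rows, B):
    """C = A B with A sparse (rows of (col, value)) and B dense (list of lists)."""
    d = len(B[0])
    C = []
    for row in A_rows:
        c = [0.0] * d
        for k, a in row:
            bk = B[k]
            for t in range(d):
                c[t] += a * bk[t]
        C.append(c)
    return C


def rows_needed_1d(A_rows, p):
    """Remote rows of B each of p processes must fetch under a 1D block-row layout.

    Assumes A is square, so row k of B is owned by the owner of row k of A.
    """
    n = len(A_rows)
    owner = [i * p // n for i in range(n)]
    needs = [set() for _ in range(p)]
    for i, row in enumerate(A_rows):
        for k, _ in row:
            if owner[k] != owner[i]:
                needs[owner[i]].add(k)
    return needs


A = [[(0, 1.0), (3, 2.0)], [(1, 1.0)], [(2, 1.0)], [(0, 1.0)]]
B = [[1, 2], [3, 4], [5, 6], [7, 8]]
print(spmm(A, B), rows_needed_1d(A, 2))
```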
Deep learning (DL) models typically require large-scale, balanced training data to be robust, generalizable, and effective in the context of healthcare. This has been a major issue for developing DL models for the coronavirus disease 2019 (COVID-19) pandemic, where data are highly class imbalanced. Conventional approaches in DL use cross-entropy loss (CEL), which often suffers from poor margin classification. We show that contrastive loss (CL) improves the performance of CEL, especially for imbalanced electronic health records (EHR)...
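The snippet does not say which contrastive formulation is used. The sketch below assumes a supervised contrastive loss in the SupCon style (embeddings are normalized to the unit sphere, same-label pairs are pulled together and all other pairs pushed apart, with a temperature parameter) and shows standard cross-entropy for comparison. It is plain Python for readability, not a training implementation, and the function names are hypothetical.

```python
import math


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Mean over anchors of -avg_p log(exp(s_ip) / sum_{a != i} exp(s_ia))."""
    z = [normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # anchors without a same-label partner contribute nothing
        sims = {a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
                for a in range(n) if a != i}
        log_denominator = math.log(sum(math.exp(s) for s in sims.values()))
        total += -sum(sims[p] - log_denominator for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


def cross_entropy(logits, label):
    """Numerically stable softmax cross-entropy for a single example."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[label]


labels = [0, 0, 1, 1]
aligned = [[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9]]
mixed = [[1, 0], [0, 1], [0.9, 0.1], [0.1, 0.9]]
print(supervised_contrastive_loss(aligned, labels), supervised_contrastive_loss(mixed, labels))
```

Because the contrastive term acts on pairs of samples rather than on one logit per sample, the minority class still supplies positive pairs that shape the embedding, which is one intuition for why it can help under class imbalance.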
In today's era of rapid advancement in technology, innovative assistive devices are transforming accessibility for the visually impaired. Through the integration of health technologies, embedded systems, and software engineering, the Smart Assistive Stick enables visually impaired people to navigate on their own. Fundamentally, an Arduino microcontroller interprets reflected signals to provide real-time feedback in the form of voice instructions or buzzer alerts. The ultrasonic sensor detects obstructions in three directions (front,...
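Ultrasonic ranging rests on simple arithmetic: the sensor reports a round-trip echo time, so the one-way distance is echo time × speed of sound / 2. The sketch shows that conversion and a hypothetical per-direction alert rule. The 50 cm threshold, the message text, and the direction keys other than "front" (the snippet is truncated after it) are placeholders, and an Arduino build would run equivalent C++ on the microcontroller.

```python
SPEED_OF_SOUND_CM_PER_US = 0.0343  # about 343 m/s in air near 20 °C


def echo_to_distance_cm(echo_us):
    """Round-trip echo time in microseconds -> one-way distance in cm."""
    return echo_us * SPEED_OF_SOUND_CM_PER_US / 2


def feedback(distances_cm, alert_cm=50.0):
    """Return a cue for each direction closer than the (hypothetical) threshold."""
    cues = [f"obstacle {direction}, {d:.0f} cm"
            for direction, d in distances_cm.items() if d < alert_cm]
    return cues or ["path clear"]


# Usage: "side_a" is a placeholder name for one of the other two directions.
readings = {"front": echo_to_distance_cm(1750), "side_a": echo_to_distance_cm(5830)}
print(feedback(readings))
```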
Comparing phenotypes of heterogeneous cell populations from multiple biological conditions is at the heart of scientific discovery based on flow cytometry (FC). When the signal is measured by the average expression of a biomarker, standard statistical methods require that the variance be approximately stabilized in the populations to be compared. Since the mean and variance of a cell population are often correlated in fluorescence-based FC measurements, a preprocessing step is needed to stabilize the within-population variances. We present a variance-stabilization...
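The snippet is truncated before the method is described. One common way to stabilize fluorescence variances, assumed here purely for illustration, is an inverse hyperbolic sine transform asinh(x / c) whose cofactor c is chosen so that the within-population variances become as homogeneous as possible, measured by Bartlett's test statistic (zero when all sample variances are equal). The candidate grid and function names are hypothetical.

```python
import math
from statistics import variance


def bartlett_statistic(groups):
    """Bartlett's statistic for equal variances across groups (each needs n >= 2)."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    N = sum(sizes)
    variances = [variance(g) for g in groups]  # sample variances (n - 1 denominator)
    pooled = sum((n - 1) * v for n, v in zip(sizes, variances)) / (N - k)
    numerator = (N - k) * math.log(pooled) - sum(
        (n - 1) * math.log(v) for n, v in zip(sizes, variances))
    correction = 1 + (sum(1 / (n - 1) for n in sizes) - 1 / (N - k)) / (3 * (k - 1))
    return numerator / correction


def asinh_transform(values, cofactor):
    return [math.asinh(x / cofactor) for x in values]


def choose_cofactor(populations, candidates):
    """Pick the cofactor whose transformed populations have the most equal variances."""
    return min(candidates, key=lambda c: bartlett_statistic(
        [asinh_transform(p, c) for p in populations]))


# Usage: spread grows with the mean (multiplicative noise), as in raw intensities.
populations = [[9, 10, 11], [90, 100, 110], [900, 1000, 1100]]
print(choose_cofactor(populations, [1, 10, 100, 1000, 10000]))
```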
Combinatorial algorithms such as those that arise in graph analysis, modeling of discrete systems, bioinformatics, and chemistry are often hard to parallelize. The Combinatorial BLAS library implements key computational primitives for the rapid development of combinatorial algorithms in distributed-memory systems. During the decade since its first introduction, the Combinatorial BLAS library has evolved and expanded significantly. This article details many of the key technical features of Combinatorial BLAS version 2.0, such as communication avoidance, hierarchical parallelism via in-node...
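The semiring abstraction behind primitives of this kind can be sketched in a few lines: the same sparse matrix-vector routine computes an ordinary product on (+, ×) or a Bellman-Ford relaxation step on (min, +). The `Semiring` class and function names are hypothetical Python stand-ins and do not reflect the library's C++ interface.

```python
import math
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Semiring:
    add: Callable[[float, float], float]
    multiply: Callable[[float, float], float]
    zero: float  # identity element of `add`


PLUS_TIMES = Semiring(add=lambda a, b: a + b, multiply=lambda a, b: a * b, zero=0.0)
MIN_PLUS = Semiring(add=min, multiply=lambda a, b: a + b, zero=math.inf)


def spmv(rows, x, sr):
    """y_i = add over stored a_ij of multiply(a_ij, x_j), on semiring `sr`."""
    y = []
    for row in rows:
        acc = sr.zero
        for j, a in row:
            acc = sr.add(acc, sr.multiply(a, x[j]))
        y.append(acc)
    return y


def shortest_paths(rows, source):
    """Bellman-Ford as repeated (min, +) SpMV.

    rows[i] holds (j, w) for each edge j -> i of weight w, plus a (i, 0.0)
    self-loop so that every vertex keeps its current best distance.
    """
    dist = [math.inf] * len(rows)
    dist[source] = 0.0
    for _ in range(len(rows) - 1):
        new = spmv(rows, dist, MIN_PLUS)
        if new == dist:
            break
        dist = new
    return dist


# Usage: edges 0->1 (4), 0->2 (1), 2->1 (2), 1->3 (1), stored by destination row.
rows = [[(0, 0.0)], [(1, 0.0), (0, 4.0), (2, 2.0)], [(2, 0.0), (0, 1.0)], [(3, 0.0), (1, 1.0)]]
print(shortest_paths(rows, 0))
```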
Computational prediction of in-hospital mortality in the setting of an intensive care unit can help clinical practitioners to guide care and make early decisions for interventions. As clinical data are complex and varied in their structure and components, continued innovation in modelling strategies is required to identify architectures that best model outcomes. In this work, we trained a Heterogeneous Graph Model (HGM) on electronic health record (EHR) data and used the resulting embedding vector as additional information added...
It is difficult to obtain high performance when computing matchings on parallel processors because matching algorithms explicitly or implicitly search for paths in the graph, and when these paths become long, there is little concurrency. In spite of this limitation, we present a new algorithm and its shared-memory parallelization that achieves good scalability in computing maximum cardinality matchings in bipartite graphs. Our algorithm searches for augmenting paths via specialized breadth-first searches (BFS) from multiple source vertices, hence creating more...
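A serial sketch of phase-based matching with multi-source BFS: every unmatched left vertex roots its own alternating BFS tree, trees never share vertices, and a tree stops growing once it reaches an unmatched right vertex, so all paths found in one phase can be augmented together. Tree grafting and the shared-memory parallelization described in the paper are omitted, and the names are hypothetical.

```python
from collections import deque


def ms_bfs_matching(n_left, n_right, adj):
    """Maximum cardinality bipartite matching; adj[u] lists right neighbors of u."""
    mate_l, mate_r = [-1] * n_left, [-1] * n_right
    while True:
        root = [-1] * n_left        # root of the BFS tree each left vertex joined
        parent_r = [-1] * n_right   # left vertex that discovered each right vertex
        visited_r = [False] * n_right
        queue = deque(u for u in range(n_left) if mate_l[u] == -1)
        for u in queue:
            root[u] = u
        done, ends = set(), []      # finished trees; free right endpoints found
        while queue:
            u = queue.popleft()
            if root[u] in done:
                continue            # this tree already has an augmenting path
            for v in adj[u]:
                if visited_r[v]:
                    continue        # each right vertex joins at most one tree
                visited_r[v] = True
                parent_r[v] = u
                w = mate_r[v]
                if w == -1:         # free right vertex: augmenting path found
                    done.add(root[u])
                    ends.append(v)
                    break
                root[w] = root[u]   # grow the tree through the matched edge
                queue.append(w)
        if not ends:
            return mate_l, mate_r   # no augmenting path left: matching is maximum
        for v in ends:              # vertex-disjoint paths, so augment them all
            while v != -1:
                u = parent_r[v]
                next_v = mate_l[u]
                mate_l[u], mate_r[v] = v, u
                v = next_v


print(ms_bfs_matching(3, 3, [[0, 1], [0], [1, 2]]))
```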
We develop a fused matrix multiplication kernel that unifies sampled dense-dense matrix multiplication and sparse-dense matrix multiplication under a single operation called FusedMM. By using user-defined functions, FusedMM can capture almost all computational patterns needed by popular graph embedding and GNN approaches. FusedMM is an order of magnitude faster than its equivalent kernels in Deep Graph Library. The superior performance comes from the low-level vectorized kernels, a suitable load balancing scheme, and efficient utilization of the memory...
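The benefit of fusion can be seen in a toy example. The unfused version runs SDDMM (one scalar per edge, passed through a user-defined function) and then SpMM, materializing every edge score in between; the fused version consumes each score as soon as it is computed. Using sigmoid as the user-defined function and the function names are assumptions, and vectorization and load balancing are not shown.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def unfused(edges, X, Y, op=sigmoid):
    """SDDMM followed by SpMM; stores one score per edge in between."""
    scores = {(i, j): op(sum(a * b for a, b in zip(X[i], Y[j]))) for i, j in edges}
    Z = [[0.0] * len(Y[0]) for _ in X]
    for (i, j), w in scores.items():
        for t, y in enumerate(Y[j]):
            Z[i][t] += w * y
    return Z


def fused(edges, X, Y, op=sigmoid):
    """Single pass over the edges: compute each score and use it immediately."""
    Z = [[0.0] * len(Y[0]) for _ in X]
    for i, j in edges:
        w = op(sum(a * b for a, b in zip(X[i], Y[j])))
        for t, y in enumerate(Y[j]):
            Z[i][t] += w * y
    return Z


edges = [(0, 1), (1, 0), (1, 2)]
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(fused(edges, X, X))
```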
The present study was conducted to assess the growth performance, morphometric traits, muscle chemical composition and cholesterol content in four phenotypes of naked neck chicken (black, white-black, light brown and dark brown). A total of 320 day-old chicks, 80 from each phenotype, were randomly stratified into 20 replicates (16 chicks per replicate) according to a completely randomized design. The results showed higher final body weight, weight gain, better FCR both whereas time gains found be greater phenotype....
We design, implement, and evaluate algorithms for computing a matching of maximum cardinality in a bipartite graph on multicore and massively multithreaded computers. As computers with larger numbers of slower cores dominate the commodity processor market, the design of multithreaded algorithms to solve large matching problems becomes a necessity. Recent work on serial algorithms for the matching problem has shown that their performance is sensitive to the order in which the vertices are processed for matching. In a multithreaded environment, imposing a serial order in which vertices are considered for matching would lead to loss of concurrency and performance. But...
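For contrast with the multithreaded setting, a textbook serial augmenting-path matcher (DFS-based, in the style of Kuhn's algorithm) takes an explicit vertex `order`: the order changes which matching is found and how much searching is done, though not the final cardinality. This is a generic illustration, not one of the paper's algorithms, and the names are hypothetical.

```python
def augment_dfs(u, adj, mate_l, mate_r, seen):
    """Search for an augmenting path from left vertex u; flip it if found."""
    for v in adj[u]:
        if seen[v]:
            continue
        seen[v] = True
        # v is free, or its current partner can be re-matched elsewhere.
        if mate_r[v] == -1 or augment_dfs(mate_r[v], adj, mate_l, mate_r, seen):
            mate_l[u], mate_r[v] = v, u
            return True
    return False


def max_matching(n_left, n_right, adj, order=None):
    """Process left vertices in the given order (default: 0..n_left-1)."""
    mate_l, mate_r = [-1] * n_left, [-1] * n_right
    for u in (order if order is not None else range(n_left)):
        augment_dfs(u, adj, mate_l, mate_r, [False] * n_right)
    return mate_l, mate_r


adj = [[0, 1], [0], [1, 2]]
print(max_matching(3, 3, adj), max_matching(3, 3, adj, order=[2, 1, 0]))
```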
Ordering vertices of a graph is key to minimize fill-in and data structure size in sparse direct solvers, maximize locality in iterative solvers, and improve performance in graph algorithms. Except for naturally parallelizable ordering methods such as nested dissection, many important ordering methods have not been efficiently mapped to distributed-memory architectures. In this paper, we present the first-ever distributed-memory implementation of the reverse Cuthill-McKee (RCM) algorithm for reducing the profile of a sparse matrix. Our parallelization uses a two-dimensional sparse matrix...
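A serial sketch of reverse Cuthill-McKee: run BFS from a low-degree start vertex, visit each vertex's neighbors in increasing degree order, then reverse the resulting order. Each component starts at a minimum-degree vertex as a simple stand-in for a pseudo-peripheral vertex, and the quality check uses bandwidth, a quantity closely related to the profile. The names are hypothetical and nothing here is distributed.

```python
from collections import deque


def reverse_cuthill_mckee(adj):
    """Return perm with perm[new_position] = old_vertex for an undirected graph."""
    n = len(adj)
    degree = [len(nbrs) for nbrs in adj]
    visited = [False] * n
    order = []
    # Try start vertices by increasing degree so every component gets a low-degree root.
    for start in sorted(range(n), key=lambda v: (degree[v], v)):
        if visited[start]:
            continue
        visited[start] = True
        queue = deque([start])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in sorted(adj[u], key=lambda v: (degree[v], v)):
                if not visited[w]:
                    visited[w] = True
                    queue.append(w)
    return order[::-1]  # reversing Cuthill-McKee tends to reduce fill and profile


def bandwidth(adj, perm):
    """Largest |pos(u) - pos(w)| over edges under the ordering perm."""
    pos = {v: i for i, v in enumerate(perm)}
    return max((abs(pos[u] - pos[w]) for u in range(len(adj)) for w in adj[u]), default=0)


# Usage: the path 0-3-1-4-2 labeled in a scrambled order.
adj = [[3], [3, 4], [4], [0, 1], [1, 2]]
perm = reverse_cuthill_mckee(adj)
print(perm, bandwidth(adj, list(range(5))), bandwidth(adj, perm))
```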