- Scientific Computing and Data Management
- Distributed and Parallel Computing Systems
- Advanced Data Storage Technologies
- Cloud Computing and Resource Management
- Computational Physics and Python Applications
- Machine Learning in Materials Science
- Computational Drug Discovery Methods
- Data Quality and Management
- Research Data Management Practices
- Parallel Computing and Optimization Techniques
- Gamma-Ray Bursts and Supernovae
- Privacy-Preserving Technologies in Data
- Advanced Database Systems and Queries
- Big Data Technologies and Applications
- Algorithms and Data Compression
- X-ray Diffraction in Crystallography
- Vaccines and Immunoinformatics Approaches
- SARS-CoV-2 Detection and Testing
- Advanced Neuroimaging Techniques and Applications
- Cell Image Analysis Techniques
- Neurological and Metabolic Disorders
- Cardiovascular Syncope and Autonomic Disorders
- Astronomy and Astrophysical Research
- Bioinformatics and Genomic Networks
- Protein Structure and Dynamics
University of Chicago
2016-2024
Argonne National Laboratory
2016-2023
University of Illinois Chicago
2019-2022
NanoTechLabs (United States)
2021
High-level programming languages such as Python are increasingly used to provide intuitive interfaces to libraries written in lower-level languages and for assembling applications from various components. This migration towards orchestration rather than implementation, coupled with the growing need for parallel computing (e.g., due to big data and the end of Moore's law), necessitates rethinking how parallelism is expressed in programs. Here, we present Parsl, a parallel scripting library that augments Python with simple, scalable, and flexible...
Despite the recent availability of vaccines against severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the search for inhibitory therapeutic agents has assumed importance, especially in the context of emerging new viral variants. In this paper, we describe the discovery of a novel noncovalent small-molecule inhibitor, MCULE-5948770040, that binds to and inhibits the SARS-CoV-2 main protease (Mpro) by employing a scalable high-throughput virtual screening (HTVS) framework and a targeted compound library of over 6.5...
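The core of a high-throughput virtual screening framework like the one above is a cheap scoring pass that narrows a huge library to a handful of candidates for expensive follow-up. A minimal sketch of that filter step, using a hypothetical placeholder score in place of a real ML surrogate (compound strings and the scoring function are illustrative, not from the paper):

```python
import heapq

def surrogate_score(smiles: str) -> float:
    # Placeholder for a fast ML surrogate model; here a deterministic
    # pseudo-score derived from the string so the sketch is runnable.
    return (sum(ord(c) for c in smiles) % 100) / 100.0

def screen(library, top_k):
    """Keep only the top_k highest-scoring compounds for downstream docking."""
    return heapq.nlargest(top_k, library, key=surrogate_score)

library = ["CCO", "c1ccccc1", "CC(=O)O", "CCN", "O=C=O"]
hits = screen(library, top_k=2)  # candidates passed to expensive simulation
```

In a real campaign the surrogate would be a trained model and the library would hold billions of entries, but the narrow-then-verify shape stays the same.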
We describe the simulated sky survey underlying the second data challenge (DC2) carried out in preparation for analysis of the Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST) by the LSST Dark Energy Science Collaboration (LSST DESC). Significant connections across multiple science domains will be a hallmark of LSST; the DC2 program represents a unique modeling effort that stresses this interconnectivity in a way that has not been attempted before. This encompasses a full end-to-end approach: starting...
While the Machine Learning (ML) landscape is evolving rapidly, there has been a relative lag in the development of the "learning systems" needed to enable broad adoption. Furthermore, few such systems are designed to support the specialized requirements of scientific ML. Here we present the Data and Learning Hub for science (DLHub), a multi-tenant system that provides both model repository and serving capabilities with a focus on science applications. DLHub addresses two significant shortcomings in current systems. First, its self-service...
Exploding data volumes and velocities, new computational methods and platforms, and ubiquitous connectivity demand new approaches to computation in the sciences. These approaches must enable computation to be mobile, so that, for example, it can occur near data, be triggered by events (e.g., the arrival of new data), be offloaded to specialized accelerators, or run remotely where resources are available. They also require new design approaches in which monolithic applications can be decomposed into smaller components, that may in turn be executed separately and on the most suitable...
Protein-ligand docking is a computational method for identifying drug leads. The method is capable of narrowing a vast library of compounds down to a tractable size for downstream simulation or experimental testing, and is widely used in drug discovery. While there has been progress in accelerating scoring of compounds with artificial intelligence, few works have bridged these successes back to the virtual screening community in terms of utility and forward-looking development. We demonstrate the power of high-speed ML models by scoring 1 billion molecules under...
Scientific applications that involve simulation ensembles can be accelerated greatly by using experiment design methods to select the best simulations to perform. Methods that use machine learning (ML) to create proxy models of simulations show particular promise for guiding ensembles but are challenging to deploy because of the need to coordinate dynamic mixes of simulation and learning tasks. We present Colmena, an open-source Python framework that allows users to steer campaigns by providing just the implementations of individual tasks plus the logic used to choose which tasks to execute...
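The steering pattern described above — results of finished simulations feed a selection policy that decides what to run next — can be sketched with the standard library. This is not Colmena's API; the "simulation" and the choose-next policy here are toy stand-ins, and a trivial perturb-the-best rule plays the role of an ML-driven experiment design method:

```python
import random
from concurrent.futures import ThreadPoolExecutor, as_completed

def simulate(x: float) -> float:
    # Stand-in for an expensive simulation: a quadratic peaked at x = 3.
    return -(x - 3.0) ** 2

def choose_next(history):
    # Toy steering policy: perturb the best point observed so far.
    best_x = max(history, key=lambda h: h[1])[0]
    return best_x + random.uniform(-0.5, 0.5)

random.seed(0)
history = []  # list of (input, result) pairs
with ThreadPoolExecutor(max_workers=2) as pool:
    # Seed the campaign with a few initial guesses.
    pending = {pool.submit(simulate, x): x for x in (0.0, 1.0, 5.0)}
    while len(history) < 8:
        fut = next(as_completed(pending))      # harvest one finished task
        x = pending.pop(fut)
        history.append((x, fut.result()))
        nx = choose_next(history)              # steer: pick the next input
        pending[pool.submit(simulate, nx)] = nx
    for fut in pending:                        # drop tasks still queued
        fut.cancel()
```

The essential point is that task submission is interleaved with result collection, so the campaign adapts while it runs rather than executing a fixed batch.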
Scientific workflows have been used almost universally across scientific domains and have underpinned some of the most significant discoveries of the past several decades. Many of these workflows have high computational, storage, and/or communication demands, and thus must execute on a wide range of large-scale platforms, from large clouds to upcoming exascale high-performance computing (HPC) platforms. These executions must be managed using software infrastructure. Due to the popularity of workflows, workflow management systems (WMSs)...
The drug discovery process currently employed in the pharmaceutical industry typically requires about 10 years and $2–3 billion to deliver one new drug. This is both too expensive and too slow, especially in emergencies like the COVID-19 pandemic. In silico methodologies need to be improved both to select better lead compounds, so as to improve the efficiency of later stages in the protocol, and to identify those compounds more quickly. No known methodological approach can deliver this combination of higher quality and speed. Here, we describe an...
Unraveling the liquid structure of multicomponent molten salts is challenging due to the difficulty of conducting and interpreting high-temperature diffraction experiments. Motivated by this challenge, we developed a composition-transferable Gaussian approximation potential (GAP) for LiCl-KCl. A DFT-SCAN-accurate GAP is active-learned from only ~1100 training configurations drawn from 10 unique mixture compositions enriched with metadynamics. The GAP-computed structures show strong...
funcX is a distributed function as a service (FaaS) platform that enables flexible, scalable, and high-performance remote function execution. Unlike centralized FaaS systems, funcX decouples the cloud-hosted management functionality from the edge-hosted execution functionality. funcX's endpoint software can be deployed, by users or administrators, on arbitrary laptops, clouds, clusters, and supercomputers, in effect turning them into function serving systems. The cloud-hosted funcX service provides a single location for registering, sharing, and managing both...
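The register-once, invoke-by-id model at the heart of a FaaS endpoint can be sketched in a few lines. This is an in-process illustration of the idea, not funcX's SDK; a real endpoint would also serialize functions so they can be shipped to remote resources:

```python
import uuid

class Endpoint:
    """Minimal sketch of a FaaS-style endpoint: register a function once,
    then invoke it any number of times by its opaque id."""

    def __init__(self):
        self._functions = {}

    def register(self, fn) -> str:
        # Hand back an id that callers can share instead of the code itself.
        fid = str(uuid.uuid4())
        self._functions[fid] = fn
        return fid

    def execute(self, fid: str, *args, **kwargs):
        # Look up the registered function and run it with the given inputs.
        return self._functions[fid](*args, **kwargs)

ep = Endpoint()
fid = ep.register(lambda a, b: a + b)
result = ep.execute(fid, 2, 3)  # 5
```

Separating registration from execution is what lets a central service manage functions while arbitrary endpoints do the actual computing.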
Advances in network technologies have greatly decreased the barriers to accessing physically distributed computers. This newfound accessibility coincides with increasing hardware specialization, creating exciting new opportunities to dispatch workloads to the best resource for a specific purpose, rather than to those that are closest or most easily accessible. We present Delta, a service designed to intelligently schedule function-based workloads across a set of heterogeneous computing resources. Delta implements an...
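Dispatching each workload to the best resource rather than the nearest one amounts to minimizing a predicted completion time over heterogeneous options. A sketch of that decision, with a hypothetical cost model (the resource descriptions, latency, and speedup numbers below are illustrative, not from the paper):

```python
def predict_runtime(task, resource):
    # Hypothetical cost model: dispatch latency plus compute time scaled
    # by how well this resource accelerates this kind of task.
    speedup = resource["speedup"].get(task["type"], 1.0)
    return resource["latency"] + task["work"] / speedup

def schedule(task, resources):
    """Pick the resource with the lowest predicted completion time."""
    return min(resources, key=lambda r: predict_runtime(task, r))

resources = [
    {"name": "laptop", "latency": 0.0, "speedup": {"cpu": 1.0}},
    {"name": "gpu-cluster", "latency": 5.0, "speedup": {"cpu": 1.0, "gpu": 50.0}},
]
small_cpu_task = {"type": "cpu", "work": 2.0}
big_gpu_task = {"type": "gpu", "work": 1000.0}
```

Note the trade-off the model captures: the remote cluster's dispatch latency makes it the wrong choice for tiny tasks even though it is far faster for large accelerator-friendly ones.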
Researchers across the globe are seeking to rapidly repurpose existing drugs or discover new ones to counter the novel coronavirus disease (COVID-19) caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). One promising approach is to train machine learning (ML) and artificial intelligence (AI) tools to screen large numbers of small molecules. As a contribution to that effort, we are aggregating numerous small molecules from a variety of sources and using high-performance computing (HPC) to compute diverse properties of those...
Exascale computers will offer transformative capabilities to combine data-driven and learning-based approaches with traditional simulation applications to accelerate scientific discovery and insight. These software combinations and integrations, however, are difficult to achieve due to the challenges of coordination and deployment of heterogeneous components on diverse and massive platforms. We present the ExaWorks project, which can address many of these challenges: ExaWorks is leading a co-design process to create a workflow Software...
Applications that fuse machine learning and simulation can benefit from the use of multiple computing resources, with, for example, simulation codes running on highly parallel supercomputers and AI training and inference tasks on specialized accelerators. Here, we present our experiences deploying two AI-guided workflows across such heterogeneous systems. A unique aspect of our approach is the use of cloud-hosted management services to manage challenging aspects of cross-resource authentication and authorization, function-as-a-service...
COVID-19 had an unprecedented impact on scientific collaboration. The pandemic and the broad response from the scientific community have forged new relationships among domain experts, mathematical modelers, and computing specialists. Computationally, however, it also revealed critical gaps in the ability of researchers to exploit advanced computing systems. These challenging areas include gaining access to scalable computing systems, porting models and workflows to new systems, sharing data of varying sizes, and producing results that can be reproduced...
Python is increasingly the lingua franca of scientific computing. It is used as a higher-level language to wrap lower-level libraries and to compose scripts from various independent components. However, scaling and moving Python programs from laptops to supercomputers remains a challenge. Here we present Parsl, a parallel scripting library for Python. Parsl makes it straightforward for developers to implement parallelism in Python by annotating functions that can be executed asynchronously and in parallel, and to scale analyses from a laptop to thousands...
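The annotate-a-function, get-a-future pattern that Parsl provides can be sketched with the standard library alone. This is an illustration of the programming model, not Parsl itself — the `app` decorator below is a hypothetical stand-in built on a thread pool, whereas Parsl's decorators additionally handle dependency tracking and execution on remote resources:

```python
import functools
from concurrent.futures import ThreadPoolExecutor

_pool = ThreadPoolExecutor(max_workers=4)

def app(fn):
    """Decorator that turns a plain function into an asynchronous task:
    calling it submits work to a pool and returns a future immediately."""
    @functools.wraps(fn)
    def submit(*args, **kwargs):
        return _pool.submit(fn, *args, **kwargs)
    return submit

@app
def square(x):
    return x * x

futures = [square(i) for i in range(5)]   # all five tasks run concurrently
results = [f.result() for f in futures]   # block only when values are needed
```

The key property is that call sites stay ordinary Python; parallelism comes from the annotation, and results are gathered lazily through futures.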
Growing data volumes and velocities are driving exciting new methods across the sciences in which data analytics and machine learning are increasingly intertwined with research. These methods require new approaches for scientific computing in which computation is mobile, so that, for example, it can occur near the data, be triggered by events (e.g., the arrival of new data), or be offloaded to specialized accelerators. They also require new design approaches in which monolithic applications can be decomposed into smaller components, that may in turn be executed separately and on the most...
The use and reuse of scientific data is ultimately dependent on the ability to understand what those data represent, how they were captured, and how they can be used. In many ways, data are only as useful as the metadata available to describe them. Unfortunately, due to growing data volumes, large and distributed collaborations, and a desire to store data for long periods of time, scientific "data lakes" quickly become disorganized and lack the metadata necessary for researchers. New automated approaches are needed to derive metadata from files and to use these metadata for organization and discovery. Here we describe one such...
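Automated metadata derivation of the kind described above starts with walking a data lake and running per-file extractors. A minimal sketch, assuming only stdlib facilities and a single toy extractor for JSON files (real systems plug in many format-specific extractors):

```python
import json
import os
import tempfile

def extract_metadata(path):
    """Derive minimal metadata from one file: size, extension, and, for
    JSON files, the top-level keys (a stand-in for richer extractors)."""
    meta = {"path": path,
            "bytes": os.path.getsize(path),
            "ext": os.path.splitext(path)[1]}
    if meta["ext"] == ".json":
        with open(path) as f:
            meta["keys"] = sorted(json.load(f).keys())
    return meta

def index_directory(root):
    # Walk the tree and collect one metadata record per file.
    index = []
    for dirpath, _, files in os.walk(root):
        for name in files:
            index.append(extract_metadata(os.path.join(dirpath, name)))
    return index

# Demonstrate on a throwaway directory containing one JSON file.
with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "run.json"), "w") as f:
        json.dump({"temperature": 300, "steps": 1000}, f)
    index = index_directory(d)
```

Feeding such records into a searchable catalog is what turns a disorganized data lake back into something researchers can discover data in.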
Scientific workflows are a cornerstone of modern scientific computing, and they have underpinned some of the most significant discoveries of the last decade. Many of these workflows have high computational, storage, and/or communication demands, and thus must execute on a wide range of large-scale platforms, from large clouds to upcoming exascale HPC platforms. Workflows will play a crucial role in the data-oriented and post-Moore's computing landscape as they democratize the application of cutting-edge research techniques, computationally intensive...