- Scientific Computing and Data Management
- Research Data Management Practices
- Distributed and Parallel Computing Systems
- Semantic Web and Ontologies
- Advanced Data Storage Technologies
- Advanced Text Analysis Techniques
- Cloud Computing and Resource Management
- Biomedical Text Mining and Ontologies
- Topic Modeling
- Software System Performance and Reliability
- Explainable Artificial Intelligence (XAI)
- Web Data Mining and Analysis
- Data Quality and Management
- Machine Learning in Materials Science
- Business Process Modeling and Analysis
- Computational Drug Discovery Methods
- Distributed Systems and Fault Tolerance
- Geological Modeling and Analysis
- Anomaly Detection Techniques and Applications
- Natural Language Processing Techniques
- Advanced Computational Techniques and Applications
- Big Data and Business Intelligence
- Service-Oriented Architecture and Web Services
- Genetics, Bioinformatics, and Biomedical Research
- Environmental Monitoring and Data Management
Sandia National Laboratories
2024-2025
Brookhaven National Laboratory
2017-2024
Texas State University
2022
Argonne National Laboratory
2022
Michigan State University
2021
Purdue University West Lafayette
2014-2016
Oak Ridge National Laboratory
2000-2014
National Center for Supercomputing Applications
2005
Knoxville College
1997
University of Tennessee at Knoxville
1997
We present a supercomputer-driven pipeline for in silico drug discovery using enhanced sampling molecular dynamics (MD) and ensemble docking. Ensemble docking makes use of MD results by docking compound databases into representative ensembles of protein binding-site conformations, thus taking into account the dynamic properties of the binding sites. We also describe preliminary results obtained for 24 systems involving eight proteins of the proteome of SARS-CoV-2. The pipeline involves temperature replica exchange enhanced sampling, making use of massively parallel...
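A minimal sketch of the ensemble-docking idea described above: each compound is scored against several MD-derived binding-site conformations, and its best score over the ensemble is kept. All names are hypothetical and the scoring function is a stand-in; a real pipeline would call an MD engine and a docking program.

```python
def dock_score(compound, conformation):
    # Placeholder scoring function; a real pipeline would invoke a
    # docking engine here and return its binding-affinity estimate.
    return abs(hash((compound, conformation))) % 100 / 10.0

def ensemble_dock(compounds, conformations):
    """Return each compound's best (lowest) score over the conformational ensemble."""
    results = {}
    for c in compounds:
        # Docking against an ensemble, not one static structure, is what
        # lets the pipeline account for binding-site flexibility.
        results[c] = min(dock_score(c, conf) for conf in conformations)
    return results

scores = ensemble_dock(["ligand_A", "ligand_B"], ["conf_1", "conf_2", "conf_3"])
ranked = sorted(scores, key=scores.get)  # most promising compounds first
```

Taking the minimum (best) score across conformations is one common aggregation choice; averaging or Boltzmann weighting are alternatives.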
Recent trends within computational and data sciences show an increasing recognition and adoption of workflows as tools for productivity and reproducibility that also democratize access to platforms and processing know-how. As digital objects to be shared, discovered, and reused, workflows benefit from the FAIR principles, which stand for Findable, Accessible, Interoperable, and Reusable. The Workflows Community Initiative's FAIR Workflows Working Group (WCI-FW), a global and open community of researchers and developers working with workflows across disciplines...
Understanding the Earth's climate system and how it might be changing is a preeminent scientific challenge. Global climate models are used to simulate past, present, and future climates, and experiments are executed continuously on an array of distributed supercomputers. The resulting data archive, spread over several sites, currently contains upwards of 100 TB of simulation data and is growing rapidly. Looking toward mid-decade and beyond, we must anticipate and prepare for research data holdings of many petabytes. The Earth System Grid (ESG)...
The increase of complexity and advancement in ecological and environmental sciences encourages scientists across the world to collect data from multiple places, times, and thematic scales to verify their hypotheses. Accumulated over time, such data not only increases in amount, but also in the diversity of its sources spread around the world. This poses a huge challenge for scientists who have to manually search for information. To alleviate such problems, ONEMercury has recently been implemented as part of the DataONE project to serve as a portal for accessing...
As science becomes more data-intensive and collaborative, researchers increasingly use larger and more complex data to answer research questions. The capacity of storage infrastructure, the increased sophistication and deployment of sensors, the ubiquitous availability of computer clusters, the development of new analysis techniques, and larger collaborations allow researchers to address grand societal challenges in a way that is unprecedented. In parallel, data repositories have been built to host data in response to requirements of sponsors that research data be publicly available....
dPIDs are an emerging PID technology based on decentralized architectures and self-sovereign identity [1]. They resolve to content-addressed containers, forming persistent storage systems where each object is identified by a unique PID. dPIDs are immune to content drift and resolve deterministically to their mapped content, providing a reproducible binding between the (meta)data and its identifier. As dPIDs take a network-protocol approach to PIDs, the implementation of FDOF recommendations may require further explanation [2]. This presentation is a primer on the technologies...
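The deterministic (meta)data-to-identifier binding described above can be illustrated with plain content addressing: the identifier is derived from the content itself, so resolution either returns exactly the registered bytes or the drift is detectable. This is an illustration of the general principle only, not the actual dPID protocol.

```python
import hashlib

store = {}  # stand-in for a persistent, decentralized storage system

def register(content: bytes) -> str:
    # The identifier is the hash of the content, so the binding between
    # identifier and (meta)data is reproducible by anyone.
    pid = hashlib.sha256(content).hexdigest()
    store[pid] = content
    return pid

def resolve(pid: str) -> bytes:
    content = store[pid]
    # Content drift is detectable: re-hashing must reproduce the PID.
    assert hashlib.sha256(content).hexdigest() == pid
    return content

pid = register(b"dataset v1")
assert resolve(pid) == b"dataset v1"
```

Contrast this with location-based identifiers (e.g., URLs), where the mapped content can silently change while the identifier stays the same.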
Chimbuko is the first in situ, scalable, workflow-level performance analysis tool for trace-level analysis and visualization of application performance. It was developed by the Co-design Center for Online Data Analysis and Reduction, funded by the U.S. Department of Energy's Exascale Computing Project. We provide a detailed description of Chimbuko's architecture and illustrate its online and offline capabilities with multiple use cases. We also present results on its deployment and scalability as applied to a high-energy physics workflow running at large scale...
The integrity of science and engineering research is grounded in assumptions of rigor and transparency on the part of those engaging in such research. In the HPC community, efforts to strengthen this integrity take the form of reproducibility initiatives. In a recent survey of the SC conference community, we collected information about reproducibility initiative activities. We present the results in this article. Results show that these activities have contributed to higher levels of awareness among technical program participants, and hint at contributing to greater scientific impact for...
In the emerging world of Grid Computing, shared computational, data, and other distributed resources are becoming available to enable scientific advancement through collaborative research and collaboratories. This paper describes the increasing role of ontologies in the context of Grid Computing for obtaining, comparing, and analyzing data. We present ontology entities in a declarative model that provides an outline of an information model. Relationships between the concepts are also given. The implementation of some of the concepts described in this model is discussed...
We propose an approach for improved reproducibility that includes capturing and relating provenance characteristics and performance metrics. We discuss two use cases: scientific reproducibility of results in the Energy Exascale Earth System Model (E3SM, previously ACME) and molecular dynamics workflows on HPC platforms. To capture and persist data from these workflows, we have designed and developed the Chimbuko and ProvEn frameworks. Chimbuko captures provenance and enables detailed single-workflow analysis. ProvEn is a hybrid, queryable system for storing and analyzing...
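The pairing of provenance with performance metrics can be sketched as a wrapper that records, for each workflow step, what ran, on what inputs, and how long it took. The record structure below is hypothetical and much simpler than what Chimbuko or ProvEn actually capture.

```python
import json
import time

def run_step(name, func, *args):
    """Run one workflow step and return (result, provenance record)."""
    start = time.perf_counter()
    result = func(*args)
    record = {
        "step": name,                          # what ran
        "inputs": [repr(a) for a in args],     # on what inputs
        "output": repr(result),                # what it produced
        "wall_time_s": time.perf_counter() - start,  # performance metric
    }
    return result, record

provenance = []
total, rec = run_step("sum_inputs", sum, [1, 2, 3])
provenance.append(rec)
print(json.dumps(provenance, indent=2))  # persistable, queryable trail
```

Relating the provenance trail to the recorded metrics is what allows asking whether a re-run that produced different numbers also behaved differently.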
A growing disparity between supercomputer computation speeds and I/O rates means that it is rapidly becoming infeasible to analyze application output only after it has been written to a file system. Instead, data-generating applications must run concurrently with data reduction and/or analysis operations, with which they exchange information via high-speed methods such as interprocess communications. The resulting parallel computing motif, online data analysis and reduction (ODAR), has important implications for both HPC systems...
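The ODAR motif can be sketched in miniature: a data-generating "simulation" thread streams chunks to a concurrent reduction/analysis thread through an in-memory queue, so no data round-trips through the file system. Threads and a `queue.Queue` stand in for the high-speed interprocess methods the abstract mentions.

```python
import queue
import threading

q = queue.Queue(maxsize=8)  # bounded: producer blocks if analysis falls behind
reduced = []

def simulate(n_chunks):
    for i in range(n_chunks):
        q.put(list(range(i, i + 4)))  # stand-in for one simulation output chunk
    q.put(None)                       # sentinel: no more data

def analyze():
    while (chunk := q.get()) is not None:
        reduced.append(sum(chunk))    # stand-in for online reduction/analysis

producer = threading.Thread(target=simulate, args=(5,))
consumer = threading.Thread(target=analyze)
producer.start(); consumer.start()
producer.join(); consumer.join()
# reduced now holds one value per chunk, computed while the "simulation" ran
```

The bounded queue also illustrates the back-pressure coupling between producer and analyzer that makes ODAR a systems concern, not just an application one.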
Due to the sheer volume of data, it is typically impractical to analyze the detailed performance of an HPC application running at scale. While conventional small-scale benchmarking and scaling studies are often sufficient for simple applications, many modern workflow-based applications couple multiple elements with competing resource demands and complex inter-communication patterns, which cannot easily be studied in isolation at small scale. This work discusses Chimbuko, a performance analysis framework that provides...
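One way to make at-scale trace analysis tractable, in the spirit of Chimbuko's online approach, is to keep only running statistics per function and flag executions whose duration deviates strongly from them, discarding the bulk of the trace. The threshold rule below is a deliberate simplification, not Chimbuko's actual algorithm.

```python
import math

class RunningStats:
    """Streaming mean/std (Welford's algorithm): O(1) memory per function."""
    def __init__(self):
        self.n = 0; self.mean = 0.0; self.m2 = 0.0
    def update(self, x):
        self.n += 1
        d = x - self.mean
        self.mean += d / self.n
        self.m2 += d * (x - self.mean)
    def std(self):
        return math.sqrt(self.m2 / self.n) if self.n > 1 else 0.0

def detect_anomalies(durations, sigma=3.0):
    stats, anomalies = RunningStats(), []
    for i, d in enumerate(durations):
        # Flag only once a baseline exists; then update with every sample.
        if stats.n > 10 and stats.std() > 0 and abs(d - stats.mean) > sigma * stats.std():
            anomalies.append(i)
        stats.update(d)
    return anomalies

trace = [1.0 + 0.01 * (i % 3) for i in range(100)]  # ~1 s calls
trace[60] = 50.0                                    # one pathological call
print(detect_anomalies(trace))  # -> [60]
```

Only the flagged executions (plus the compact statistics) need to be kept, which is what makes workflow-level analysis feasible at scale.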
In January 2019, the US Department of Energy, Office of Science program in Advanced Scientific Computing Research, convened a workshop to identify priority research directions (PRDs) for in situ data management (ISDM). A fundamental finding of the workshop is that the methodologies used to manage data among a variety of tasks can be used to facilitate scientific discovery from many different data sources (simulation, experiment, and sensors, for example), and that being able to do so at numerous computing scales will benefit real-time decision-making,...
The capability to replicate predictions by machine learning (ML) or artificial intelligence (AI) models, and results in scientific workflows that incorporate such ML/AI predictions, is driven by a variety of factors.
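One of those factors is control over randomness. A minimal sketch of pinning the random seed so repeated runs of a workflow step produce identical predictions; the model and data here are hypothetical, and a real workflow would also pin library versions, data snapshots, and hardware-dependent settings.

```python
import random

def train_and_predict(seed):
    # An isolated, seeded RNG rather than the global one, so this step's
    # randomness cannot be perturbed by other parts of the workflow.
    rng = random.Random(seed)
    weights = [rng.uniform(-1, 1) for _ in range(3)]  # stand-in "training"
    x = [0.5, -0.2, 0.1]                              # stand-in input
    return sum(w * xi for w, xi in zip(weights, x))   # stand-in "prediction"

run1 = train_and_predict(seed=42)
run2 = train_and_predict(seed=42)
assert run1 == run2  # identical seed -> identical prediction
```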
We have developed the Manufacturing Agent-Based Emulation System (MABES) as an open framework for the design and analysis of discrete manufacturing systems. MABES currently supports the transition from traditional to lean manufacturing in two major functions: alternative scheduling and control approaches that can be implemented across the extended enterprise, and real-time collaboration among teams during line design stages. MABES bases its support for these functions on two paradigms: distributed agents and synchronous collaboration.