Kerstin Kleese van Dam

ORCID: 0000-0002-5794-7620
Research Areas
  • Scientific Computing and Data Management
  • Distributed and Parallel Computing Systems
  • Research Data Management Practices
  • Advanced Data Storage Technologies
  • Cloud Computing and Resource Management
  • Big Data Technologies and Applications
  • Big Data and Business Intelligence
  • Advanced Computational Techniques and Applications
  • Software System Performance and Reliability
  • Advanced Database Systems and Queries
  • Geographic Information Systems Studies
  • Environmental Monitoring and Data Management
  • Data Quality and Management
  • Advanced X-ray Imaging Techniques
  • Data Visualization and Analytics
  • Glass properties and applications
  • Semantic Web and Ontologies
  • Parallel Computing and Optimization Techniques
  • Anomaly Detection Techniques and Applications
  • Simulation Techniques and Applications
  • Data Management and Algorithms
  • Geochemistry and Geologic Mapping
  • Geological Modeling and Analysis
  • Hydrology and Watershed Management Studies
  • Cell Image Analysis Techniques

Brookhaven National Laboratory
2016-2024

Pacific Northwest National Laboratory
2011-2015

Daresbury Laboratory
2000-2008

Science and Technology Facilities Council
2000-2008

Sci-Tech Daresbury
2008

UK Carbon Capture and Research Centre
2005-2006

Today’s computational, experimental, and observational sciences rely on computations that involve many related tasks. The success of a scientific mission often hinges on the computerized automation of these workflows. In April 2015, the US Department of Energy (DOE) invited a diverse group of domain and computer scientists from national laboratories supported by the Office of Science and the National Nuclear Security Administration, from industry, and from academia to review the workflow requirements of DOE’s science and national security missions and to assess the current state of the art...

10.1177/1094342017704893 article EN The International Journal of High Performance Computing Applications 2017-04-26

Chimbuko is the first in situ, scalable, workflow-level performance analysis tool for trace-level analysis and visualization of application performance. It was developed by the Co-design Center for Online Data Analysis and Reduction, funded by the U.S. Department of Energy’s Exascale Computing Project. We provide a detailed description of Chimbuko’s architecture and illustrate its online and offline capabilities with multiple use cases. We also present results on its deployment and scalability as applied to a high-energy physics workflow running at large scale...

10.1177/10943420251316253 article EN The International Journal of High Performance Computing Applications 2025-03-31

In this paper, we present the Core Scientific Metadata Model (CSMD), a model for the representation of scientific study metadata developed within the Science & Technology Facilities Council (STFC) to represent data generated from its facilities. The model has been developed to allow the management of and access to data resources across facilities in a uniform way, although we believe that it has wider application, especially in areas of “structural science” such as chemistry, materials science, and the earth sciences. We give some of the motivations behind its development...

10.2218/ijdc.v5i1.146 article EN cc-by International Journal of Digital Curation 2010-06-22
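The layered organisation CSMD describes can be illustrated with a toy record. This is a minimal sketch assuming a study → investigation → dataset → datafile hierarchy; the field names and values here are illustrative, not taken from the published schema.

```python
# Hypothetical record following a broad study -> investigation ->
# dataset -> datafile hierarchy (field names are illustrative only,
# not the published CSMD schema).
study = {
    "title": "Powder diffraction of candidate battery cathodes",
    "investigations": [
        {
            "instrument": "beamline-I11",
            "datasets": [
                {
                    "name": "scan-0001",
                    "datafiles": ["scan-0001_000.tif", "scan-0001_001.tif"],
                    "parameters": {"temperature_K": 298, "wavelength_A": 0.82},
                }
            ],
        }
    ],
}

# Uniform access: list every datafile in the study, regardless of
# which facility or instrument produced it.
files = [
    f
    for inv in study["investigations"]
    for ds in inv["datasets"]
    for f in ds["datafiles"]
]
print(files)  # ['scan-0001_000.tif', 'scan-0001_001.tif']
```

The point of such a common model is the last loop: tooling can traverse any facility's holdings with one access pattern.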

Data from high-energy physics (HEP) experiments are collected with significant financial and human effort and are mostly unique. An inter-experimental study group on HEP data preservation and long-term analysis was convened as a panel of the International Committee for Future Accelerators (ICFA). The group was formed by the large collider-based experiments and investigated the technical and organisational aspects of HEP data preservation. An intermediate report was released in November 2009 addressing the general issues of data preservation in HEP. This paper includes and extends that report. It...

10.48550/arxiv.1205.4667 preprint EN other-oa arXiv (Cornell University) 2012-01-01

Scientific facilities, in particular large-scale photon and neutron sources, have demanding requirements to manage the increasing quantities of experimental data they generate in a systematic and secure way. In this paper, we describe the ICAT infrastructure for cataloguing facility-generated experimental data, which has been in development within STFC and DLS for several years. We consider the factors which have influenced its design and describe its architecture and metadata model, a key tool in the management of experimental data. We go on to give an outline of its current implementation and use, with...

10.1109/e-science.2009.36 article EN 2009-12-01

Mass Spectrometric Imaging (MSI) allows the generation of 2D ion density maps that help visualize molecules present in sections of tissues and cells. The combination of spatial resolution and mass resolution results in very large and complex data sets. New capabilities are necessary for the efficient analysis and interpretation of this data. This work details the development and application of a new capability to process, visualize, query, and analyze mass spectrometry data. Applications include the selection of spectra, manipulation of heat maps, identification of spectral...

10.1109/embc.2012.6347250 article EN Annual International Conference of the IEEE Engineering in Medicine and Biology Society 2012-08-01

Trelliscope emanates from the Trellis Display framework for visualization and the Divide and Recombine (D&R) approach to analyzing large complex data. In Trellis, the data are broken up into subsets, a visualization method is applied to each subset, and the result is displayed as an array of panels, one per subset. This is a powerful framework for the analysis of data, both small and large. In D&R, any analytic method of statistics or machine learning is applied to each subset independently, and the outputs are then recombined. This provides not only deep analysis, but also feasible and practical computations using distributed...

10.1109/ldav.2013.6675164 article EN 2013-10-01
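The divide-apply-recombine cycle described above can be sketched in a few lines. This is a minimal illustration of the general D&R idea, not Trelliscope's implementation; the subset count and the choice of the mean as the analytic method are arbitrary.

```python
import random
import statistics

def divide(data, n_subsets):
    """Divide: split the data into roughly equal subsets."""
    return [data[i::n_subsets] for i in range(n_subsets)]

def analyze(subset):
    """Apply: run an analytic method on one subset independently."""
    return statistics.mean(subset)

def recombine(results):
    """Recombine: merge the per-subset outputs into one estimate."""
    return statistics.mean(results)

random.seed(0)
data = [random.gauss(10.0, 2.0) for _ in range(10_000)]

subset_results = [analyze(s) for s in divide(data, n_subsets=8)]
estimate = recombine(subset_results)
print(round(estimate, 2))  # close to the true mean of 10.0
```

Because each `analyze` call touches only its own subset, the middle step can be farmed out to distributed workers without any change to the analytic method itself.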

We propose an approach for improved reproducibility that includes capturing and relating provenance characteristics and performance metrics. We discuss two use cases: scientific reproducibility of results in the Energy Exascale Earth System Model (E3SM, previously ACME) and molecular dynamics workflows on HPC platforms. To capture and persist data from these workflows, we have designed and developed the Chimbuko and ProvEn frameworks. Chimbuko captures provenance and enables detailed single-workflow analysis. ProvEn is a hybrid, queryable system for storing and analyzing...

10.1177/1094342019839124 article EN The International Journal of High Performance Computing Applications 2019-04-08

A growing disparity between supercomputer computation speeds and I/O rates means that it is rapidly becoming infeasible to analyze application output only after it has been written to a file system. Instead, data-generating applications must run concurrently with data reduction and/or analysis operations, with which they exchange information via high-speed methods such as interprocess communications. The resulting parallel computing motif, online data analysis and reduction (ODAR), has important implications for both HPC systems...

10.1177/10943420211023549 article EN The International Journal of High Performance Computing Applications 2021-06-12
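The ODAR motif, running reduction concurrently with the data-generating application and exchanging chunks in memory rather than through the file system, can be sketched with a bounded queue. This is a single-node toy, not an HPC implementation; the chunk contents and the sentinel protocol are assumptions made for illustration.

```python
import queue
import threading

def simulation(out_q, n_steps):
    """Data-generating application: emits one chunk of values per step."""
    for step in range(n_steps):
        chunk = [step * 0.1 + i for i in range(4)]
        out_q.put(chunk)          # exchange via an in-memory channel, not files
    out_q.put(None)               # sentinel: no more data

def reducer(in_q, results):
    """Online reduction: summarize each chunk as it arrives."""
    while True:
        chunk = in_q.get()
        if chunk is None:
            break
        results.append(sum(chunk) / len(chunk))  # keep only a mean per step

q = queue.Queue(maxsize=8)        # bounded: applies back-pressure to the producer
summaries = []
consumer = threading.Thread(target=reducer, args=(q, summaries))
consumer.start()
simulation(q, n_steps=100)
consumer.join()
print(len(summaries))             # 100 reduced values; raw chunks never hit disk
```

The bounded queue is the key design choice: when the analysis falls behind, the producer blocks instead of flooding memory, mirroring the flow-control concern that ODAR systems must address at scale.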

Abstract This paper describes a prototype grid infrastructure, called the “eMinerals minigrid”, for molecular simulation scientists, which is based on an integration of shared compute and data resources. We describe the key components, namely the use of Condor pools, Linux/Unix clusters with PBS and IBM’s LoadLeveller job handling tools, Globus for security handling, Condor-G tools for wrapping Globus job-submit commands, Condor’s DAGman tool for workflow, the Storage Resource Broker for data, and the CCLRC dataportal and associated tools, both...

10.1080/08927020500067195 article EN Molecular Simulation 2005-04-01

Due to the sheer volume of data, it is typically impractical to analyze the detailed performance of an HPC application running at scale. While conventional small-scale benchmarking and scaling studies are often sufficient for simple applications, many modern workflow-based applications couple multiple elements with competing resource demands and complex inter-communication patterns which cannot easily be studied in isolation or at small scale. This work discusses Chimbuko, a performance analysis framework that provides...

10.1145/3426462.3426465 article EN 2020-11-12

Advances in detectors and computational technologies provide new opportunities for applied research and the fundamental sciences. Concurrently, dramatic increases in the three V’s (Volume, Velocity, and Variety) of experimental data and in the scale of computational tasks have produced a demand for real-time processing systems at experimental facilities. Recently, this demand was addressed by the Spark-MPI approach, which connects the Spark data-intensive platform with the MPI high-performance framework. In contrast with existing data management and analytics systems, Spark-MPI introduced a middleware...

10.1109/nysds.2017.8085039 article EN 2017-08-01

A set of new data analysis software tools has been developed for the study of the structural dynamics of materials using coherent scattering and photon correlation techniques. The tools can readily process high-throughput, multidimensional data, enabling studies of slow and fast dynamics with X-ray Speckle Visibility Spectroscopy and X-ray Photon Correlation Spectroscopy. They support a wide range of user expertise, from novice to developer, and are available in the Scikit-beam python package, which is at https://github.com/scikit-beam/scikit-beam.

10.1109/nysds.2016.7747815 article EN 2016-08-01
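The visibility statistic at the heart of X-ray Speckle Visibility Spectroscopy can be illustrated on synthetic data. This sketch computes the speckle contrast (standard deviation over mean of the intensities) in plain Python; it does not use the Scikit-beam API, whose actual functions and signatures should be taken from the package's own documentation.

```python
import random
import statistics

random.seed(42)

# Synthetic speckle intensities: fully developed speckle follows
# negative-exponential statistics, for which the ideal contrast is 1.
intensities = [random.expovariate(1 / 100.0) for _ in range(65_536)]

def speckle_contrast(values):
    """Speckle visibility (contrast): std(I) / mean(I)."""
    return statistics.pstdev(values) / statistics.mean(values)

contrast = speckle_contrast(intensities)
print(round(contrast, 2))  # close to 1 for ideal fully developed speckle
```

In a real experiment the contrast drops below the ideal value as sample dynamics blur the speckle within the exposure time, which is what makes it a probe of fast dynamics.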

Abstract. The Climate Science Modelling Language (CSML) has been developed by the NERC DataGrid (NDG) project as a standards-based data model and XML markup for describing and constructing climate science datasets. It uses conceptual models from emerging standards in GIS to define a number of feature types, and adopts schemas of the Geography Markup Language (GML) where possible for encoding. A prototype deployment of CSML is being trialled across the curated archives of the British Atmospheric and Oceanographic Data Centres. These...

10.5194/adgeo-8-83-2006 article EN cc-by-nc-sa Advances in geosciences 2006-06-06

Scientific user facilities (particle accelerators, telescopes, colliders, supercomputers, light sources, sequencing facilities, and more) operated by the U.S. Department of Energy (DOE) Office of Science (SC) generate ever increasing volumes of data at unprecedented rates from experiments, observations, and simulations. At the same time there is a growing community of experimentalists that require real-time data analysis feedback to enable them to steer their complex experimental instruments toward optimized scientific...

10.1109/escience.2016.7870902 article EN 2016-10-01

We describe RMCS as one of the first tools for grid computing that integrates data and metadata management into a single job submission system. The system is easy to use, with clients that are simple to install. Although it was developed as a prototype, it is now in production use, and a number of scientific studies have been completed using it.

10.1098/rsta.2008.0159 article EN cc-by Philosophical Transactions of the Royal Society A Mathematical Physical and Engineering Sciences 2008-12-16

Abstract Emerging developments in geographic information systems and distributed computing offer a roadmap towards an unprecedented spatial data infrastructure for the climate sciences. Key to this are the standards for digital geographic information being led by the International Organisation for Standardisation (ISO) technical committee on geographic information/geomatics (TC211) and by the Open Geospatial Consortium (OGC). These, coupled with the evolution of standardised web services for applications on the internet under the World Wide Web Consortium (W3C), mean that opportunities...

10.1017/s1350482705001556 article EN Meteorological Applications 2005-03-01