Tyler J. Skluzacek

ORCID: 0000-0003-2242-4931
Research Areas
  • Scientific Computing and Data Management
  • Distributed and Parallel Computing Systems
  • Advanced Data Storage Technologies
  • Research Data Management Practices
  • Data Quality and Management
  • Cloud Computing and Resource Management
  • Distributed Systems and Fault Tolerance
  • Advanced Database Systems and Queries
  • Data Management and Algorithms
  • Geographic Information Systems Studies
  • Business Process Modeling and Analysis
  • Mobile Crowdsensing and Crowdsourcing
  • Geological Modeling and Analysis
  • Environmental Monitoring and Data Management
  • Semantic Web and Ontologies
  • IoT and Edge/Fog Computing
  • Explainable Artificial Intelligence (XAI)
  • Service-Oriented Architecture and Web Services
  • Blockchain Technology Applications and Security
  • Cloud Data Security Solutions
  • Machine Learning in Materials Science
  • Machine Learning and Data Classification

Oak Ridge National Laboratory
2022-2024

Office of Scientific and Technical Information
2024

Naval Research Laboratory Information Technology Division
2023

University of Chicago
2016-2022

University of Illinois Chicago
2020-2022

Argonne National Laboratory
2017

Exploding data volumes and velocities, new computational methods and platforms, and ubiquitous connectivity demand new approaches to computation in the sciences. These approaches must enable computation to be mobile, so that, for example, it can occur near data, be triggered by events (e.g., arrival of new data), be offloaded to specialized accelerators, or run remotely where resources are available. They also require new design approaches in which monolithic applications can be decomposed into smaller components, that may in turn be executed separately and on the most suitable...

10.1145/3369583.3392683 preprint EN 2020-06-22

funcX is a distributed function as a service (FaaS) platform that enables flexible, scalable, and high performance remote function execution. Unlike centralized FaaS systems, funcX decouples the cloud-hosted management functionality from the edge-hosted execution functionality. funcX's endpoint software can be deployed, by users or administrators, on arbitrary laptops, clouds, clusters, and supercomputers, in effect turning them into function serving systems. funcX provides a single location for registering, sharing, and managing both...

10.1109/tpds.2022.3208767 article EN IEEE Transactions on Parallel and Distributed Systems 2022-09-22

Growing data volumes and velocities are driving exciting new methods across the sciences in which data analytics and machine learning are increasingly intertwined with research. These new methods require new approaches for scientific computing in which computation is mobile, so that, for example, it can occur near data, be triggered by events (e.g., arrival of new data), or be offloaded to specialized accelerators. They also require new design approaches in which monolithic applications can be decomposed into smaller components, that may in turn be executed separately and on the most...

10.48550/arxiv.1908.04907 preprint EN other-oa arXiv (Cornell University) 2019-01-01

Many interesting geospatial datasets are publicly accessible on web sites and other online repositories. However, the sheer number of data locations, plus a lack of support for cross-repository search, makes it difficult for researchers to discover and integrate relevant data. We describe here early results from a system, Klimatic, that aims to overcome these barriers to discovery and use by automating the tasks of crawling, indexing, integrating, and distributing geospatial data. Klimatic implements a scalable crawling and processing architecture that uses...

10.1109/pdsw-discs.2016.010 article EN 2016-11-01

Modern large-scale scientific discovery requires multidisciplinary collaboration across diverse computing facilities, including High Performance Computing (HPC) machines and the Edge-to-Cloud continuum. Integrated data analysis plays a crucial role in scientific discovery, especially in the current AI era, by enabling Responsible AI development, FAIR, Reproducibility, and User Steering. However, the heterogeneous nature of science poses challenges such as dealing with multiple supporting tools, cross-facility...

10.1109/e-science58273.2023.10254822 article EN 2023-09-25

The use and reuse of scientific data is ultimately dependent on the ability to understand what those data represent, how they were captured, and how they can be used. In many ways, data are only as useful as the metadata available to describe them. Unfortunately, due to growing data volumes, large and distributed collaborations, and a desire to store data for long periods of time, scientific "data lakes" quickly become disorganized and lack the metadata necessary to be useful to researchers. New automated approaches are needed to derive metadata from scientific files and to use these metadata for organization and discovery. Here we describe one such...

10.1145/3366623.3368140 article EN 2019-11-18

Many interesting geospatial datasets are publicly accessible on web sites and other online repositories. However, the sheer number of data locations, plus a lack of support for cross-repository search, makes it difficult for researchers to discover and integrate relevant data. We describe here early results from a system, Klimatic, that aims to overcome these barriers to discovery and use by automating the tasks of crawling, indexing, integrating, and distributing geospatial data. Klimatic implements a scalable crawling and processing architecture that uses...

10.5555/3019046.3019052 article EN 2016-11-13

To mitigate the effects of high-velocity data expansion and to automate the organization of filesystems and data repositories, we have developed Skluma, a system that automatically processes a target filesystem or repository, extracts content- and context-based metadata, and organizes the extracted metadata for subsequent use. Skluma is able to extract diverse metadata, including aggregate values derived from embedded structured data; named entities and latent topics buried within free-text documents; and content encoded in images....

10.1109/escience.2018.00040 article EN 2018-10-01

We introduce Xtract, an automated and scalable system for bulk metadata extraction from large, distributed research data repositories. Xtract orchestrates the application of metadata extractors to groups of files, determining which extractors to apply to each file and, for each extractor and file, where to execute. A hybrid computing model, built on the funcX federated FaaS platform, enables Xtract to balance tradeoffs between extraction time and data transfer costs by dispatching each extraction task to the most appropriate location. Experiments on a range of clouds and supercomputers show that Xtract can...

10.1145/3431379.3460636 article EN 2021-06-17

Scientists' capacity to make use of existing data is predicated on their ability to find and understand those data. While significant progress has been made with respect to data publication, and indeed one can point to a number of well organized and highly utilized data repositories, there remain many such repositories in which archived data are poorly described and thus impossible to use. We present Skluma, an automated system designed to process vast amounts of data and extract deeply embedded metadata, latent topics, relationships between...

10.1145/3085504.3091116 article EN 2017-06-05
Rafael Ferreira da Silva Rosa M. Badía Venkat Bala Debbie Bard Peer‐Timo Bremer Ian K. Buckley Silvina Caíno‐Lores Kyle Chard Carole Goble Shantenu Jha Daniel S. Katz Daniel Laney Manish Parashar Frédéric Suter Nick Tyler Thomas D. Uram İlkay Altıntaş Stefan Andersson William Arndt Juan Pedro Aznar Jonathan Bader Bartosz Baliś Chris Blanton Kelly Rosa Braghetto Aharon Brodutch Paul Brunk Henri Casanova Alba Cervera Lierta Justin Chigu Tainã Coleman Nick Collier Iacopo Colonnelli Frederik Coppens Michael R. Crusoe W. S. Cunningham Bruno de Paula Kinoshita Paolo Di Tommaso Charles Doutriaux Matthew T. Downton Wael Elwasif Bjoern Enders Christopher Erdmann Thomas Fahringer Ludmilla Figueiredo Rosa Filgueira Martin Foltín Anne Fouilloux Luiz Gadelha Andy Gallo Artur Garcia Saez Daniel Garijo Roman G. Gerlach Ryan E. Grant Samuel Grayson Patricia Grubel Johan E. Gustafsson Valérie Hayot‐Sasson Óscar Hernández Marcus Hilbrich Annmary Justine I. Laflotte Fabian Lehmann André Luckow Jakob Luettgau Ketan Maheshwari Motohiko Matsuda Doriana Medić Peter Mendygral Marek T. Michalewicz Jorji Nonaka Maciej Pawlik Loïc Pottier Line Pouchard Mathias Pütz Santosh Kumar Radha Lavanya Ramakrishnan Sasko Ristov Paul Romano Daniel Rosendo Martin Ruefenacht Katarzyna Rycerz Nishant Saurabh V. Savchenko Martin Schulz Christine M. Simpson Raúl Sirvent Tyler J. Skluzacek Stian Soiland‐Reyes Renan P. Souza Sreenivas R. Sukumar Ziheng Sun Alan Sussman Douglas Thain Mikhail Titov Benjamín Tovar Aalap Tripathy Matteo Turilli Bartosz Tużnik Hubertus J. J. van Dam Aurelio Vivas

Scientific workflows have become integral tools in broad scientific computing use cases. Science discovery is increasingly dependent on workflows to orchestrate large and complex scientific experiments that range from execution of a cloud-based data preprocessing pipeline to multi-facility instrument-to-edge-to-HPC computational workflows. Given the changing landscape of scientific computing and the evolving needs of emerging scientific applications, it is paramount that the development of novel scientific workflows and system functionalities seek to increase the efficiency, resilience, and pervasiveness...

10.48550/arxiv.2304.00019 preprint EN cc-by arXiv (Cornell University) 2023-01-01

FAIR principles require that scientific data be findable, discoverable, and reusable by users. To enable FAIRness, practitioners of a science data repository will often construct a rich, searchable index of metadata derived from the data. Unfortunately, manual annotation methods do not scale to the many files generated by science projects; instead, automated metadata extraction systems are needed to scalably parse these files (often with nonstandard schema requiring specialized parsing strategies) and deposit representative metadata into...

10.1109/e-science58273.2023.10254801 article EN 2023-09-25

The rapid generation of data from distributed IoT devices, scientific instruments, and compute clusters presents unique data management challenges. The influx of large, heterogeneous, and complex data causes repositories to become siloed or generally unsearchable, both problems not currently well-addressed by distributed file systems. In this work, we propose Xtract, a serverless middleware to extract metadata from files spread across heterogeneous edge computing resources. In my future work, I intend to study how Xtract can automatically...

10.1145/3366624.3368170 article EN 2019-11-27

The advancement of science is increasingly intertwined with complex computational processes [1]. Scientific workflows are at the heart of this evolution, acting as essential orchestrators for a vast range of experiments. Specifically, these workflows are central to the field of Earth Sciences, where they orchestrate diverse activities, from cloud-based data preprocessing pipelines in environmental modeling to intricate multi-facility instrument-to-edge-to-HPC frameworks for seismic data analysis and geophysical simulations [2]....

10.5194/egusphere-egu24-21636 preprint EN 2024-03-11

Many extreme-scale applications require the movement of large quantities of data to, from, and among leadership computing facilities, as well as other scientific facilities and the home institutions of facility users. These applications, particularly when leadership computing facilities are involved, can touch upon edge cases (e.g., terabyte files) that had not been a focus of previous Globus optimization work, which had emphasized rather the movement of many smaller (megabyte to gigabyte) files. We report here on how automated client-driven chunking can be used...

10.1177/10943420241281744 article EN The International Journal of High Performance Computing Applications 2024-09-09
Rafael Ferreira da Silva Deborah Bard Kyle Chard de Witt Shaun Ian Foster Tom Gibbs Carole Goble William F. Godoy Johan E. Gustafsson Utz‐Uwe Haus Stephen D. Hudson Shantenu Jha Laura de los Drew Paine Frédéric Suter Logan Ward Sean Wilkinson Marcos Amarís Yadu Babuji Jonathan Bader Riccardo Balin Daniel Balouek‐Thomert Sarah Beecroft Khalid Belhajjame Rajat Bhattarai Wesley Brewer Paul Brunk Silvina Caíno‐Lores Henri Casanova Daniela Cassol Jared Coleman Tainã Coleman Iacopo Colonnelli Anderson Andrei Da Silva Daniel de Oliveira Pascal Elahi Nabil El‐Faramawy Wael Elwasif Brian D. Etz Thomas Fahringer Weder N. Ferreira Rosa Filgueira Jacob Fosso Tande Luiz Gadelha Andy Gallo Daniel Garijo Yiannis Georgiou Philipp Gritsch Patricia Grubel Amal Gueroudji Quentin Guilloteau Carlo Hamalainen R Latorre Enriquez Lauren Huet Kevin Hunter Kesling Paula Iborra Shiva Jahangiri Jan Janßen Joanne L. Jordan Sehrish Kanwal Liliane Kunstmann Fabian Lehmann Ulf Leser Chen Li Peini Liu Jakob Luettgau Richard Lupat José M. Fernández Ketan Maheshwari Tanu Malik Jack Marquez Motohiko Matsuda Doriana Medić Somayeh Mohammadi Alberto Mulone John-Luke Navarro Kin Wai Ng Klaus Noelp Bruno P. Kinoshita Ryan Prout Michael R. Crusoe Sasko Ristov Stefan A. Robila Daniel Rosendo Billy Rowell Jedrzej Rybicki Hector Sanchez Lopez Nishant Saurabh Sumit Kumar Saurav Tom Scogland Dinindu Senanayake Woong Shin Raúl Sirvent Tyler J. Skluzacek Barry Sly-Delgado Stian Soiland‐Reyes Abel Souza Renan P. Souza Domenico Talia Nathan R. Tallent

The Workflows Community Summit gathered 111 participants from 18 countries to discuss emerging trends and challenges in scientific workflows, focusing on six key areas: time-sensitive workflows, AI-HPC convergence, multi-facility workflows, heterogeneous HPC environments, user experience, and FAIR computational workflows. The integration of AI and exascale computing has revolutionized scientific workflows, enabling higher-fidelity models and complex, time-sensitive processes, while introducing challenges in managing heterogeneous environments and multi-facility data dependencies. The rise of large language models is driving...

10.5281/zenodo.13844758 preprint EN cc-by 2024-10-18