Runzhou Han

ORCID: 0000-0003-1440-7568
Research Areas
  • Advanced Data Storage Technologies
  • Distributed and Parallel Computing Systems
  • Caching and Content Delivery
  • Cloud Computing and Resource Management
  • Distributed systems and fault tolerance
  • Research Data Management Practices
  • Scientific Computing and Data Management
  • Parallel Computing and Optimization Techniques
  • Software System Performance and Reliability
  • Error Correcting Code Techniques
  • Network Security and Intrusion Detection
  • Algorithms and Data Compression

Iowa State University
2020-2024

Samsung (United States)
2022

Data provenance, or data lineage, describes the life cycle of data. In scientific workflows on HPC systems, scientists often seek diverse provenance (e.g., origins of data products, usage patterns of datasets). Unfortunately, existing provenance solutions cannot address the challenges due to their incompatible provenance models and/or system implementations. In this paper, we analyze four representative scientific workflows in collaboration with the domain scientists to identify concrete provenance needs. Based on the first-hand analysis, we propose a provenance framework called PROV-IO...

10.1109/tpds.2024.3374555 article EN IEEE Transactions on Parallel and Distributed Systems 2024-03-14

As core components of high-performance computing (HPC) platforms, parallel file systems (PFSes) grow quickly in scale and complexity, and hence are subject to various failures and anomalies. Identifying their anomalies at runtime is critically helpful for HPC operators and administrators. Analyzing logs to detect anomalies in large-scale systems has been proven effective in many recent studies. However, applying them to PFSes faces significant challenges due to the large volume and irregularity of PFS logs. This study proposes SentiLog, a new...

10.1145/3465332.3470873 article EN 2021-07-20

Large-scale parallel file systems (PFSs) play an essential role in high-performance computing (HPC). However, despite their importance, their reliability is much less studied or understood compared with that of local storage or cloud systems. Recent failure incidents at real HPC centers have exposed the latent defects in PFS clusters as well as the urgent need for a systematic analysis. To address the challenge, we perform a study of the recovery and logging mechanisms of PFSs in this article. First, to trigger the operations of the target...

10.1145/3483447 article EN ACM Transactions on Storage 2022-03-29

Data provenance, or data lineage, describes the life cycle of data. In scientific workflows on HPC systems, scientists often seek diverse provenance (e.g., origins of data products, usage patterns of datasets). Unfortunately, existing provenance solutions cannot address the challenges due to their incompatible provenance models and/or system implementations.

10.1145/3502181.3531477 article EN 2022-06-23

Parallel file systems (PFSes) play an essential role in high-performance computing. To ensure integrity, many PFSes are designed with a checker component, which serves as the last line of defense to bring a corrupted PFS back to a healthy state. Motivated by real-world incidents of PFS corruptions, we perform a fine-grained study on the capability of PFS checkers in this paper. We apply type-aware fault injection on specific PFS structures, and examine the detection and repair policies of the checkers meticulously via a well-defined taxonomy. The...

10.1109/pdsw51947.2020.00013 article EN 2020-11-01

The metadata service (MDS) sits on the critical path for distributed file system (DFS) operations, and therefore it is key to the overall performance of a large-scale DFS. Common "serverful" MDS architectures, such as a single server or a cluster of servers, have a significant shortcoming: either they are not scalable, or they make it difficult to achieve an optimal balance of performance, resource utilization, and cost. A modern MDS requires a novel architecture that addresses this shortcoming.

10.1145/3623278.3624765 article EN cc-by-nc-sa 2023-03-25

The metadata service (MDS) sits on the critical path for distributed file system (DFS) operations, and therefore it is key to the overall performance of a large-scale DFS. Common "serverful" MDS architectures, such as a single server or a cluster of servers, have a significant shortcoming: either they are not scalable, or they make it difficult to achieve an optimal balance of performance, resource utilization, and cost. A modern MDS requires a novel architecture that addresses this shortcoming. To this end, we design and implement...

10.48550/arxiv.2306.11877 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Data provenance, or data lineage, describes the life cycle of data. In scientific workflows on HPC systems, scientists often seek diverse provenance (e.g., origins of data products, usage patterns of datasets). Unfortunately, existing provenance solutions cannot address the challenges due to their incompatible provenance models and/or system implementations. In this paper, we analyze four representative scientific workflows in collaboration with the domain scientists to identify concrete provenance needs. Based on the first-hand analysis, we propose a provenance framework called PROV-IO+, which...

10.48550/arxiv.2308.00891 preprint EN cc-by-nc-sa arXiv (Cornell University) 2023-01-01

Diagnosing storage system failures is challenging even for professionals. One example is the "When Solid State Drives Are Not That Solid" incident that occurred at the Algolia data center, where Samsung SSDs were mistakenly blamed for failures caused by a Linux kernel bug. As system complexity keeps increasing, such obscure failures will likely occur more often. As one step to address the challenge, we present our ongoing efforts called X-Ray. Different from traditional methods that focus on either the software or the hardware, X-Ray...

10.48550/arxiv.2005.02547 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Many storage applications, such as file system checkers, defragmentation tools, etc., require a detailed understanding of file systems. Such file-system-aware applications play an essential role today, but unfortunately they are error-prone. To better understand the challenges as well as the opportunities to address the issues, this paper presents an empirical study of real-world bugs in file-system-aware applications. By analyzing 59 bug cases from 4 representative applications in depth, we derive multiple insights in terms of general bug patterns, triggering conditions,...

10.1109/nas55553.2022.9925445 article EN 2022-10-01