An efficient real-time data collection framework on petascale systems
Petascale computing
DOI:
10.1016/j.neucom.2019.06.039
Publication Date:
2019-07-09T11:24:52Z
AUTHORS (7)
ABSTRACT
Efficient data collection remains a major challenge for HPC reliability and resilience, yet it may help overcome the barriers that currently stand in the way of fault prediction. Efficiently collecting and analyzing data that contains system fault information for fault prediction requires substantial effort, not only when scaling up to exascale systems but even on contemporary supercomputer architectures. To this end, this article focuses on efficient data collection and data preprocessing, and optimizes a framework to improve the efficiency of data collection on petascale systems. The core of our framework comprises a data collection acceleration layer scheduled by H2FS, more detailed information obtained by a performance analysis tool, and a new method for log template extraction, which together contribute to a more efficient and convenient framework for real-time data collection. We conducted extensive tests on a petascale system to verify the solution, and the experimental results demonstrate the effectiveness and scalability of our framework.
REFERENCES (38)
CITATIONS (2)