A Survey of Data-Intensive Scientific Workflow Management
Multisite cloud
Scheduling
Scientific workflow
Parallelization
02 engineering and technology
13. Climate action
Scientific workflow management system
Distributed and parallel data management
[INFO.INFO-DC] Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC]
0202 electrical engineering, electronic engineering, information engineering
Grid
Cloud
DOI: 10.1007/s10723-015-9329-8
Publication Date: 2015-03-07T16:42:13Z
AUTHORS (4)
ABSTRACT
Nowadays, more and more computer-based scientific experiments need to handle massive amounts of data. Their data processing consists of multiple computational steps and the dependencies among them. A data-intensive scientific workflow is useful for modeling such a process. Since the sequential execution of data-intensive scientific workflows may take a long time, Scientific Workflow Management Systems (SWfMSs) should enable the parallel execution of data-intensive scientific workflows and exploit the resources distributed across different infrastructures such as grid and cloud. This paper provides a survey of data-intensive scientific workflow management in SWfMSs and their parallelization techniques. Based on an SWfMS functional architecture, we give a comparative analysis of the existing solutions. Finally, we identify research issues for improving the execution of data-intensive scientific workflows in a multisite cloud.
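To make the notion of "computational steps and dependencies" concrete, the following is a minimal sketch of a workflow modeled as a directed acyclic graph whose independent steps run in parallel. It is an illustration only, not the method of any SWfMS covered by the survey: the function `run_workflow`, the task names, and the toy data are all hypothetical, and a local thread pool stands in for distributed grid or cloud resources.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_workflow(tasks, deps, max_workers=4):
    """Run every task after all of its predecessors have finished.

    tasks: dict mapping a task name to a callable that receives a dict
           of its predecessors' results.
    deps:  dict mapping a task name to the set of its predecessor names.
    (Both structures are assumptions made for this sketch.)
    """
    sorter = TopologicalSorter({name: deps.get(name, set()) for name in tasks})
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Tasks returned together have no unmet dependencies,
            # so they can safely execute in parallel.
            ready = sorter.get_ready()
            futures = {
                name: pool.submit(
                    tasks[name], {d: results[d] for d in deps.get(name, ())}
                )
                for name in ready
            }
            # Simplification: wait for the whole batch before asking
            # the sorter for the next set of ready tasks.
            for name, future in futures.items():
                results[name] = future.result()
                sorter.done(name)
    return results


# Hypothetical four-step workflow: "square" and "double" both depend only
# on "load", so they are eligible to run concurrently.
tasks = {
    "load": lambda inputs: [1, 2, 3, 4],
    "square": lambda inputs: [x * x for x in inputs["load"]],
    "double": lambda inputs: [2 * x for x in inputs["load"]],
    "merge": lambda inputs: sum(inputs["square"]) + sum(inputs["double"]),
}
deps = {
    "square": {"load"},
    "double": {"load"},
    "merge": {"square", "double"},
}

results = run_workflow(tasks, deps)
print(results["merge"])
```

A sequential executor would run the four steps one after another. Here the two middle steps are submitted together, which is the kind of parallelism the abstract refers to, although real systems must also decide where data and tasks are placed across sites.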
REFERENCES (145)
CITATIONS (214)