Humboldt-Universität zu Berlin - Mathematisch-Naturwissenschaftliche Fakultät - Institut für Informatik

Doctoral defense talk: Carl Witt

Predictive Resource Management for Scientific Workflows
  • When: 09.09.2019, from 15:00 to 23:59
  • Where: Humboldt-Kabinett

Scientific workflows enable large-scale data analyses and computational experiments. To achieve scalability, most workflow management systems are designed as an additional layer on top of distributed resource managers such as batch schedulers or distributed data processing frameworks. Workflow management systems are already capable of transparently negotiating resources with different resource managers. They do not, however, automatically determine the amount of resources required for executing a specific task in a workflow. While estimating peak resource usage can be delegated to the user, doing so impedes ease of use and often leads to low resource utilization because users lack the time, expertise, or incentives to estimate resource usage accurately.

This thesis is an investigation of how resource usage can be learned during workflow execution. In contrast to prior work, an integrated perspective on learning and scheduling is taken, which introduces various challenges, such as changing prediction accuracy over time.

The main contributions are: 1) A static workflow scheduling method is proposed, based on randomly sampling topological orders and predicting promising regions in the search space. 2) To support both static and dynamic workflow scheduling, a survey of predictive performance modeling techniques is conducted, providing an overview of approaches to black-box modeling of resource usage in distributed batch processing environments. 3) To fill a gap in the prediction literature regarding the prediction of peak memory usage, a regression-based approach to reduce memory wastage is proposed and evaluated in a case study comprising large-scale high-energy physics workflows. 4) An evaluation of the potential of feedback-based resource allocation, based on extensive simulation studies, is conducted. The results show strong interaction effects between the resource usage prediction models and workflow scheduling, suggesting that a holistic approach to workflow scheduling and resource usage prediction is needed.

The proposed designs are important steps towards adaptively and automatically managing resources in workflow management systems, improving the ease-of-use of these systems and thus their adoption, which ultimately benefits reproducibility and scalability of experiments, and the productivity of scientists.