Steep - Run Scientific Workflows in the Cloud
For the last couple of years, I’ve been working on a scientific workflow management system called Steep. I’m more than happy to announce that, here at Fraunhofer IGD, we decided to make Steep open-source! You can download the binaries and the source code of Steep from its GitHub repository.
A scientific workflow management system is an application that executes data-driven workflows, which apply a series of processing services (often called actions, tasks, or microservices) to input files to produce a certain output. Steep is designed to be scalable. It can run on your laptop but also in the Cloud, in a Grid, or in a Cluster, and it can distribute the individual processing services to multiple compute nodes. As such, it is very well suited to harness the possibilities of distributed computing to parallelise work and speed up your data processing workflows, no matter how complex they are and regardless of how much data you need to process.
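To give an idea of what such a workflow looks like, here is a hypothetical minimal example in YAML. The service name, variable names, and file path are made up for illustration; the exact schema is defined in Steep's documentation.

```yaml
api: 4.0.0
vars:
  - id: input_file
    value: /data/pointcloud.las   # made-up input path
  - id: output_file
actions:
  - type: execute
    service: extract              # hypothetical service name
    inputs:
      - id: input
        var: input_file
    outputs:
      - id: output
        var: output_file
```

Steep converts such a declarative description incrementally into process chains, which are then dispatched to agents for execution.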
The following is a list of Steep’s key features:
Cyclic workflow graphs
In contrast to other scientific workflow management systems, Steep supports cyclic workflow graphs without a priori runtime knowledge. Workflows are converted incrementally and on demand to so-called process chains. This allows Steep to execute workflows that dynamically change their structure at runtime.
Steep has an optimised, capability-based scheduler that executes process chains in parallel by distributing them to multiple agents (i.e. Steep instances running in the Cloud or in a Cluster).
Crashed workflows can be resumed without loss of information—even if no database is configured.
In order to be able to integrate and execute processing services or microservices with almost arbitrary interfaces, Steep makes use of so-called service metadata. You can use this metadata to describe the interface of your existing binaries without the need to modify them and to match them to a certain processing framework.
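As an illustration, service metadata might look roughly like the following YAML sketch. The field names reflect my understanding of Steep's metadata format, and the service itself is made up; consult Steep's documentation for the authoritative schema.

```yaml
- id: extract
  name: Extract
  description: Extracts features from a point cloud
  path: /opt/services/extract.sh   # existing binary, no modification needed
  runtime: other                   # or 'docker' for a Docker image
  parameters:
    - id: input
      name: Input file
      type: input
      cardinality: 1..1
      dataType: file
    - id: output
      name: Output file
      type: output
      cardinality: 1..1
      dataType: file
```

The scheduler also uses this metadata to decide which agents are able to run a given service.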
Steep has built-in runtime environments for executable microservices that are provided as binaries or Docker images.
Plugins let you modify generated process chains, change the way agents collect results, or add custom runtime environments (e.g. Python, AWS Lambda, Web Processing Services).
If you want your workflows to be persisted in a database over a long period of time, Steep offers support for MongoDB and PostgreSQL. In the default configuration, Steep requires no database and keeps everything in memory. Note that, even in this setup, the individual Steep instances in your cluster share their state in memory, which already provides good fault tolerance.
HTTP and web interfaces
Steep has a REST-like HTTP interface, a web-based user interface for monitoring, and can provide metrics to Prometheus.
Its asynchronous event-driven architecture allows Steep to scale horizontally across multiple machines in your cluster and to support complex dynamic workflows with thousands of tasks.
Steep is very reliable and has been used in production for many years to execute workflows from various domains. The source code has a very high test coverage.
Today, we also released the new version 5.0.0 of Steep with the following new features:
- Process chains are now executed in the order they were added to the registry
- New capability-based scheduling algorithm (see below)
- Do not keep temporary process chain results in memory if not needed
- Allow users to disable scheduler
- Do not log errors while trying to connect to new VM via SSH
- Add possibility to specify a minimum number of VMs per setup
- Allow alternative setups with similar provided capabilities to be specified
- New process chain adapter plugins
One of the highlights is the new capability-based scheduling algorithm. With the old algorithm, workflow execution could stall if a process chain could not be executed because of a missing agent, even if other agents would have been able to execute the remaining process chains. The new algorithm executes process chains in the order they were added to the registry and always fetches as many of them from the registry as there are available agents that can execute them. Process chains that cannot be executed because they require capabilities no agent currently provides are skipped and resumed as soon as an agent with matching capabilities joins the cluster.
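To illustrate the idea, here is a minimal sketch of such a capability-based FIFO scheduler in Python. It is illustrative only: Steep itself is written in Kotlin, and the function and variable names below are made up for this example.

```python
def schedule(chains, agents):
    """Assign process chains to agents in FIFO order.

    chains: list of (chain_id, required_capabilities) tuples,
            in the order they were added to the registry
    agents: dict mapping agent_id -> set of provided capabilities

    Returns (assignments, skipped), where assignments maps chain_id
    to agent_id and skipped lists chains left in the registry.
    """
    idle = dict(agents)   # agents still available in this round
    assignments = {}
    skipped = []
    for chain_id, required in chains:
        # find an idle agent that provides all required capabilities
        agent = next(
            (a for a, caps in idle.items() if required <= caps),
            None,
        )
        if agent is None:
            # no capable agent right now: skip this chain without
            # blocking the chains behind it in the queue
            skipped.append(chain_id)
        else:
            assignments[chain_id] = agent
            del idle[agent]   # agent is busy for this round
    return assignments, skipped


chains = [
    ("c1", {"docker"}),
    ("c2", {"gpu"}),   # no agent provides "gpu" -> skipped
    ("c3", set()),
]
agents = {"a1": {"docker"}, "a2": {"docker"}}
assignments, skipped = schedule(chains, agents)
# c1 and c3 are dispatched; c2 waits until a GPU-capable agent joins
```

The key property is that a chain with unsatisfiable capabilities is merely skipped, so the chains queued behind it still run; when a matching agent joins the cluster, the skipped chain is picked up again.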
Steep was initially developed within the research project “IQmulus” (A High-volume Fusion and Analysis Platform for Geospatial Point Clouds, Coverages and Volumetric Data Sets), funded by the 7th Framework Programme of the European Commission, call identifier FP7-ICT-2011-8, under grant agreement no. 318787, from 2012 to 2016. It was previously called the ‘IQmulus JobManager’ or just the ‘JobManager’, and it has appeared under this name in a number of publications.
Documentation and getting started
I’m currently working on a comprehensive web page describing the features of Steep and how you can execute workflows with it. I will keep you updated here on this site and let you know when the web page is ready.
Posted by Michel Krämer
on 6 February 2020