I’ve just released the new version 5.8.0 of my scientific workflow management system Steep. One of the highlights of this version is increased fault tolerance: Steep schedulers are now able to resume the work of other crashed instances when they detect that a process chain is running but not monitored anymore. Steep 5.8.0 also includes many other new features (see details below). The version has been thoroughly tested in practice over the last couple of months.
Steep is a scientific workflow management system that can execute data-driven workflows in the Cloud. It is very well suited to harness the possibilities of distributed computing in order to parallelise work and to speed up your data processing workflows, no matter how complex they are and regardless of how much data you need to process. Steep is open-source software developed at Fraunhofer IGD. You can download the binaries and the source code of Steep from its GitHub repository.
Steep’s scheduler assigns process chains to agents for execution. It then monitors the execution and finally writes the process chain results to the database.
In the past, if a scheduler instance crashed, the results of running process chains were not collected. Instead, another Steep instance had to resume the workflow and re-execute, from the beginning, all process chains that were still running when the old scheduler crashed.
In version 5.8.0, a new scheduler instance can detect orphaned process chains and resume monitoring. Orphaned process chains are those that are currently being executed by an agent but not being monitored by any scheduler instance in the cluster.
This feature increases fault tolerance and allows workflows to be executed without interruptions and without having to repeat process chains unnecessarily.
The scheduler looks for orphaned process chains at startup and then at a set interval. Please have a look at Steep’s documentation for information on how to configure this new feature.
The interface for progress estimator plugins in the new version now supports multiple service IDs. This means you can use the same plugin to determine the progress of more than one service:
- name: myGenericProgressEstimator
  type: progressEstimator
  scriptFile: conf/plugins/myGenericProgressEstimator.kt
  supportedServiceIds:
    - myService
    - anotherService
In addition, it is now possible to declare dependencies between plugins, for example, if a plugin should be executed after another one. This applies to process chain adapters and initializers:
- name: myProcessChainAdapter
  type: processChainAdapter
  scriptFile: conf/plugins/myProcessChainAdapter.kt
  dependsOn:
    - anotherProcessChainAdapter
Finally, plugin script files can now be pre-compiled to speed up Steep’s startup time. Please read the documentation for more information.
Configuration files for setups (setups.yaml) and service metadata (services/services.yaml) often contain repeated information (e.g. two setups might offer the same capabilities). With the new version, you can now use YAML anchors to simplify your configuration files.
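As a rough illustration of the idea, the following sketch shows two setups sharing one list of capabilities through a YAML anchor and alias. The field names `id` and `providedCapabilities`, the setup names, and the capability values are assumptions made for this example; please check Steep’s documentation for the actual setup schema.

```yaml
# setups.yaml (sketch only; field names and values are assumptions)
- id: smallSetup
  # "&commonCapabilities" defines an anchor for the list below
  providedCapabilities: &commonCapabilities
    - docker
    - python

- id: largeSetup
  # "*commonCapabilities" is an alias that reuses the anchored list
  providedCapabilities: *commonCapabilities
```

Anchors and aliases are resolved by the YAML parser, so both setups end up with the same list, and a change to the anchored list applies to every place that references it.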
In addition, the configuration properties steep.plugins now support glob patterns (e.g. **/*.yaml). This allows you to, for example, recursively include other configuration files from a directory tree without having to specify each of them individually.
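For example, a glob pattern in Steep’s main configuration file might look like the sketch below. The directory conf/plugins and the nested key layout are assumptions for illustration; only the property name steep.plugins and the pattern **/*.yaml are taken from the text above.

```yaml
# steep.yaml (sketch only; the directory layout is an assumption)
steep:
  # include every YAML file below conf/plugins, at any depth
  plugins: conf/plugins/**/*.yaml
```

Note that a YAML value beginning with an asterisk (e.g. a bare **/*.yaml) would be parsed as an alias, so such a pattern needs to be put in quotes.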
Provisioning scripts now support a new upload function that can be used to upload one or more files to a virtual machine. Read the new section on provisioning scripts in Steep’s documentation for more information.
Here’s a list of other noteworthy improvements and bug fixes: