I’ve just released version 5.8.0 of my scientific workflow management system Steep. One of the highlights of this version is increased fault tolerance: Steep schedulers can now resume the work of crashed instances when they detect that a process chain is still running but no longer monitored. Steep 5.8.0 also includes many other new features (see details below). The version has been thoroughly tested in practice over the last couple of months.
Steep is a scientific workflow management system that can execute data-driven workflows in the Cloud. It is very well suited to harnessing distributed computing in order to parallelise work and speed up your data processing workflows, no matter how complex they are and regardless of how much data you need to process. Steep is open-source software developed at Fraunhofer IGD. You can download the binaries and the source code of Steep from its GitHub repository.
Resume monitoring of running process chains
Steep’s scheduler assigns process chains to agents for execution. It then monitors the execution and finally writes the process chain results to the database.
In the past, if a scheduler instance crashed, the results of its running process chains were not collected. Instead, another Steep instance had to resume the workflow and re-execute, from the beginning, all process chains that were still running when the old scheduler crashed.
In version 5.8.0, a new scheduler instance can detect orphaned process chains and resume monitoring. Orphaned process chains are those that are currently being executed by an agent but not being monitored by any scheduler instance in the cluster.
This feature increases fault tolerance and allows workflows to run without interruption and without repeating process chains unnecessarily.
The scheduler looks for orphaned process chains at startup and at a regular interval. Please have a look at Steep’s documentation for information on how to configure this new feature.
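Conceptually, detecting orphans boils down to a set difference: the process chains that agents report as running, minus those monitored by a scheduler that is still alive. The Python sketch below illustrates only this idea. It assumes that both pieces of information are available as plain dictionaries; `find_orphans` and its parameters are hypothetical names and are not taken from Steep's code base.

```python
from typing import Dict, Iterable, Set


def find_orphans(
    running_by_agent: Dict[str, Iterable[str]],
    monitored_by_scheduler: Dict[str, Iterable[str]],
    live_schedulers: Set[str],
) -> Set[str]:
    """Return the IDs of process chains that some agent is executing but
    that no live scheduler instance is monitoring (hypothetical helper)."""
    # Every process chain currently executed by an agent in the cluster
    running: Set[str] = set()
    for chain_ids in running_by_agent.values():
        running.update(chain_ids)

    # Chains monitored by schedulers that are still alive; a crashed
    # scheduler's entries no longer count
    monitored: Set[str] = set()
    for scheduler_id, chain_ids in monitored_by_scheduler.items():
        if scheduler_id in live_schedulers:
            monitored.update(chain_ids)

    return running - monitored


if __name__ == "__main__":
    running = {"agent1": ["pc1", "pc2"], "agent2": ["pc3"]}
    monitored = {"schedulerA": ["pc1"], "schedulerB": ["pc2", "pc3"]}
    # schedulerB has crashed, so its chains pc2 and pc3 are orphaned
    print(sorted(find_orphans(running, monitored, {"schedulerA"})))  # ['pc2', 'pc3']
```

A scheduler would run such a check once at startup and then periodically, and take over monitoring of every process chain it returns.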
New features for plugins
In the new version, the interface for progress estimator plugins supports multiple service IDs. This means you can use the same plugin to determine the progress of more than one service:
```yaml
- name: myGenericProgressEstimator
  type: progressEstimator
  scriptFile: conf/plugins/myGenericProgressEstimator.kt
  supportedServiceIds:
    - myService
    - anotherService
```
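Because one plugin can now serve several services, it makes sense to write estimators that rely on a common output convention. The Python sketch below assumes, hypothetically, that the services print their progress as a percentage (e.g. "processed 42%"), and shows how a `supportedServiceIds` list could be turned into a lookup table from service ID to estimator. Steep's real plugins are Kotlin scripts with their own interface; `generic_progress` and `build_registry` are illustrative names only.

```python
import re
from typing import Callable, Dict, List, Optional

# Hypothetical convention: progress appears in the output as "NN%" or "NN.N%"
PERCENT = re.compile(r"(\d+(?:\.\d+)?)\s*%")

Estimator = Callable[[List[str]], Optional[float]]


def generic_progress(recent_lines: List[str]) -> Optional[float]:
    """Return progress in [0, 1] taken from the most recent line that
    contains a percentage, or None if no line does."""
    for line in reversed(recent_lines):
        matches = PERCENT.findall(line)
        if matches:
            # Use the last percentage on the line and clamp to 100%
            return min(float(matches[-1]) / 100.0, 1.0)
    return None


def build_registry(plugins: List[dict], functions: Dict[str, Estimator]) -> Dict[str, Estimator]:
    """Map every supported service ID to the estimator function of the
    progress estimator plugin that declares it."""
    registry: Dict[str, Estimator] = {}
    for plugin in plugins:
        if plugin.get("type") != "progressEstimator":
            continue
        for service_id in plugin["supportedServiceIds"]:
            registry[service_id] = functions[plugin["name"]]
    return registry


if __name__ == "__main__":
    plugins = [{
        "name": "myGenericProgressEstimator",
        "type": "progressEstimator",
        "supportedServiceIds": ["myService", "anotherService"],
    }]
    registry = build_registry(plugins, {"myGenericProgressEstimator": generic_progress})
    print(registry["anotherService"](["starting", "processed 42%"]))  # 0.42
```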
In addition, it is now possible to declare dependencies between plugins, for example, if a plugin should be executed after another one. This applies to process chain adapters and initializers:
```yaml
- name: myProcessChainAdapter
  type: processChainAdapter
  scriptFile: conf/plugins/myProcessChainAdapter.kt
  dependsOn:
    - anotherProcessChainAdapter
```
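Plugins with dependencies have to be ordered so that each one runs after the plugins it depends on. A standard way to compute such an order is a topological sort, and the Python sketch below uses the standard library's `graphlib` for it. This is an assumption for illustration: the post does not say which algorithm Steep uses, and `plugin_order` is a hypothetical helper.

```python
from graphlib import CycleError, TopologicalSorter  # Python 3.9+
from typing import Dict, List


def plugin_order(plugins: List[dict]) -> List[str]:
    """Return plugin names so that every plugin comes after the plugins
    listed in its optional dependsOn attribute.

    Raises ValueError for unknown dependencies and graphlib.CycleError
    (a subclass of ValueError) for circular ones.
    """
    # Map each plugin to its predecessors, as TopologicalSorter expects
    graph: Dict[str, List[str]] = {
        p["name"]: list(p.get("dependsOn", [])) for p in plugins
    }
    for name, deps in graph.items():
        missing = [d for d in deps if d not in graph]
        if missing:
            raise ValueError(f"{name} depends on unknown plugin(s): {missing}")
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    plugins = [
        {"name": "myProcessChainAdapter", "dependsOn": ["anotherProcessChainAdapter"]},
        {"name": "anotherProcessChainAdapter"},
    ]
    print(plugin_order(plugins))
    # ['anotherProcessChainAdapter', 'myProcessChainAdapter']
```

Circular dependencies cannot be ordered at all, so failing early with an error is preferable to silently picking an arbitrary order.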
Finally, plugin script files can now be pre-compiled to reduce Steep’s startup time. Please read the documentation for more information.
Configuration files for setups (setups.yaml) and service metadata (services/services.yaml) often contain repeated information (e.g. two setups might offer the same capabilities). With the new version, you can now use YAML anchors to simplify your configuration files.
In addition, the configuration properties now support glob patterns (e.g. **/*.yaml). This allows you, for example, to recursively include other configuration files from a directory tree without having to specify each of them individually.
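The post does not spell out Steep's exact matching rules, but the usual semantics of a recursive pattern such as **/*.yaml can be illustrated with Python's `glob` module, where `**` matches zero or more directories. The helper `expand_config_patterns` and its parameters are hypothetical; Steep itself may treat edge cases (hidden files, symlinks) differently.

```python
import glob
import os
import tempfile
from typing import Iterable, List


def expand_config_patterns(base_dir: str, patterns: Iterable[str]) -> List[str]:
    """Expand glob patterns relative to base_dir and return the matching
    files as sorted, de-duplicated paths relative to base_dir."""
    found = set()
    for pattern in patterns:
        # Escape base_dir so that characters like "[" in it are literal
        full_pattern = os.path.join(glob.escape(base_dir), pattern)
        # recursive=True lets "**" match zero or more directories, so
        # "**/*.yaml" also finds YAML files directly in base_dir
        for path in glob.glob(full_pattern, recursive=True):
            if os.path.isfile(path):
                found.add(os.path.relpath(path, base_dir))
    # A stable order means files are always included in the same sequence
    return sorted(found)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        os.makedirs(os.path.join(base, "services", "extra"))
        for rel in ["setups.yaml", "services/services.yaml",
                    "services/extra/more.yaml", "README.txt"]:
            with open(os.path.join(base, rel), "w") as f:
                f.write("# example\n")
        print(expand_config_patterns(base, ["**/*.yaml"]))
        # on POSIX: ['services/extra/more.yaml', 'services/services.yaml', 'setups.yaml']
```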
Provisioning scripts now support a new upload function that can be used to upload one or more files to a virtual machine. Read the new section on provisioning scripts in Steep’s documentation for more information.
Other new features
Here’s a list of other noteworthy improvements and bug fixes:
- Display allocated process chain ID in agent details
- Reduce the number of log messages, in particular when agents join or leave the cluster
- Run PostgreSQL database migration only once per Steep instance
- Improve graceful shutdown
- Add possibility to restore cluster members on startup from VM registry
- Run unit tests in parallel
- Speed up MongoDB unit tests
- Update web UI dependencies
- Allow runtime plugins to access current Vert.x context
- Do not fail to delete OpenStack block device if it does not exist
Posted by Michel Krämer on 19 May 2021