I’m very proud to announce a new major version of my workflow management system Steep! This is the biggest release so far. It contains many cool new features and bug fixes, but also some breaking changes. Make sure to have a look at the detailed list below and read the documentation on the Steep website.
Steep is a scientific workflow management system that can execute data-driven workflows in the cloud. It’s very well suited to harness the possibilities of distributed computing in order to parallelise work and to speed up your data processing workflows, no matter how complex they are and regardless of how much data you need to process. Steep is open-source software developed at Fraunhofer IGD. You can download the binaries and the source code of Steep from its GitHub repository.
Highlights in this release
- Improved workflow syntax
- Improved workflow validation
- Eager process chain generation (improved parallelization)
- Named workflows
- Workflow priorities
- Full-text search
- Improved performance (Database + Web UI + HTTP + Event bus)
- Improved workflow model:
  - Support input parameters with values instead of pre-defined variables
  - Allow output variables to be used without declaring them in
- Enable full parallelisation of process chains and for-each actions:
  - Process chains are now generated eagerly. This means as soon as the first results of running process chains are available, new process chains can be generated.
  - For-each actions are now eagerly unrolled. This means that execution of for-each actions that depend on the output of other actions or that are recursive can now start even if the results are only partially available yet.
- Add possibility to prioritize workflows
- Add powerful full-text search for workflows and process chains
- Improved workflow validation:
  - Disallow variables to be used more than once as output
  - Make sure enumerators cannot be reused as enumerators or outputs
  - Validate if variable values are accessed within the right scope
  - Display path to error in validation result
- Add dependency injection mechanism for plugin interfaces
- Add process chain consistency checker plugin
- Add plugin versions
- Add more Prometheus metrics
- Display timeout policies in UI
- Display workflow name in UI
- Display original YAML source on workflow detail page in UI if available
- Add button to create new workflow to UI
- Improved performance:
  - Database requests (fewer requests and faster queries)
  - HTTP API
  - Web UI
  - Cluster communication
- Implement timeout for creating a VM
- Compress large messages sent over the event bus
- Add shortcut button to create new workflow from scratch to UI
- Add more parameters for cluster configuration:
  - Add possibility to configure placement group name
  - Add possibility to make a Steep instance a Hazelcast lite member
- Improve reliability by increasing backup count of distributed Hazelcast data structures
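To illustrate the idea behind eager process chain generation mentioned above, here is a minimal toy sketch in Python. It is not Steep's actual scheduler: the `Action` class and the functions `ready_actions` and `run_eagerly` are hypothetical names made up for this illustration, and the model assumes that an action can start as soon as all of its input variables are available.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """A toy workflow action (hypothetical, not Steep's model)."""
    name: str
    inputs: frozenset   # variables the action consumes
    outputs: frozenset  # variables the action produces


def ready_actions(pending, available):
    """Return the pending actions whose inputs are all available."""
    return [a for a in pending if a.inputs <= available]


def run_eagerly(actions, initial_vars):
    """Simulate eager generation.

    After every single finished action, the pending actions are checked
    again, so dependent work can start without waiting for the rest of
    the running batch. Returns the action names in start order.
    """
    pending = list(actions)
    available = set(initial_vars)
    queue = []    # actions that have been started but not finished
    started = []  # names of actions in the order they were started

    def schedule():
        for action in ready_actions(pending, available):
            pending.remove(action)
            queue.append(action)
            started.append(action.name)

    schedule()
    while queue:
        finished = queue.pop(0)        # pretend this action just completed
        available |= finished.outputs  # its results become available
        schedule()                     # eager step: re-check immediately
    return started
```

In this toy model, if `A` and `B` both read the workflow input and `C` only depends on the output of `A`, then `C` is started as soon as `A` has finished, even though `B` from the same batch is still running.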
- Hazelcast has been updated. Steep 6 instances cannot connect to Steep 5.x instances. You have to restart your whole cluster during update.
- Workflow API version 3.x has been removed. Please upgrade your workflows to API version 4.x.
- All model properties are now camel case. For example, data_type in service metadata has been renamed to dataType. The same applies to properties such as file_suffix. Please refer to the documentation on the Steep website for more information.
- The deprecated property supportedServiceId in plugin descriptors has been removed. It was replaced by supportedServiceIds in earlier versions already.
- Deprecated plugin interfaces have been removed
- Executable.serviceId is now mandatory. Make sure to update your plugins if you create Executable objects (this particularly applies to process chain adapters).
- The deprecated configuration property onlyTraverseDirectoryOutputs has been removed. Steep now only traverses outputs if the dataType in the service metadata is directory. Outputs of all other data types are not traversed but passed directly to the subsequent service.
- Remove deprecated store actions and action parameters
- Configuration items denoting periods of time have been renamed. All items must now be specified as durations. For example, lookupInterval: 2s. Refer to the documentation for an overview of all configuration items.
- Configuration item steep.http.cors.maxAge has been renamed to steep.http.cors.maxAgeSeconds to make clear that it has to be specified in seconds and to be in line with the corresponding CORS HTTP header.
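If you have many service metadata files to migrate to camel case, a small helper script can do the renaming. The following Python sketch is only a starting point under some assumptions: the function names `snake_to_camel` and `camelize_keys` are made up for this example, the metadata is assumed to be already loaded into plain dictionaries and lists with string keys, and every snake-case key is assumed to be a model property that should be renamed. Please check the result against the documentation before using it.

```python
import re


def snake_to_camel(name):
    """Convert a snake_case name to camelCase (e.g. data_type -> dataType)."""
    return re.sub(r"_([a-zA-Z0-9])", lambda m: m.group(1).upper(), name)


def camelize_keys(value):
    """Recursively rename dictionary keys to camelCase.

    Only keys are renamed. Values (including strings that contain
    underscores) are left untouched.
    """
    if isinstance(value, dict):
        return {snake_to_camel(k): camelize_keys(v) for k, v in value.items()}
    if isinstance(value, list):
        return [camelize_keys(v) for v in value]
    return value
```

For example, `camelize_keys({"data_type": "directory"})` returns `{"dataType": "directory"}`. Keys that are already camel case stay as they are, so running the helper twice does no harm.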
- Fix issue where some Docker containers were not killed when the workflow was cancelled
- Do not fail if plugin configuration file is empty
- Fix reading arbitrarily large GridFS files from MongoDB
- Do not create more VMs if there already are enough providing a given required capability set
- Fix intermittent crashes in UI if connection to event bus was lost
- When looking for orphaned process chains, the scheduler does not send a message to itself anymore
- Upgrade Vert.x to 4.3.0
- Switch to Zulu OpenJDK Docker base image
- Install security patches on Docker image build
- Update other dependencies
Posted by Michel Krämer
on 27 June 2022