Steep 6.0.0
I’m very proud to announce a new major version of my workflow management system Steep! This is the biggest release so far. It contains many cool new features and bug fixes, but also some breaking changes. Make sure to have a look at the detailed list below and read the documentation on the Steep website.
Steep is a scientific workflow management system that can execute data-driven workflows in the Cloud. It’s very well suited to harness the possibilities of distributed computing in order to parallelise work and to speed up your data processing workflows, no matter how complex they are and regardless of how much data you need to process. Steep is open-source software developed at Fraunhofer IGD. You can download the binaries and the source code of Steep from its GitHub repository.
Highlights in this release
- Improved workflow syntax
- Improved workflow validation
- Eager process chain generation (improved parallelization)
- Named workflows
- Workflow priorities
- Full-text search
- Improved performance (Database + Web UI + HTTP + Event bus)
New features
- Improved workflow model:
  - Support input parameters with values instead of pre-defined variables
  - Allow output variables to be used without declaring them in `vars`
- Enable full parallelisation of process chains and for-each actions:
  - Process chains are now generated eagerly. This means as soon as the first results of running process chains are available, new process chains can be generated.
  - For-each actions are now eagerly unrolled. This means that execution of for-each actions that depend on the output of other actions or that are recursive can now start even if the results are only partially available yet.
- Add possibility to prioritize workflows
- Add powerful full-text search for workflows and process chains
- Improved workflow validation:
  - Disallow variables to be used more than once as output
  - Make sure enumerators cannot be reused as enumerators or outputs
  - Validate if variable values are accessed within the right scope
  - Display path to error in validation result
- Add dependency injection mechanism for plugin interfaces
- Add process chain consistency checker plugin
- Add plugin versions
- Add more Prometheus metrics
- Display timeout policies in UI
- Display workflow name in UI
- Display original YAML source on workflow detail page in UI if available
- Add button to create new workflow to UI
- Improved performance:
  - Database requests (fewer requests and faster queries)
  - HTTP API
  - Web UI
  - Cluster communication
- Implement timeout for creating a VM
- Compress large messages sent over the event bus
- Add shortcut button to create new workflow from scratch to UI
- Add more parameters for cluster configuration:
  - Add possibility to configure placement group name
  - Add possibility to make a Steep instance a Hazelcast lite member
- Improve reliability by increasing backup count of distributed Hazelcast data structures
Breaking changes
- Hazelcast has been updated. Steep 6 instances cannot connect to Steep 5.x instances. You have to restart your whole cluster during update.
- Workflow API version 3.x has been removed. Please upgrade your workflows to API version 4.x.
- All model properties are now camel case. For example, `data_type` in service metadata has been renamed to `dataType`. The same applies to properties such as `required_capabilities` or `file_suffix`. Please refer to the documentation on the Steep website for more information.
- The deprecated property `supportedServiceId` in plugin descriptors has been removed. It was replaced by `supportedServiceIds` in earlier versions already.
- Deprecated plugin interfaces have been removed
- `Executable.serviceId` is now mandatory. Make sure to update your plugins if you create `Executable` objects (this particularly applies to process chain adapters)
- The deprecated configuration property `onlyTraverseDirectoryOutputs` has been removed. Steep now always only traverses output directories if the `dataType` in the service metadata is `directory`. All other data types will not be traversed but directly passed to the subsequent service.
- Remove deprecated store actions and action parameters
- Configuration items denoting periods of time have been renamed. All items must now be specified as durations. For example, `lookupIntervalMilliseconds: 2000` becomes `lookupInterval: 2s`. Refer to the documentation for an overview of all configuration items.
- Configuration item `steep.http.cors.maxAge` has been renamed to `steep.http.cors.maxAgeSeconds` to make clear that it has to be specified in seconds and to be in line with the corresponding CORS HTTP header.
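
If you have a lot of existing service metadata, the switch to camel case mentioned above could be scripted. The following Python snippet is only a rough sketch and not part of Steep: the helper names `to_camel_case` and `rename_keys` as well as the sample metadata are made up for illustration. It assumes that the metadata has already been loaded into plain dictionaries and lists, and that every snake_case key in it should be renamed. Please check the result against the documentation before using it.

```python
import re

# Matches an underscore that sits between a lowercase letter or digit
# and a lowercase letter, e.g. the "_t" in "data_type".
_SNAKE_BOUNDARY = re.compile(r"(?<=[a-z0-9])_([a-z])")


def to_camel_case(name):
    """Turn a snake_case name into camelCase; other names stay unchanged."""
    return _SNAKE_BOUNDARY.sub(lambda m: m.group(1).upper(), name)


def rename_keys(node):
    """Recursively rename dictionary keys; scalar values are left as they are."""
    if isinstance(node, dict):
        return {
            (to_camel_case(key) if isinstance(key, str) else key): rename_keys(value)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [rename_keys(item) for item in node]
    return node


# Hypothetical service metadata entry (for illustration only)
old_metadata = {
    "id": "copy",
    "required_capabilities": ["docker"],
    "parameters": [
        {"id": "input_file", "data_type": "file", "file_suffix": ".txt"},
    ],
}

new_metadata = rename_keys(old_metadata)
print(new_metadata)
```

Note that only keys are touched. A value such as `input_file` keeps its underscore because it is an identifier and not a model property.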
Bug fixes
- Fix issue where some Docker containers were not killed when the workflow was cancelled
- Do not fail if plugin configuration file is empty
- Fix reading arbitrarily large GridFS files from MongoDB
- Do not create more VMs if there already are enough providing a given required capability set
- Fix intermittent crashes in UI if connection to event bus was lost
- When looking for orphaned process chains, the scheduler does not send a message to itself anymore
Maintenance
- Upgrade Vert.x to 4.3.0
- Switch to Zulu OpenJDK Docker base image
- Install security patches on Docker image build
- Update other dependencies
Posted by Michel Krämer on 27 June 2022