Steep 6.0.0

I’m very proud to announce a new major version of my workflow management system Steep! This is the biggest release so far. It contains many cool new features and bug fixes, but also some breaking changes. Make sure to have a look at the detailed list below and read the documentation on the Steep website.

Steep is a scientific workflow management system that can execute data-driven workflows in the Cloud. It’s very well suited to harness the possibilities of distributed computing in order to parallelise work and speed up your data processing workflows, no matter how complex they are and regardless of how much data you need to process. Steep is open-source software developed at Fraunhofer IGD. You can download the binaries and the source code of Steep from its GitHub repository.

Highlights in this release

  • Improved workflow syntax
  • Improved workflow validation
  • Eager process chain generation (improved parallelisation)
  • Named workflows
  • Workflow priorities
  • Full-text search
  • Improved performance (Database + Web UI + HTTP + Event bus)

New features

  • Improved workflow model:
    • Support input parameters with values instead of pre-defined variables
    • Allow output variables to be used without declaring them in vars
  • Enable full parallelisation of process chains and for-each actions:
    • Process chains are now generated eagerly. This means that new process chains can be generated as soon as the first results of running process chains are available.
    • For-each actions are now eagerly unrolled. This means that for-each actions that depend on the output of other actions, or that are recursive, can start executing before all of their inputs are available.
  • Add possibility to prioritize workflows
  • Add powerful full-text search for workflows and process chains
  • Improved workflow validation:
    • Disallow using a variable more than once as an output
    • Make sure enumerators cannot be reused as enumerators or outputs
    • Validate that variable values are only accessed within the right scope
    • Display path to error in validation result
  • Add dependency injection mechanism for plugin interfaces
  • Add process chain consistency checker plugin
  • Add plugin versions
  • Add more Prometheus metrics
  • Display timeout policies in UI
  • Display workflow name in UI
  • Display original YAML source on workflow detail page in UI if available
  • Add button to create new workflow to UI
  • Improved performance:
    • Database requests (fewer requests and faster queries)
    • HTTP API
    • Web UI
    • Cluster communication
  • Implement timeout for creating a VM
  • Compress large messages sent over the event bus
  • Add shortcut button to create new workflow from scratch to UI
  • Add more parameters for cluster configuration:
    • Add possibility to configure placement group name
    • Add possibility to make a Steep instance a Hazelcast lite member
  • Improve reliability by increasing backup count of distributed Hazelcast data structures
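The difference between waiting for all results and unrolling a for-each action eagerly can be illustrated with a small Python sketch. This is not Steep's implementation: upstream, double, for_each_lazy and for_each_eager are hypothetical names, and a generator stands in for an action whose outputs become available one at a time. The log records the order of events, so you can see that the eager variant processes the first element before the upstream action has produced the second one.

```python
from typing import Callable, Iterable, Iterator, List

Log = List[str]


def upstream(n: int, log: Log) -> Iterator[int]:
    """Hypothetical upstream action that produces its outputs one by one."""
    for i in range(n):
        log.append(f"produced {i}")
        yield i


def double(x: int, log: Log) -> int:
    """Hypothetical loop body of a for-each action."""
    log.append(f"processed {x}")
    return 2 * x


def for_each_lazy(items: Iterable[int], body: Callable[[int, Log], int], log: Log) -> List[int]:
    # Wait until the upstream action has finished completely, then iterate.
    collected = list(items)
    return [body(x, log) for x in collected]


def for_each_eager(items: Iterable[int], body: Callable[[int, Log], int], log: Log) -> List[int]:
    # Start each iteration as soon as its input element is available.
    return [body(x, log) for x in items]


if __name__ == "__main__":
    lazy_log: Log = []
    eager_log: Log = []
    print(for_each_lazy(upstream(2, lazy_log), double, lazy_log), lazy_log)
    print(for_each_eager(upstream(2, eager_log), double, eager_log), eager_log)
```

Both variants return the same results; only the order of events differs. In a distributed setting, the gain comes from the fact that the eager iterations can run as separate process chains while the upstream action is still producing output.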

Breaking changes

  • Hazelcast has been updated. Steep 6 instances cannot connect to Steep 5.x instances. You have to restart your whole cluster during the update.
  • Workflow API version 3.x has been removed. Please upgrade your workflows to API version 4.x.
  • All model properties are now camel case. For example, data_type in service metadata has been renamed to dataType. The same applies to properties such as required_capabilities or file_suffix. Please refer to the documentation on the Steep website for more information.
  • The deprecated property supportedServiceId in plugin descriptors has been removed. It was replaced by supportedServiceIds in earlier versions already.
  • Deprecated plugin interfaces have been removed
  • Executable.serviceId is now mandatory. Make sure to update your plugins if you create Executable objects (this particularly applies to process chain adapters)
  • The deprecated configuration property onlyTraverseDirectoryOutputs has been removed. Steep now only traverses an output if its dataType in the service metadata is directory. Outputs of any other data type are not traversed but passed directly to the subsequent service.
  • Deprecated store actions and action parameters have been removed.
  • Configuration items denoting periods of time have been renamed. All items must now be specified as durations. For example, lookupIntervalMilliseconds: 2000 becomes lookupInterval: 2s. Refer to the documentation for an overview of all configuration items.
  • Configuration item steep.http.cors.maxAge has been renamed to steep.http.cors.maxAgeSeconds to make clear that it has to be specified in seconds and to be in line with the corresponding CORS HTTP header.
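Some of the renames above can be applied mechanically when upgrading. The following Python sketch assumes that service metadata and the configuration file have already been parsed (for example from YAML) into plain dictionaries and lists, and that the nesting of the configuration follows the dotted names used in these notes. All function names are hypothetical and not part of Steep. camelize_keys converts snake_case keys such as data_type to camelCase, and migrate_config turns items ending in Milliseconds into duration strings and renames steep.http.cors.maxAge. The sketch only emits whole seconds such as 2s, the one duration form shown here, and it renames every snake_case key it finds, including keys inside your own data, so review the result against the documentation.

```python
from typing import Any, Dict

MS_SUFFIX = "Milliseconds"


def snake_to_camel(key: str) -> str:
    """Convert a snake_case key such as 'data_type' to camelCase ('dataType')."""
    head, *rest = key.split("_")
    return head + "".join(part.capitalize() for part in rest)


def camelize_keys(node: Any) -> Any:
    """Recursively rename all snake_case dictionary keys (e.g. in service metadata)."""
    if isinstance(node, dict):
        return {snake_to_camel(k): camelize_keys(v) for k, v in node.items()}
    if isinstance(node, list):
        return [camelize_keys(item) for item in node]
    return node


def ms_to_duration(ms: int) -> str:
    """Turn a millisecond count into a duration string such as '2s'.

    Only whole seconds are supported; other values raise ValueError so that
    they can be converted by hand according to the Steep documentation.
    """
    if ms % 1000 != 0:
        raise ValueError(f"{ms} ms is not a whole number of seconds")
    return f"{ms // 1000}s"


def migrate_config(node: Any, path: str = "") -> Any:
    """Recursively rename time-related items in a parsed configuration tree."""
    if isinstance(node, list):
        return [migrate_config(item, path) for item in node]
    if not isinstance(node, dict):
        return node
    result: Dict[str, Any] = {}
    for key, value in node.items():
        child_path = f"{path}.{key}" if path else key
        if key.endswith(MS_SUFFIX) and isinstance(value, int):
            # e.g. lookupIntervalMilliseconds: 2000 -> lookupInterval: 2s
            result[key[: -len(MS_SUFFIX)]] = ms_to_duration(value)
        elif child_path == "steep.http.cors.maxAge":
            # Only the name changes; the value is still given in seconds.
            result["maxAgeSeconds"] = value
        else:
            result[key] = migrate_config(value, child_path)
    return result
```

Raising an error for values that are not whole seconds is deliberate: the sketch does not guess at duration formats that these release notes do not show.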

Bug fixes

  • Fix issue where some Docker containers were not killed when the workflow was cancelled
  • Do not fail if plugin configuration file is empty
  • Fix reading arbitrarily large GridFS files from MongoDB
  • Do not create more VMs if there are already enough VMs providing a given set of required capabilities
  • Fix intermittent crashes in UI if connection to event bus was lost
  • When looking for orphaned process chains, the scheduler does not send a message to itself anymore

Maintenance

  • Upgrade Vert.x to 4.3.0
  • Switch to Zulu OpenJDK Docker base image
  • Install security patches on Docker image build
  • Update other dependencies


Posted by Michel Krämer on 27 June 2022

