Steep 5.7.0

I’m very happy to announce that the new version of my scientific workflow management system Steep has just been released. It introduces live process chain logs, improved VM management, and many other things (see the complete list below). This version has been thoroughly tested in practice over the last couple of months.

Steep is a scientific workflow management system that can execute data-driven workflows in the Cloud. It is very well suited to harness the possibilities of distributed computing in order to parallelise work and to speed up your data processing workflows, no matter how complex they are and regardless of how much data you need to process. Steep is open-source software developed at Fraunhofer IGD. You can download the binaries and the source code of Steep from its GitHub repository.

Log file configuration

It is now possible to configure Steep’s main log file and to enable process chain logs. For this, add the following lines to your steep.yaml file:

steep:
  logs:
    level: INFO
    main:
      enabled: true
      logFile: logs/steep.log
      dailyRollover:
        enabled: true
        maxDays: 7
        maxSize: 104857600  # 100 MB
    processChains:
      enabled: true
      path: logs/processchains
      groupByPrefix: 3

The main log file and process chain logs can be enabled separately. dailyRollover controls whether the main log file should be split into smaller files on a daily basis. If this feature is enabled, the main log file will be renamed every day. The file name of each old log file is based on the value of logFile and the file’s date in the form YYYY-MM-DD (e.g. logs/steep.2020-11-19.log). maxDays controls the maximum number of daily log files to keep, and maxSize the maximum total size of all log files in bytes. Log files beyond these limits will be deleted automatically.
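To make the naming scheme and the two limits a bit more concrete, here is a small Python sketch. It is only an illustration of the rules described above, not Steep’s actual implementation: the function names are hypothetical, and it assumes one rotated file per day and that the newest files are the ones that survive.

```python
from datetime import date
from pathlib import PurePosixPath


def rotated_name(log_file: str, day: date) -> str:
    """Insert the date (YYYY-MM-DD) between the file name and its extension."""
    p = PurePosixPath(log_file)
    return str(p.with_name(f"{p.stem}.{day.isoformat()}{p.suffix}"))


def files_to_delete(old_logs, max_days: int, max_size: int):
    """Return the names of rotated log files that exceed the limits.

    old_logs is a list of (name, day, size_in_bytes) tuples (hypothetical
    input format). Files are kept newest first as long as both limits hold;
    as soon as one limit is exceeded, that file and all older ones go.
    """
    kept = 0
    total = 0
    doomed = []
    for name, _day, size in sorted(old_logs, key=lambda e: e[1], reverse=True):
        if not doomed and kept < max_days and total + size <= max_size:
            kept += 1
            total += size
        else:
            doomed.append(name)
    return doomed


print(rotated_name("logs/steep.log", date(2020, 11, 19)))
```

With the configuration above, this prints logs/steep.2020-11-19.log, matching the example file name.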

You can also enable separate process chain logs that will record the output of process chain executables. Steep creates a new log file per executed process chain in the given path. Since this can lead to a high number of files, you can specify a groupByPrefix. If this value is greater than 0, the process chain log files will be grouped by prefix in subdirectories under the configured path. For example, if groupByPrefix equals 3, Steep will create a separate subdirectory for all process chains whose IDs start with the same three characters. The name of this subdirectory will be these three characters. The process chains apomaokjbk3dmqovemwa and apomaokjbk3dmqovemsq will be put into a subdirectory called apo, and the process chain ao344a53oyoqwhdelmna will be put into ao3. Note that in practice, 3 is a reasonable value, which will create a new directory about every day. A value of 0 disables grouping. The default value is 0.
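The grouping rule can be expressed in a few lines. The following Python sketch (the function name is hypothetical) computes the directory a process chain’s log ends up in; it deliberately stops at the directory, since the exact file name Steep uses inside it is not covered here.

```python
from pathlib import PurePosixPath


def process_chain_log_dir(base_path: str, chain_id: str, group_by_prefix: int = 0) -> str:
    """Return the directory that holds the log of the given process chain.

    With group_by_prefix > 0, logs are put into a subdirectory named after
    the first group_by_prefix characters of the process chain ID.
    A value of 0 disables grouping.
    """
    if group_by_prefix < 0:
        raise ValueError("group_by_prefix must not be negative")
    base = PurePosixPath(base_path)
    if group_by_prefix == 0:
        return str(base)
    return str(base / chain_id[:group_by_prefix])


for cid in ["apomaokjbk3dmqovemwa", "apomaokjbk3dmqovemsq", "ao344a53oyoqwhdelmna"]:
    print(cid, "->", process_chain_log_dir("logs/processchains", cid, 3))
```

The first two IDs map to logs/processchains/apo and the third to logs/processchains/ao3, just like in the example above.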

Process chain logs can also be accessed through the new HTTP endpoint /logs/processchains/<id>.
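For scripts, a plain HTTP GET is enough to read such a log. The Python sketch below uses only the standard library. It assumes that Steep’s HTTP server is reachable at http://localhost:8080 and that the endpoint returns the log as UTF-8 text; both are assumptions you should adapt to your deployment.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: address of your Steep instance's HTTP server.
STEEP_URL = "http://localhost:8080"


def process_chain_log_url(base_url: str, chain_id: str) -> str:
    """Build the URL of the /logs/processchains/<id> endpoint."""
    return f"{base_url.rstrip('/')}/logs/processchains/{quote(chain_id, safe='')}"


def fetch_process_chain_log(chain_id: str, base_url: str = STEEP_URL) -> str:
    """Download the log of a process chain (assumed to be UTF-8 text)."""
    with urlopen(process_chain_log_url(base_url, chain_id)) as response:
        return response.read().decode("utf-8")
```

Calling fetch_process_chain_log("apomaokjbk3dmqovemwa") would then return the log of that process chain, provided process chain logs are enabled and the chain has produced output.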

Real-time process chain logs

If process chain logs are enabled, you will be able to follow the output of executables in Steep’s web UI. Open the UI in your browser and navigate to the process chain you want to monitor. You will see a new “Log” section as shown in the image below. The log updates automatically while the process chain is being executed, so you can watch its output in real time.

Improved VM management

It is now possible to attach additional volumes to VMs created by Steep’s cloud manager. Also, the cloud manager can now create multiple VMs in parallel if you specify the maxCreateConcurrent parameter in the VM’s setup. Note that the cloud manager will only create as many VMs as necessary for the workflows currently being executed, which helps save resources.
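The idea behind maxCreateConcurrent is a bounded degree of parallelism: create only the VMs that are missing, but never run more than a fixed number of creation requests at once. The Python sketch below illustrates this with a thread pool. It is not the cloud manager’s code; vms_needed, create_vms, and the create_vm callback are hypothetical stand-ins for the calls that actually talk to your cloud.

```python
from concurrent.futures import ThreadPoolExecutor


def vms_needed(required: int, existing: int) -> int:
    """Only create as many VMs as the current workflows actually require."""
    return max(0, required - existing)


def create_vms(needed: int, max_create_concurrent: int, create_vm):
    """Create `needed` VMs with at most `max_create_concurrent` creations in flight.

    create_vm(i) is a hypothetical callback that creates one VM and returns
    a handle for it. Results are returned in creation-request order.
    """
    if needed <= 0:
        return []
    # The pool size caps how many create_vm calls run at the same time.
    with ThreadPoolExecutor(max_workers=max_create_concurrent) as pool:
        return list(pool.map(create_vm, range(needed)))
```

Using a fixed-size pool keeps the number of simultaneous requests to the cloud API predictable, which helps to stay within provider rate limits while still being faster than creating VMs one by one.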

Other new features

Here’s a list of other noteworthy improvements and bug fixes:

  • New runtime plugin interface, which allows the output of executables to be logged immediately. The old interface is still available but deprecated. It will be removed in Steep 6.0.0.
  • Sort services in UI alphabetically
  • Log caller location on retry
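The difference between the old and the new runtime plugin interface boils down to when output reaches the log: after the executable has finished, or line by line while it runs. The following Python sketch shows the streaming variant in general terms. It is not Steep’s plugin API (run_and_stream and on_line are hypothetical names); it simply forwards each output line of a child process to a callback as soon as the line is read.

```python
import subprocess
import sys


def run_and_stream(cmd, on_line):
    """Run cmd and pass each line of its output to on_line as it is read.

    stderr is merged into stdout. Note that lines only arrive as quickly as
    the child process flushes its own output buffer. Returns the exit code.
    """
    with subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        bufsize=1,  # line-buffered on our side of the pipe
    ) as proc:
        for line in proc.stdout:
            on_line(line.rstrip("\n"))
        return proc.wait()


if __name__ == "__main__":
    run_and_stream([sys.executable, "-c", "print('hello'); print('world')"], print)
```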

Maintenance

  • Update Gradle to 6.8.3
  • Update Kotlin to 1.4.30
  • Minor dependency updates
  • Update testcontainers to fix CI build

Bug fixes

  • Do not retry an executable if the process chain has been cancelled
  • Allow a process chain to be cancelled even if the agent is currently waiting for a retry
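Both bug fixes concern the interplay between retries and cancellation. The Python sketch below shows one common way to get this right; it is a generic pattern, not the code from Steep. The hypothetical run_with_retries checks a cancellation flag before every attempt and waits between attempts with threading.Event.wait, which returns immediately once the event is set, so a cancellation also interrupts the delay.

```python
import threading


def run_with_retries(run_once, max_attempts: int, delay_s: float,
                     cancelled: threading.Event) -> bool:
    """Call run_once() until it returns True, at most max_attempts times.

    Returns True on success and False if all attempts failed or the run
    was cancelled (before an attempt or while waiting for the next one).
    """
    for attempt in range(1, max_attempts + 1):
        # Do not start another attempt if we have been cancelled.
        if cancelled.is_set():
            return False
        if run_once():
            return True
        if attempt == max_attempts:
            break
        # wait() returns True as soon as the event is set, so cancelling
        # ends the delay early instead of waiting it out.
        if cancelled.wait(delay_s):
            return False
    return False
```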


Posted by Michel Krämer
on 3 March 2021

