I’m very happy to announce that the new version of my scientific workflow management system Steep has just been released. The new version introduces live process chain logs, improved VM management, and many other things (see the complete list below). The version has been thoroughly tested in practice over the last couple of months.
Steep is a scientific workflow management system that can execute data-driven workflows in the Cloud. It is very well suited to harness the possibilities of distributed computing in order to parallelise work and to speed up your data processing workflows no matter how complex they are and regardless of how much data you need to process. Steep is open-source software developed at Fraunhofer IGD. You can download the binaries and the source code of Steep from its GitHub repository.
It is now possible to configure Steep’s main log file and to enable process chain logs. For this, add the following lines to your
    steep:
      logs:
        level: INFO
        main:
          enabled: true
          logFile: logs/steep.log
          dailyRollover:
            enabled: true
            maxDays: 7
            maxSize: 104857600  # 100 MB
        processChains:
          enabled: true
          path: logs/processchains
          groupByPrefix: 3
The main log file and process chain logs can be
dailyRollover controls whether the main log file should be split into smaller files on a daily basis. If this feature is enabled, the main log file will be renamed every day. The file name of the old log will be based on the value of
logFile and the file’s date in the form
maxDays and maxSize control the maximum number as well as the maximum total size of all log files. Log files beyond these limits will automatically be deleted.
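To make the effect of the two retention limits more concrete, here is a minimal Python sketch of how such a clean-up could decide which rolled-over files to delete. It is not Steep's actual implementation: the `LogFile` class, the function name, the file names, and the rule "drop files older than `max_days` days, then drop the oldest files until the rest fits into `max_size`" are assumptions made for illustration only.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class LogFile:
    """Hypothetical description of one rolled-over log file."""
    name: str
    day: date   # the day the file was rolled over
    size: int   # size in bytes


def select_files_to_delete(files, today, max_days, max_size):
    """Return the files that exceed the age limit or the total size limit."""
    cutoff = today - timedelta(days=max_days)
    # Walk from newest to oldest so that old files are dropped first.
    newest_first = sorted(files, key=lambda f: f.day, reverse=True)

    deleted = []
    total = 0  # cumulative size of the current file and all newer ones
    for f in newest_first:
        total += f.size
        if f.day < cutoff or total > max_size:
            deleted.append(f)
    return deleted


files = [
    LogFile("a.log", date(2021, 3, 9), 40),
    LogFile("b.log", date(2021, 3, 8), 40),
    LogFile("c.log", date(2021, 3, 7), 40),
    LogFile("d.log", date(2021, 3, 1), 10),
]
doomed = select_files_to_delete(files, date(2021, 3, 10), max_days=7, max_size=100)
print([f.name for f in doomed])
```

In this example, `c.log` is removed because the three newest files together exceed the size limit, and `d.log` is removed because it is older than seven days.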
You can also enable separate process chain logs that will record the output of process chain executables. Steep creates a new log file per executed process chain in the given
path. Since this can lead to a high number of files, you can specify a
groupByPrefix. If this prefix has a value greater than
0, the process chain log files will be grouped by prefix in subdirectories under the configured
path. For example, if groupByPrefix is 3, Steep will create a separate subdirectory for all process chains whose IDs start with the same three characters. The name of this subdirectory will be these three characters. The process chains
apomaokjbk3dmqovemsq will be put into a subdirectory called
apo, and the process chain
ao344a53oyoqwhdelmna will be put into
ao3. Note that in practice,
3 is a reasonable value, which will create a new directory about every day. A value of
0 disables grouping. The default value is
Process chain logs can also be accessed through the new HTTP endpoint
If process chain logs are enabled, you will be able to follow the output of executables in Steep’s web UI. Open the UI in your browser and navigate to the process chain you want to monitor. You will see a new “Log” section like in the image below. The log automatically updates while the process chain is being executed, so you can follow updates in real time.
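The grouping of process chain logs by prefix described earlier can be illustrated with a short Python sketch. The function name and the `<id>.log` file name are assumptions for this example; the announcement only specifies how the subdirectory is named, not what the log files themselves are called.

```python
from pathlib import PurePosixPath


def process_chain_log_path(base_path, process_chain_id, group_by_prefix):
    """Build a (hypothetical) log file location for one process chain."""
    base = PurePosixPath(base_path)
    if group_by_prefix > 0:
        # Group by the first `group_by_prefix` characters of the ID.
        base = base / process_chain_id[:group_by_prefix]
    # Assumption: the log file is named after the process chain ID.
    return base / f"{process_chain_id}.log"


print(process_chain_log_path("logs/processchains", "apomaokjbk3dmqovemsq", 3))
print(process_chain_log_path("logs/processchains", "ao344a53oyoqwhdelmna", 3))
print(process_chain_log_path("logs/processchains", "ao344a53oyoqwhdelmna", 0))
```

With a prefix length of 3, the two IDs end up under `apo` and `ao3` respectively; with 0, the file is placed directly under the configured path.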
It is now possible to attach additional volumes to VMs created by Steep’s cloud manager. Also, the cloud manager can now create multiple VMs in parallel if you specify the
maxCreateConcurrent parameter in the VM’s setup. Note that the cloud manager will only create as many VMs as necessary for the workflows currently being executed, which helps save resources.
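The idea behind a concurrency limit such as maxCreateConcurrent can be sketched in a few lines of Python. This is only an illustration of the general technique, not Steep's code: `create_vms`, its parameters, and the `create_vm` callable (a stand-in for the real cloud API call) are hypothetical, and the sketch assumes the limit is at least 1.

```python
from concurrent.futures import ThreadPoolExecutor


def create_vms(required, running, max_create_concurrent, create_vm):
    """Create only the VMs that are still missing, a few at a time.

    `create_vm` is a hypothetical callable that provisions one VM and
    returns its ID.
    """
    # Never create more VMs than the current workload needs.
    missing = max(0, required - running)
    if missing == 0:
        return []
    # The pool size caps how many creation calls run at the same time.
    with ThreadPoolExecutor(max_workers=max_create_concurrent) as pool:
        return list(pool.map(create_vm, range(missing)))


# Usage with a fake creation function that just returns an ID.
ids = create_vms(required=5, running=1, max_create_concurrent=2,
                 create_vm=lambda i: f"vm-{i}")
print(ids)
```

Using a thread pool keeps the limit in one place: the number of workers is the maximum number of VMs being created in parallel, while the number of submitted tasks is the number of VMs actually needed.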
Here’s a list of other noteworthy improvements and bug fixes: