Steep 5.8.0

I’ve just released version 5.8.0 of my scientific workflow management system Steep. One of the highlights of this version is increased fault tolerance: Steep schedulers can now resume the work of other crashed instances when they detect that a process chain is running but no longer monitored. Steep 5.8.0 also includes many other new features (see details below). The version has been thoroughly tested in practice over the last couple of months.

Steep is a scientific workflow management system that can execute data-driven workflows in the Cloud. It is very well suited to harness the possibilities of distributed computing in order to parallelise work and to speed up your data processing workflows, no matter how complex they are and regardless of how much data you need to process. Steep is open-source software developed at Fraunhofer IGD. You can download the binaries and the source code of Steep from its GitHub repository.

Resume monitoring of running process chains

Steep’s sched­uler as­signs pro­cess chains to agents for ex­e­cu­tion. It then mon­it­ors the ex­e­cu­tion and fi­nally writes the pro­cess chain res­ults to the data­base.

In the past, if a scheduler instance crashed, the results of running process chains were not collected. Instead, another Steep instance had to resume the workflow and re-execute, from the beginning, all process chains that were still running when the old scheduler crashed.

In ver­sion 5.8.0, a new sched­uler in­stance can de­tect orphaned pro­cess chains and re­sume mon­it­or­ing. Orphaned pro­cess chains are those that are cur­rently be­ing ex­ecuted by an agent but not be­ing mon­itored by any sched­uler in­stance in the cluster.

This feature increases fault tolerance and allows workflows to be executed without interruption and without having to repeat process chains unnecessarily.

The scheduler looks for orphaned process chains at startup and then at a regular interval. Please have a look at Steep’s documentation for information on how to configure this new feature.

New features for plugins

The interface for progress estimator plugins now supports multiple service IDs. This means you can use the same plugin to determine the progress of more than one service:

- name: myGenericProgressEstimator
  type: progressEstimator
  scriptFile: conf/plugins/myGenericProgressEstimator.kt
  supportedServiceIds:
    - myService
    - anotherService

In ad­di­tion, it is now pos­sible to de­clare de­pend­en­cies between plu­gins, for ex­ample, if a plug­in should be ex­ecuted after an­other one. This ap­plies to pro­cess chain ad­apters and ini­tial­izers:

- name: myProcessChainAdapter
  type: processChainAdapter
  scriptFile: conf/plugins/myProcessChainAdapter.kt
  dependsOn:
    - anotherProcessChainAdapter

Fi­nally, plug­in script files can now be pre-com­piled to speed up Steep’s star­tup time. Please read the doc­u­ment­a­tion for more in­form­a­tion.

Improved configuration

Con­fig­ur­a­tion files for setups (setups.yaml) and ser­vice metadata (services/services.yaml) of­ten con­tain re­peated in­form­a­tion (e.g. two setups might of­fer the same cap­ab­il­it­ies). With the new ver­sion, you can now use YAML an­chors to sim­plify your con­fig­ur­a­tion files.
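As a rough illustration of how anchors remove this kind of repetition, here is a minimal sketch of a setups.yaml in which two setups share one list of capabilities. The setup IDs, flavor names, and capability names are made up, and the property names (id, flavor, maxVMs, providedCapabilities) are my assumption of what a setup looks like, so please check them against the documentation. Other properties a real setup needs are omitted for brevity.

```yaml
# setups.yaml (illustrative sketch; all values are hypothetical)
- id: small_setup
  flavor: small-flavor
  maxVMs: 4
  # "&defaultCapabilities" defines an anchor for the list below
  providedCapabilities: &defaultCapabilities
    - docker
    - gdal

- id: large_setup
  flavor: large-flavor
  maxVMs: 2
  # "*defaultCapabilities" is an alias that reuses the anchored list
  providedCapabilities: *defaultCapabilities
```

The alias expands to the same list when the file is parsed, so a change to the capabilities only has to be made in one place.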

In ad­di­tion, the con­fig­ur­a­tion prop­er­ties steep.services and steep.plugins now sup­port glob pat­terns (e.g. **/*.yaml). This al­lows you to, for ex­ample, re­curs­ively in­clude other con­fig­ur­a­tion files from a dir­ect­ory tree without hav­ing to spe­cify each of them in­di­vidu­ally.
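A configuration using such patterns might look like the following sketch. It assumes that the main configuration file uses nested keys for steep.services and steep.plugins, and the directory names are hypothetical.

```yaml
# Main Steep configuration file (illustrative sketch; paths are hypothetical)
steep:
  # include every YAML file below conf/services, at any depth
  services: "conf/services/**/*.yaml"
  # include every YAML file below conf/plugins, at any depth
  plugins: "conf/plugins/**/*.yaml"
```

The patterns are quoted here as a precaution: a YAML value that begins with an asterisk would otherwise be parsed as an alias.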

Pro­vi­sion­ing scripts now sup­port a new upload func­tion that can be used to up­load one or more files to a vir­tual ma­chine. Read the new sec­tion on pro­vi­sion­ing scripts in Steep’s doc­u­ment­a­tion for more in­form­a­tion.

Other new features

Here’s a list of other note­worthy im­prove­ments and bug fixes:

  • Dis­play al­loc­ated pro­cess chain ID in agent de­tails
  • Re­duce the num­ber of log mes­sages, in par­tic­u­lar when agents join or leave the cluster
  • Run Post­gr­eSQL data­base mi­gra­tion only once per Steep in­stance
  • Im­prove grace­ful shut­down
  • Add pos­sib­il­ity to re­store cluster mem­bers on star­tup from VM re­gistry

Maintenance

  • Run unit tests in par­al­lel
  • Speed up Mon­goDB unit tests
  • Up­date web UI de­pend­en­cies

Bug fixes

  • Al­low runtime plu­gins to ac­cess cur­rent Vert.x con­text
  • Do not fail to de­lete Open­Stack block device if it does not ex­ist

Posted by Michel Krämer
on May 19th, 2021.