Wikimedia Release Engineering Team/SSD Sync Up/2019-03-26

2019-03-26

Last Time: 2019-03-19

Discussion

  • Dan: Some progress on .pipeline/config.yaml. Need feedback on execution graph.
  • Mukunda: Look at DOT format, digraph. Would be useful if DOT could be generated from YAML format.
  • Dan: Looks similar to Form 3 in the examples. Might be suitable.
  • Lars: Needs to be understandable first and foremost. Efficiency doesn't matter much if it can't be understood.
  • Jeena: Progress on developer tooling - basic automated MediaWiki install should merge today. RESTBase and Parsoid can be automatically enabled. The changes also allow users to use an external service instead of running it all in minikube.
  • Brennen: Moving on to MediaWiki docker images - probably a base image with extensions dependent on local developers' source trees to start?

Execution graph examples

Relevant to discussion and T210267.

Three different representations of:

    a   f
     \ /
      b
     / \
    c   g
    |   |
    d   |
     \ /
      e

Form 1 – execution graph expressed vertically as a series of parallelized sets

execution:  # execution order is expressed top-to-bottom
  - [a, f]  # members of set can run concurrently
  - b       # each set is run in serial
  - [c, g]
  - d
  - e

This does not fully represent a directed graph because it expresses no dependency chains, and it is inefficient where diverging arcs have unequal workloads. (In the given example, d would have to wait for g to finish before executing, even though d depends only on c.)
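The waiting cost described above can be made concrete with a minimal sketch. The durations are hypothetical (g is deliberately slow) and `form1_start_times` is an illustrative helper, not part of any pipeline tooling; it assumes each set starts only after every member of the previous set has finished.

```python
# Hypothetical durations in arbitrary time units; g is deliberately slow.
durations = {"a": 1, "f": 1, "b": 1, "c": 1, "g": 10, "d": 1, "e": 1}

# Form 1: a list of sets run in series; a bare name is a set of one.
form1 = [["a", "f"], "b", ["c", "g"], "d", "e"]


def form1_start_times(stages, durations):
    """Return each node's start time when sets run strictly one after another."""
    clock = 0
    starts = {}
    for stage in stages:
        members = [stage] if isinstance(stage, str) else stage
        for node in members:
            starts[node] = clock
        # The next set begins only once the slowest member has finished.
        clock += max(durations[node] for node in members)
    return starts


starts = form1_start_times(form1, durations)
```

Under these assumed durations, c finishes at time 3 but d does not start until time 12, because the set [c, g] is not complete until g is done.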

Form 2 – a true execution DAG expressed as nodes and their subsequent siblings

execution:   # execution order is expressed as each node and its siblings
  a: [b]     # b follows a
  f: [b]     # b also follows f
  b: [c, g]  # c and g follow b as separate arcs, etc.
  c: [d]
  g: [e]
  d: [e]

While this can fully express a DAG, it is hard to reason about.
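Since Form 2 is already a node-to-successors map, it translates directly to the DOT digraph format raised in the discussion, which may make the graph easier to inspect visually. The following is a rough sketch only: `to_dot` is a hypothetical helper, and it assumes node names are simple identifiers that need no quoting in DOT.

```python
# Form 2 from the example above, loaded as a plain mapping.
form2 = {
    "a": ["b"],
    "f": ["b"],
    "b": ["c", "g"],
    "c": ["d"],
    "g": ["e"],
    "d": ["e"],
}


def to_dot(graph, name="execution"):
    """Render a node -> successors mapping as a DOT digraph string."""
    lines = [f"digraph {name} {{"]
    for node, successors in graph.items():
        for successor in successors:
            # One DOT edge statement per dependency.
            lines.append(f"  {node} -> {successor};")
    lines.append("}")
    return "\n".join(lines)


dot = to_dot(form2)
```

The resulting string could then be written to a file and rendered with Graphviz, for example with `dot -Tpng`.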

Form 3 – a true execution DAG expressed as horizontal arcs that intersect (join) on common members

execution:           # execution order is expressed horizontally as separate arcs
  - [a, b, c, d, e]  # first arc
  - [f, b, g, e]     # second arc (intersects the first at b and e, where execution would join/wait)

This can fully express a DAG (condensed to Form 2 internally by reducing consecutive pairs to a hash/map), and parallel execution is more efficient, but it is perhaps still harder to reason about than Form 1.
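The reduction from Form 3 to Form 2 can be sketched as follows. This is one possible reading of "reducing consecutive pairs to a hash/map", not a confirmed implementation; `arcs_to_successors` is a hypothetical name.

```python
# Form 3 from the example above: each inner list is one arc.
form3 = [
    ["a", "b", "c", "d", "e"],
    ["f", "b", "g", "e"],
]


def arcs_to_successors(arcs):
    """Condense arcs into a node -> successors mapping (Form 2)."""
    successors = {}
    for arc in arcs:
        # Each consecutive pair (node, following) in an arc is one edge.
        for node, following in zip(arc, arc[1:]):
            followers = successors.setdefault(node, [])
            if following not in followers:
                followers.append(following)
    return successors


form2 = arcs_to_successors(form3)
```

For the two arcs above, the result matches the Form 2 example: b maps to both c and g, while d and g each map to e. The final node e has no successors and so does not appear as a key.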