Wikimedia Release Engineering Team/SpiderPig/Meeting notes/2024-09-05

2024-09-05

  • Attendees: Ahmon Dancy, Tyler Cipriani

TODOs from last time

  • Jeena can start working on the refactor
  • Jaime/Jeena to sync on steps
  • Tyler to check in with design research for design support
  • Ahmon to start working on user flow through web-ui

API Thinking

  • GET /index
    • 200 OK
  • GET /api/job {'jobs': [{'id': 1233, 'status': 'in progress'}]} <- returns the last job plus any jobs in progress

Case: in progress

  • <a href="job/1233">1233</a>
  • no post field

Case: no running jobs

  • <a href="job/1233">1233</a>
  • <form action="api/job" method="post"><input><input type="submit"></form>
  • POST /api/job { 'command': 'backport', 'parameters': { 'changeIds': ['XXXXX'] } }
    • 201 Location: /job/1234 {'id': 1234}
  • Loop:
  • GET /api/logs/1234 <-- later socketio?
    • 200 ['asdfasdf']
  • GET /api/jobs/1234/interactions
    • 200 {'interaction': {'id': 1, 'prompt': 'continue?', 'responses': [{'yes': 'y', 'no': 'n', 'abort': 'a'}]}}
  • GET /api/jobs/1234/status
    • 200 {'status': 'in progress'}
    • 200 {'status': 'complete', 'success': true, 'errors': [] }
  • POST /api/jobs/1234/interactions/1 {'response': 'y'}
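
The request sequence above can be walked through as a client loop. What follows is only a sketch under stated assumptions: `FakeApi` is a hypothetical in-memory stand-in for the server (no HTTP involved), and its method names, `run_backport`, and `max_polls` are invented for illustration. Only the paths and payload shapes in the comments come from the notes.

```python
import itertools


class FakeApi:
    """Hypothetical in-memory stand-in for the endpoints sketched above."""

    def __init__(self):
        self._ids = itertools.count(1234)
        self.jobs = {}

    def post_job(self, command, parameters):
        # POST /api/job -> 201 {'id': 1234}
        job_id = next(self._ids)
        self.jobs[job_id] = {
            "command": command,
            "parameters": parameters,
            "status": "in progress",
            "logs": ["starting " + command],
            # Assumption: the fake job asks exactly one question.
            "interaction": {"id": 1, "prompt": "continue?"},
        }
        return {"id": job_id}

    def get_logs(self, job_id):
        # GET /api/logs/1234
        return list(self.jobs[job_id]["logs"])

    def get_interaction(self, job_id):
        # GET /api/jobs/1234/interactions (None when nothing is pending)
        return self.jobs[job_id]["interaction"]

    def get_status(self, job_id):
        # GET /api/jobs/1234/status
        if self.jobs[job_id]["status"] == "complete":
            return {"status": "complete", "success": True, "errors": []}
        return {"status": "in progress"}

    def post_response(self, job_id, interaction_id, response):
        # POST .../interactions/1 {'response': 'y'}
        job = self.jobs[job_id]
        job["interaction"] = None
        job["logs"].append("response: " + response)
        job["status"] = "complete"  # the fake job finishes once answered


def run_backport(api, change_ids, answer="y", max_polls=10):
    """Start a job, then poll interactions, logs, and status until done."""
    job_id = api.post_job("backport", {"changeIds": change_ids})["id"]
    for _ in range(max_polls):
        pending = api.get_interaction(job_id)
        if pending is not None:
            api.post_response(job_id, pending["id"], answer)
        logs = api.get_logs(job_id)
        status = api.get_status(job_id)
        if status["status"] == "complete":
            return job_id, status, logs
    raise TimeoutError("job did not complete")
```

A real client would sleep between polls and let a person choose the response; the loop is bounded here only so the sketch always terminates.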

Design features

  • All users see the same view of a backport deployment.
    • All authorized users can respond to interactions for a job started by anyone.


Development phases

Phase: Like sshing in and running scap backport cli, but with more steps

You run `ssh -L 8888:localhost:8888 deployment.eqiad.wmnet scap spiderpig --port 8888`, then visit http://localhost:8888 in your browser.

Overview page:

You are greeted with a page showing a cool spiderpig logo and the current spiderpig status (e.g., idle, or running a job).

Status information is updated periodically by polling (API).

If spiderpig is idle, the page has an input box where you can supply change numbers for `scap backport`. When there is valid input in the box, a "Start" button is enabled. When the button is pressed, the client makes an API call to ask spiderpig to start `scap backport` with the list of change numbers. The backport operation runs in the background, recording its output in a log file.

If spiderpig is not idle (such as if the user recently entered change numbers and pressed the "Start" button), the input box and start button are hidden. If a backport is in progress, a clickable link to the job log is available. Clicking the link brings you to the log viewer for the job.
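
The idle/not-idle rule for the overview page could be expressed as a small pure function over the job list. This is a sketch only: `overview_state` and its returned keys are hypothetical names, the payload shape follows the `GET /api/job` example in the API notes, and linking every returned job (not just running ones) is an assumption.

```python
def overview_state(jobs_response):
    """Decide what the overview page shows from a GET /api/job payload."""
    jobs = jobs_response.get("jobs", [])
    running = [job["id"] for job in jobs if job["status"] == "in progress"]
    return {
        "status": "running a job" if running else "idle",
        # The input box and Start button are only shown when idle.
        "show_start_form": not running,
        # Assumption: every job the API returns gets a link to its log.
        "log_links": ["job/%d" % job["id"] for job in jobs],
    }
```

Keeping this decision in one function means the polling code only has to fetch the job list and re-render from the result.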

Log viewer:

The log viewer periodically refreshes the job status and log content by polling the API. If the job is running, a cancel button is available (API). The viewer periodically polls to see if there is a pending scap backport interaction (API) associated with the job. If there is, it displays the prompt along with buttons for making a response. When a response button is pressed, the viewer posts the response to spiderpig (API) and removes the prompt/buttons. All polling stops once the log viewer sees that the job has stopped.
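
One way the viewer could refresh log content without re-fetching everything is to remember how many lines it has shown and ask only for the rest, using the "get size" and "get a range of lines" log operations listed under the technical components. The sketch below is illustrative: `LogStore` and `LogTail` are hypothetical names, the store is in-memory, and the half-open range convention is an assumption.

```python
class LogStore:
    """Hypothetical line-oriented log store, keyed by job id."""

    def __init__(self):
        self._lines = {}

    def append(self, job_id, line):
        self._lines.setdefault(job_id, []).append(line)

    def size(self, job_id):
        # Number of lines currently in the job's log.
        return len(self._lines.get(job_id, []))

    def get_range(self, job_id, start, end):
        # Half-open range [start, end); an assumed convention.
        return self._lines.get(job_id, [])[start:end]


class LogTail:
    """Remembers how many lines the viewer has already displayed."""

    def __init__(self, store, job_id):
        self.store = store
        self.job_id = job_id
        self.seen = 0

    def poll(self):
        """Return only the lines added since the previous poll."""
        size = self.store.size(self.job_id)
        new_lines = self.store.get_range(self.job_id, self.seen, size)
        self.seen = size
        return new_lines
```

The viewer would call `poll()` on each refresh and stop once the job status shows the job has stopped, after one final poll to pick up trailing lines.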

Technical components:

  • Cool SpiderPig logo [spiderpig.png in repo]
  • Use Flask's (or whatever's) built-in webserver
  • SQLite database to hold objects. Use SQLAlchemy for the ORM.
  • Job
    • fields: id, timestamps, owner, description, status
    • API: create, get status, cancel, list
    • Other: Periodic GC of old jobs and associated logs
  • Job Runner
    • Runs scap backport and writes its output to a log file.
    • When an interaction is needed, create an Interaction object, wait for it to be responded to, and deliver the response to the job's process.
  • Logs
    • Logs are line-oriented.
    • Logs are identified by the associated job id.
    • API: get size (# of lines), get a range of lines
    • Future: Eventually will need to support masking (which would become the default mode of retrieval, and the client can say if it wants to use the unmasked log)
  • Interaction
    • fields: id, job id, prompt
    • API: create (used internally by job runner), get, respond
    • Responding to an interaction sends a notification to the job runner and deletes the interaction object. This must be atomic. Two simultaneous attempts to respond to an interaction must result in only one notification to the job runner.
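
The atomicity requirement for interaction responses could be met by making the delete itself the claim: whoever's `DELETE` removes the row gets to notify the job runner. The sketch below uses the standard-library `sqlite3` module rather than SQLAlchemy so that it stays self-contained. The table layout mirrors the Interaction fields above; `make_db`, `create_interaction`, `respond`, and the `notify` callback are hypothetical names.

```python
import sqlite3


def make_db():
    """Create an in-memory database with an interaction table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE interaction ("
        "id INTEGER PRIMARY KEY, job_id INTEGER NOT NULL, prompt TEXT NOT NULL)"
    )
    return db


def create_interaction(db, job_id, prompt):
    """Used internally by the job runner when scap needs an answer."""
    with db:
        cur = db.execute(
            "INSERT INTO interaction (job_id, prompt) VALUES (?, ?)",
            (job_id, prompt),
        )
    return cur.lastrowid


def respond(db, interaction_id, response, notify):
    """Delete the interaction and notify the runner, at most once."""
    with db:  # commits on success, rolls back if notify() raises
        cur = db.execute(
            "DELETE FROM interaction WHERE id = ?", (interaction_id,)
        )
        if cur.rowcount != 1:
            return False  # already answered by someone else
        notify(interaction_id, response)
    return True
```

Because SQLite serializes writers, two simultaneous `respond` calls on separate connections should see the row deleted by exactly one of them; the loser gets `False` and sends nothing. If `notify` fails, the rollback restores the interaction so it can be answered again.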


Next Phase: Like the prior phase, but more responsive, less polling.