Raw notes from our Offsite in Vienna.

CI Interview Discussion

Culture/People

QA Tribe

More of this?
- Naming things
- Folks are confused by it/"tribe"
- Not a lot of discussions of next-steps right now
- Possibly need more time
- Current Purpose -- give people a sense of community
- people doing QA not on our team are isolated sometimes
- shouldn't be just QA people (currently it's not, that should continue)

Tech Talks

RelEng could drive this
- Good for newbies
- Has slowed down recently
- Pull in external speakers
- Could draw in external folks/community
  - hashar does a lot of talks (greg-g: you should record those)
- related to tech blogging
- Stuff on wikis is different
  - different writing style
  - different sense of authority
- CREDIT showcase
  - small investment
  - good overview of topic where we dig deeper later

Code health group

To make it easier to develop good code together
- Possibly biggest/most important sub-point here
- Trend, other orgs have it
- formalizes things we want to make better
- "as an org" vs "heroics"
- There is an informal cabal that does these kinds of things
  - Nobody is aware
  - No communication
  - is organic
  - may be good to formalize
- May not be an issue now, but may become more of an issue
- Could have their own backlog of things
- Could sell to other teams to dedicate some resources to this
- RelEng should do X vs. The org should move to X
- There are folks who do this
  - addshore
  - legoktm
  - others
- There are limitations to informality
  - No roadmap
- Victoria could be a stakeholder
  - formalizes a direction
- A microcosm of organizational problems
  - no team in charge, just cabals doing organic things

Open questions

How do QA tribe and Code health group mesh?

- If Code Health group becomes more formal, then QA Tribe does less
- A centralized code test group vs an operations-like QA team
  - embedded testers
  - success and failure in both variants
  - lack of authority for embedded folks

What does a code health group do?

- What could we be doing/what should we be doing?
- What makes sense organizationally?
- Specifically
  - Code coverage
  - Dashboards
  - Vision statement
- Do we need folks from all teams on the code health group?
  - Need investment from other teams
  - Development stakeholders from many teams
  - How much development time will other teams put into code health priorities?
- Sponsorship
  - Victoria as CTO is sponsor
- best practices and assessments
- similar to tpg
  - promotes scrum to the remainder of the organization
  - start small with folks who want to do this
- Making tools that give you an overview of all repositories
- How to develop software in a healthy way

Process!

estimation

self-inflicted schedule issues
greatly under-estimate time to do things
general problem with quarterly planning, etc
needs to bubble-up to org-planning-level
- there is currently a discussion that greg will bring up
Isolated discussion and process + dialog
eventualism can create estimation problems
green check marks on the quarterly goals are perverse incentives
- misaligned with how we work

bug mgmt

features vs bugs are not differentiated in phab (or bugzilla)
need for identifying "rework" (i.e. bugs)
upstream changes in phab will make this easier
should do retroactively?
- Andre is kinda doing
- paladox? GSoC? Not hard, just needs doing?
Would like a baseline for "rework"
"rework"
- Post-release rework vs pre-release problems
- Tech debt
  - conscious decision vs unanticipated problems
2 kinds of bugs
- pre-release/features/etc
- bugs that escape to the wider world

test metrics

collecting and reporting
pass/fail and track progress for tests
visibility (i.e. testing dashboard)
Keeping data for a longer period of time
Currently, varying amounts of data are kept for varying lengths of time
The number and quality of tests vary a lot per extension
- some extensions have bitrot
- maybe run the tests periodically
- this is a low priority since it's not in production
Are tests that have never failed bad tests?
Collecting the right metrics and making sure they're actionable metrics
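
One way to make the "never failed" question concrete is to compute per-test pass rates over whatever history is kept. A minimal sketch, assuming results can be exported as (test name, passed) pairs per run; the `runs` data and both function names are hypothetical, not an existing CI API:

```python
from collections import defaultdict


def pass_rates(runs):
    """Return {test_name: fraction of recorded runs in which the test passed}."""
    passed = defaultdict(int)
    total = defaultdict(int)
    for run in runs:
        for name, ok in run:
            total[name] += 1
            if ok:
                passed[name] += 1
    return {name: passed[name] / total[name] for name in total}


def never_failed(runs):
    """Return the sorted names of tests that passed in every recorded run."""
    return sorted(name for name, rate in pass_rates(runs).items() if rate == 1.0)


# Hypothetical history: three CI runs, each a list of (test name, passed).
runs = [
    [("testParse", True), ("testSave", True)],
    [("testParse", True), ("testSave", False)],
    [("testParse", True), ("testSave", True)],
]
print(pass_rates(runs))
print(never_failed(runs))
```

A test that shows up in `never_failed` is not necessarily a bad test; it is only a candidate for a closer look, and the answer depends on how much history is retained.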

test strategy

common base for testing, even vocabulary
i.e. some unit tests are not really unit tests

Tooling

Browser-based

Had some success
Should maybe be expanded

test infra

it's an improvement
environmental parity
SSD is work in this area
beta cluster is an afterthought, sometimes
- this depends on the team
test data parity
- we don't need all the data
- depending on the kind of testing, could use production stuffs
- dumps of smaller wikis and overwrite data occasionally
- putting thought into which wikis are supported in beta
test setup takes a long time
how much of a production stack do we want to provide?
Being able to reproduce what CI does locally

Topics

How to formalize Code Health group

votes 6
formalize it
make it public

What do we want from this?

~~Reason for existence~~
1 page overview for Victoria
~~Success factors for this group~~

quotes

quotes from Code Health: Google's Internal Code Quality Efforts

Any aspect of how software was written that can increase the readability, stability, maintainability and simplicity

Improve the lives of engineers

Success Factors

~~Not "heroic" vs funded(?) with feelings~~
sponsorship
- cannot be an unfunded mandate
- senior leadership behind it
- encourage participation
- "official"
- diffuse bad behavior
Guinea pig
- mobileweb -- eager and care about quality/willing to test
hashar: We can't really determine success by ourselves in a vacuum
hashar: RelEng's criteria is to have a team with success criteria
greg: can't sell that :)
hashar: let's build it first, then sell it
jr: what's the cross-organizational benefit and value for this, what's the success criteria if it works for a team
start small vs start big -- top-down vs bottom-up

Long-term vision (selling)

Victoria cares about tech-debt
removing tech debt vs preventing tech debt
disseminates knowledge through organization
able to attack more complex problems since dedicated group (vs. individual contributors)

Outcomes

Healthy code
Metrics
- cyclomatic complexity
- code coverage
Support/mandate
"renewed focus on QA" outcomes
- Code health group
- It's a good idea to start small, i.e. test with mobile web
Management wants long-term vision
Quality vs Quality Assurance
- code health group can assure big Q quality
Make developers more quality oriented
testing does not provide value, lack of bugs provides value
Enable developers to do the best work they can
test engineer enables the team to test
high-level criteria, i.e. success factors
couple people on releng + j.robson
this is comparable to efforts by the security and performance team
reduces duplication of effort
code health group documents best practices
prescriptive
IDEA: adding code-health outcomes to roadmap
- extension quality assessment

What does this look like?

weekly meeting
core permanent group that is a steering committee
steering committee needs to be cross-org
J.R. leads steering committee + liaison from RelEng
don't call it qa -- "code health"
rotating cast of supporting characters in addition to core
- interested in specific areas
- or has past experience
Determine scope within group
defining a roadmap for code health

VISION

Make things less complex so developers can quickly add value to our product.

What the Code Health group does is work on efforts that universally improve the lives of engineers and their ability to write products with shorter iteration time, decreased development effort, greater stability, and improved performance.

example: by decreasing code complexity we increase code success whatever

TODO Add word "value" here

Triage of projects

votes: 3
Project workboards should be used for categorization
RelEng team workboard used for overall view

Projects

Some projects benefit from triage

scap
ci infra
browser tests

Reorganization of projects in phab

<2017-05-15 Mon>

Dashboard Testing Health

votes: 5
likely to fail, but worth trying! :)
CI artifacts details
No easy way to find out historical pass/fail rates
Does not answer if you can merge:
- What information is available for planning
- Historical information
Need a shared understanding of terms: e2e, smoke tests, etc.
information is more interesting for functional tests historically
mobile team gates with browser tests, other teams have browser tests passing at the end of the sprint
Running browser tests against testwiki after branch cut would be neat
There is very little visibility into a lot of manual testing activity
No way to alert the org about big features that are rolling out as part of a deployment
Attempt to quantify "quality"/"health" of our code
Could be used to make deployment decisions
Basic information (SLOC churn, complexity, coverage, logspam) is useful for release-like decision making
We currently find our problems in production
Used to have a Friday deployment meeting to make deployment decisions
Where can people raise a flag about the train? ¯\_(ツ)_/¯
- deployment blockers task
BigRedButton.wikimedia.org
let's think about the dashboard as a way to help teams plan their work and think about code health
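
As an illustration of the "SLOC churn" input mentioned above, churn per file can be derived from the text that `git log --numstat --format=` prints (one `added<TAB>deleted<TAB>path` line per file per commit). This is only a sketch: `churn_per_file` and the sample text are made up for the example, and renames are not handled.

```python
from collections import Counter


def churn_per_file(numstat_text):
    """Sum added + deleted lines per path from `git log --numstat` output."""
    churn = Counter()
    for line in numstat_text.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # skip blank separator lines between commits
        added, deleted, path = parts
        if added == "-" or deleted == "-":
            continue  # binary files report "-" instead of line counts
        churn[path] += int(added) + int(deleted)
    return churn


# Hypothetical sample of numstat output covering two commits.
sample = (
    "10\t2\tincludes/Foo.php\n"
    "3\t3\tincludes/Bar.php\n"
    "\n"
    "5\t0\tincludes/Foo.php\n"
    "-\t-\tlogo.png\n"
)
for path, lines in churn_per_file(sample).most_common():
    print(path, lines)
```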

Users/Uses

Identify real issues behind breakage
- facebook and the one-person project
Phabricator has a tag for each branch for hotfixes
- could feed into the dashboard
What is the data that is useful for planning activity

Who/What/How/How much

WHAT: SLOC churn, complexity, coverage, logspam, code breakdown
- Ask code health group
- extract code-review information from gerrit
- phabricator bug-count could be useful
- gerrit: number of changes, review backlogs, etc.
- No coverage for extensions
- Core is generated once a day
- Code complexity for PHP stuff -- nothing currently
  - code climate
  - SonarQube
Puppet out of scope
MediaWiki/config is a fucking mess.
Code Health Group as stakeholder
Analytics team has knowledge and tooling to store random metrics
tooling
- junit for unittest output
- clover for code coverage
- composer and npm for linting tools
  - phpcs
  - phplint
  - https://www.mediawiki.org/wiki/User:Legoktm/ci
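
Since junit-style XML is listed above as the unit test output format, a dashboard could start by counting outcomes per report. A rough sketch, assuming the common `<testsuite>`/`<testcase>` layout with `<failure>`, `<error>`, and `<skipped>` child elements; `summarize_junit` and the sample report are hypothetical:

```python
import xml.etree.ElementTree as ET


def summarize_junit(xml_text):
    """Count passed, failed, and skipped test cases in a JUnit-style report."""
    root = ET.fromstring(xml_text)
    summary = {"passed": 0, "failed": 0, "skipped": 0}
    # iter() also finds <testcase> elements nested under a <testsuites> root.
    for case in root.iter("testcase"):
        if case.find("failure") is not None or case.find("error") is not None:
            summary["failed"] += 1
        elif case.find("skipped") is not None:
            summary["skipped"] += 1
        else:
            summary["passed"] += 1
    return summary


# Hypothetical report with one passing, one failing, and one skipped test.
sample = """<testsuite name="example">
  <testcase classname="FooTest" name="testA"/>
  <testcase classname="FooTest" name="testB"><failure message="boom"/></testcase>
  <testcase classname="FooTest" name="testC"><skipped/></testcase>
</testsuite>"""
print(summarize_junit(sample))
```

Storing one such summary per build would be enough to answer the historical pass/fail question, whatever storage backend is chosen.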

Little Steps

votes: 4
Background:
- Nodepool caused openstack pressure and the cloud team was mad
- Meanwhile: CI was super slow and other people were mad

Annual Planning next steps

votes: 3

SSD / Pipeline Planning

Goal: be able to run (docker) containers in production, and use those same containers in development (and potentially CI).

Container technology, not tied to docker directly. Part of why Dan built Blubber. Right now docker is most stable, but maybe move to rkt in the medium term future for reasons (eg: has an init system whereas docker doesn't)
In the interim we're doing docker.
Recommended tech is blubber, if you use blubber we'll help you migrate. If not, you're on your own.
Blubber: what you run locally to build your test image. Currently Services use service-runner to do similar things. Blubber compiles down into a binary (it's Go). Also yay types in Go.
(troll on tcl)
Docker/k8s is also in Go.
Blubber input/config files are yaml. It dumps out the dockerfile.
yaml -> Blubber -> dockerfile -> Jenkins builds the docker image
Point: reproducible environments.
CI is out of scope for now. If we want to be a part of the pipeline at all, this is the compromise.
In Lyon, Joe and Yuvi did a "here's what k8s is, and mesos, etc." session; we really think it will save our resource allocation.
Services/gwicke: in grand SOA future third-parties wouldn't be able to install "MW". Docker is the solution for third-parties.
Pragmatism makes everyone attend the Monday meeting :)
ci-staging jenkins can build from the blubber yaml and push into the ci-staging docker registry
Ops is supposed to give us access to the swift backed registry.
Ops is getting hardware for the staging k8s cluster. Should be getting it soon/now-ish. It will just be services for a bit.
End of this quarter (Q1): build mathoid in Jenkins and push to the new staging cluster.
How big are images? It depends. Ops is working on building base images: about 300 MB for the base image. Then you have a node-based image, which is the base image + node packages. Then you use that for eg: mathoid. Kinda like service-node (the base puppet for all node services)
docker is as shitty of a solution as all the other solutions
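
To illustrate the "yaml -> Blubber -> dockerfile" idea noted above: a small declarative config goes in and a Dockerfile comes out, so the same input always yields the same build recipe. The following is a toy sketch only. The config keys and `render_dockerfile` are invented for this example and are NOT Blubber's actual schema or code (Blubber itself is written in Go); a plain dict stands in for the parsed yaml.

```python
import json


def render_dockerfile(config):
    """Render a Dockerfile string from a small declarative config dict."""
    lines = ["FROM {}".format(config["base"])]
    packages = config.get("packages", [])
    if packages:
        lines.append(
            "RUN apt-get update && apt-get install -y " + " ".join(packages)
        )
    lines.append("WORKDIR {}".format(config.get("workdir", "/srv/app")))
    lines.append("COPY . .")
    # json.dumps yields the JSON-array (exec) form that ENTRYPOINT accepts.
    lines.append("ENTRYPOINT {}".format(json.dumps(config["entrypoint"])))
    return "\n".join(lines) + "\n"


# Hypothetical config; every key and value here is made up for illustration.
config = {
    "base": "debian:jessie",
    "packages": ["nodejs"],
    "workdir": "/srv/service",
    "entrypoint": ["node", "server.js"],
}
print(render_dockerfile(config))
```

Because the output is a pure function of the config, a developer laptop and Jenkins can both regenerate the same Dockerfile, which is the reproducibility point made above.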

What do we need from Ops to complete the CI work?

CI cluster from Ops
We maintain k8s for CI
We jettison to Travis
We do a temporary container thingy
- we need buy-in and commitment to migrate to a shared

What's happening next year

Code Health group work
13 remaining trebuchet repos
SSD
Little Steps + Cloud team talk

Trying to figure CI in SSD

replace nodepool
replace most of CI
integrate everything in the base image
migrating CI is us + cloud that's interested

Other thoughts

Scap3 work will be discarded, this is frustrating

CI work

php55 on jessie with ops support
nodepool remains on jessie and stretch
container to run images

Part of SSD

ops provide a k8s cluster for CI
using production registry for CI

Move to travis?

takes time

The Plan.

problem: we're on nodepool and no one is happy

solutions:

CI K8s Cluster from ops
~~we maintain a k8s cluster (NOOOOOOOOOOOOOOOOOO!)~~
we move to travis

temporary container things

Why do I need k8s?
Do I need k8s?
registry or portion of
creates code ghettos
Can ops maintain a bunch of servers that have just docker installed?
if we have budget for servers can ops just use those?
what's the delta between what's already happening vs what we need?

Next steps

productionize blubber
define workflow pipeline in jenkins (pipeline is groovy)
Jenkins plugin for k8s
Jenkins master speak to k8s cluster
Jenkins master public internet
Jenkins artifacts capture

Annual Planning Outline

Quarterly breakdown

Now: productionize blubber to push images to staging
Q1:
- staging cluster e2e tests
- other services move to staging?
- Jenkins master to speak to k8s cluster (changes to jenkins master? Ops may need changes for Reasons™)
- Use minikube to PoC with Jenkins plugins
- use blubber to produce test images that can be used on CI-k8s
- assumption: ops is working on CI-k8s cluster
- outcome: have k8s requirements for cluster
Q2: ops CI-k8s, Jenkins master

Container/Blubber annual plan

Mathoid PoC work
- Now: productionize blubber to push images to staging
- local development (minikube)
- build "production" images in Jenkins for use in staging (based on mathoid work)
- staging k8s cluster (ops)
- end-to-end tests in staging -- MUST BEFORE PRODUCTION
- What does e2e test mean in this context?
- Is this a functional test?
- Keep in mind that webdriver is needed for real e2e testing
  - after a bunch of services have migrated:
    - put MW in front
    - Run webdriver
    - ???
    - Profit
- Push production-ready images to production
- Other services → production

CI Infra (depends on staging k8s cluster)

- CI-K8s cluster
- Use minikube to PoC with Jenkins plugins (while waiting for permanent CI-K8s cluster)
- Jenkins master to speak to k8s cluster (changes needed for jenkins master? Ops may need changes for Reasons™)
- use blubber to produce test images that can be used on CI-k8s

Migrate CI Jobs to containers + k8s

- depends on CI Infra
- There is a lot of work that needs to happen here

Kill Trebuchet

Kill all the things (T129290)

Webdriver.io

CI for extensions (hackathon is when work starts)
- CI only in core now
Workshop (@ hackathon && online)
Pairing with folks
6 months from <2017-05-16 Tue> kill ruby MediaWiki stuffs
End of Q2 No Ruby, Node only
Projects needing assistance:
- CirrusSearch
- Wikibase
Announce deprecation
No Ruby support in new CI (see Container/Blubber annual plan)

Code Health Group

scope/vision
form a steering committee
roadmap
sell it:
- Success factors:
  - leadership sponsorship
  - steering committee formed
  - funded resource allocation
find a team to work with (mobileweb)

MediaWiki test runner standalone

nitpicker/shitbot/chipotte

💩 MediaWiki/core

MediaWiki decouple unit / component / integration tests

votes: 3
magical explosions

Problem 1

Unit tests have dependency chains
- we end up running a lot of tests that should all pass
- this is slow
- example: Math depends on VisualEditor, which depends on Cite; a change in Math will run the tests for VisualEditor and Cite
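
The chain in that example can be sketched as a walk over a dependency map. This is an illustration under assumptions: `depends_on` is a hand-written stand-in for wherever CI actually records extension dependencies, and `transitive_dependencies` is a hypothetical helper.

```python
def transitive_dependencies(ext, depends_on):
    """Return every extension reachable from `ext` by following dependencies."""
    seen = set()
    stack = list(depends_on.get(ext, []))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(depends_on.get(dep, []))
    return seen


# Dependency map taken from the example in the notes.
depends_on = {
    "Math": ["VisualEditor"],
    "VisualEditor": ["Cite"],
    "Cite": [],
}
# A change in Math currently pulls in the tests of everything in this set.
print(sorted(transitive_dependencies("Math", depends_on)))
```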

Problem 2

When an extension that other extensions depend on changes, any resulting breakage in the dependent extensions is not bubbled up anywhere
example: Math depends on VisualEditor; a VisualEditor change breaks the Math extension, but that breakage goes unnoticed and untracked

Idea

pre-merge!
Ensure that an extension being merged has its tests run, but do not run the tests of that extension's dependencies
Ensure that all extensions that depend on the extension being merged have their integration tests run

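# When a JsonConfig patch is merged, find the extensions that depend on it.
# `dependencies` maps each extension name to the extensions it depends on
# (assumed to be defined elsewhere).
for ext, deps in dependencies.items():
    if 'JsonConfig' in deps:
        print(ext)

# Prints, for example:
# Graph
# ZeroPortal
# ZeroBanner
# Then run all their integration tests tagged for JsonConfig
# tests/extensions/*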

MediaWiki/core

Need some mechanism by which we identify the extensions that depend on a particular piece of core
IDEA1: submitter determines which extensions need their integration suites run
IDEA2: ShitBot

Parking Lot

DONE Next steps for Tech Talks

Up to 5 minutes

New stuff in scap

new stuff in CI
- Nodepool: what's this thing?
- Nodepool: how are we fixing it :)
webdriver.io browser test updates
demo mathoid PoC
How #RelEng uses Phab for work management (our workboards, sub-projects, dashboards, and milestones)

DONE How to Formalize code health group

formalize it

make it public

DONE Triage of projects

has fallen off
less frequent triage meetings maybe?

Post offsite TODOs

Done - greg: bubble up estimation discussion for org annual planning

Discussed. Point taken by both Victoria and the rest of tech-mgt. Seemed to be how other teams were experiencing the world as well.

(TODO) - JR: reach out to Google for help with Code Health meetings

this stuff may be Google-specific
delicious secret sauces
What did you learn trying to put this together
JR Having lunch with Google Code Health Lead on July 12th

(TODO) - JR: Code Health Dashboard

Ask code health group what belongs on the dashboard
Talk with analytics about this
Long term storage of jenkins artifacts (elastic search?)
Investigate options for code complexity and coverage

(TODO) - JR: Reach out to others for tech talks

Done - Greg: Phame blog for techblog content

Outcome of the tech-mgt F2F. Rough consensus to re-enable.

We (RelEng) should write up a "why we're turning it back on" as the first new post from the "Doing the Needful" blog. tl;dr: Tech-mgt wants a place to share technologist-focused blog posts. The Wikimedia Blog is not it (it is the wrong audience and the process is too heavyweight). This meets our needs easily and is low cost (as in person time).

(TODO) - Greg (and Chad, Mukunda, Tyler): JFDI the changes to deploy branch cutting and train cycle

Discussed at team offsite. Let's finally just put the branch cut on a timer. And, at the same time, let's make any changes to the deploy cycle/cadence and use of Beta Cluster that would benefit from such a timer (eg: cutting on Thursday, deploying to a multiversion'd pre-staging.deployment-prep.wmflabs (or whatever) over Friday and the weekend for E2E and manual testing).

Wikimedia Release Engineering Team/Offsites/2017-05-Vienna