Architecture meetings/WMF Engineering All-Hands 2013

This is an agenda for the 90-minute "Architecture" session planned at the WMF Tech Days 2013.

Agenda

15 minutes - Background: Basic explanation and clarifying discussion of the process
15 minutes - Role of our architects, senior developers, WMF in the process
15 minutes - Running through a couple example RFCs (no decisions, but highlighting the discussion; see below if you want to volunteer)
15 minutes - Brainstorming on RFCs that don’t exist yet, but should
10 minutes - How do we move things faster (e.g. open invite weekly meetings?)
10 minutes - Architecture summit in January
10 minutes - Summarize next steps; assign action items

Notes

Etherpad: https://etherpad.wikimedia.org/p/ArchitectureSF2013

2:45-3:00 (15 minutes) - Background: Basic explanation and clarifying discussion of the process
(robla:)
Historically MediaWiki development has proceeded by responding to crises and immediate needs.
Foundation is now larger; engineering capacity has swelled. It's time to figure out orientation & direction.

Where are we going?
How are we going to get there?
What languages and platforms, etc are we going to use?

Started discussions at Amsterdam Hackathon to get from ad-hoc to initial organization.
Further conversation in Hong Kong about some of the harder issues that came up in Amsterdam.

Architecture Guidelines: https://www.mediawiki.org/wiki/Architecture_guidelines
There's a lot that should be in this document but isn't yet.

Agreed on some first steps: start work on RFCs / prototypes for title object, storage API

RFC (https://www.mediawiki.org/wiki/Requests_for_comment) process:

posting design docs on mediawiki.org and getting feedback (the feedback is the important part!)
Tim, Brion and others will help you do the right thing

Need to formalize process a bit:

How are we going to run the architecture summit?
How are we going to manage the parts of the process that happen across mailing lists and wiki pages?

also prep for January architecture summit - 2 days

We have different sorts of RFCs:

RFCs from staff; work-in-progress; implementation usually already underway. Announcement-like.
RFCs from individual contributors who are able to implement their ideas but seek validation for the work before undertaking it. -- "Is this a good idea?"
"Idea" RFCs -- no concrete implementation plan.

How To: (todo: document this on some page linked from https://www.mediawiki.org/wiki/Requests_for_comment )
1) create subpage of https://www.mediawiki.org/wiki/Requests_for_comment
2) add to table on https://www.mediawiki.org/wiki/Requests_for_comment
3) announce on wikitech-l
4) … discuss
5) look for decision from Tim, Brion, Mark Bergsma

3:00-3:15 (15 minutes) - Role of our architects, senior developers, WMF in the process

Very senior developers (people with 'architect' in their title) have been tasked to move the process forward, but we need to think about how this should move forward.

Tim says he can't decide everything.
Brion suggests looking at processes from other communities for best practices/working processes.
- Python PEPs (Python Enhancement Proposals) - http://www.python.org/dev/peps/
- IETF RFC and working group organization http://www.ietf.org/wg/
- PHP community (not PSR)
- Lulu architecture review board (Nik)
- Apache voting/consensus model - https://blogs.apache.org/comdev/entry/how_apache_projects_use_consensus
Create list of area "experts"
Should the focus of RFCs be to remove blocks? (not just a go/no go binary decision)
- what are the "next steps" that are outputs of (each step of) the process?
- should be about describing a problem, proposing a design that solves said problem and then picking an implementation that gets deployed. ~ Antoine
One existing reason for RFC is that the dev realizes this is too big for a code review. Is that good?
Three kinds:
- need to remove block (-1 war or fear of rejection in CR). Smaller project, info pretty complete.
- bigger RFCs: need discovery/feedback to move forward (reduce risk). Prototype / iterative development needed.
- Huge things! (eg ResourceLoader) - Iterative development. May have to "make one (or more) to throw away"
  - some things are more speculative like 'use html for storage', maybe fits in this category maybe doesn't. may also have dependencies

Nik describes architecture review board from prior employer:

Nik picked people and created board
became a list of 5-6 people who knew all the things that were in progress and who the local domain experts were
good environment for developers to come to for help/guidance
tried to keep it as small as possible (but no smaller)
goal was to be able to convene at reasonable cost and short notice to hash out controversial decisions.

Identify affected parties and gather input

Examples: Wikia, Pywikibot, community (gadget/bot/tool authors), WikiHow, VistaPrint, ... (MediaWiki foundation?)

Don't wait for pull; do some push to reach out

Ori would like a process that allows experimentation and focused development without expanding scope to the whole wiki world. But there should be a path for isolation and rollback.
^ definitely some overlap with idea of iterative development here

What is the role of the "!vote" (not-vote)?
what is a "!vote"?
A "!vote" is a discussion where people are supposed to be indicating support or opposition *with reasoning*, and the weight of the reasoning is supposed to carry the day rather than the raw vote counts.
Is this just "discussion"? To an extent. It's also, despite the name, a vote.
what counts as consensus? "I'll know it when I see it" ;) Usually someone "trusted" reads over the whole thing, summarizes it, and announces what the consensus is. Or not.

3:15-3:30 (15 minutes) - Running through a couple example RFCs (no decisions, but highlighting the discussion; see below if you want to volunteer)

API roadmap
https://www.mediawiki.org/wiki/Requests_for_comment/API_roadmap

Read RFC for lots of ideas it proposes to improve API in general
Who needs to be involved in moving the API RFC to resolution?
- Shortlist: Yuri, Tim, Brad, Timo, Gabriel, Brion, Subbu, Max, Jon R
  - That list is already getting not very short - time to remove ppl! :)
- ROAN IS SORRY
- But: front-end folks need to be better represented, since they'll be the folks writing code that uses the API
- Need some spike implementations to make informed decisions
  - Parsoid HTML API in REST style for caching, working on this right now -- gwicke
  - (Meta comment) Is this a typical output of the process? "Need some users, need to migrate some packages, need implementation experience, etc"
    - For bigger things like this, definitely. For small things I think we'll get through that more quickly, many things will go quickly to code review and just need a go/no go. But not API rewrite. :D
    - Note I'm not proposing that *all* users be patched before the RfC is approved. Just "enough" to learn what there is to learn. Sometimes "enough" is none at all. ;) [cscott]
      - +1

Standardized thumbnail sizes
https://www.mediawiki.org/wiki/Requests_for_comment/Standardized_thumbnails_sizes

Read RFC for details
Was created based on IRC discussion, but really hasn't had strong champion to push it along
Tim would like to see sample images of the effect of client-side scaling rather than current practice
- we have to balance image quality & user experience against the caching benefits...
An Ops problem is that a nearly unlimited number of thumbnails can be created and they must eventually be purged; also cache pollution
Who should be involved in finding the solution to this?
- Brion [as a proponent], Tim, [misc platform people?: Aaron], Gabriel (Parsoid)
Gabriel: VisualEditor could nudge editors toward using a limited set of sizes. This could go a long way toward solving the problem without imposing any hard limitation on what editors can do.
- Remember though there are non-VE editors too.
Is there another solution that reduces cost but retains the flexibility and benefits of the current system?
Downscaling may be better than upscaling from a visual representation perspective; this will cost more server/network resources
Quick clarification: we could serve the closest match *client side*. Meaning we only display the 100px image, without scaling, when people ask for a 102px size. With the right analytics to figure out what the most useful sizes are, this should not be too disruptive to editing. My greater point with this is to bring some analytical tools into the RFC process.
Or, we could just buy more hardware and move on with our lives (and maybe improve eviction)
- try purging a multi-page PDF with 10 thumbs per page and see how many 500s you'll get before you succeed in doing so.
- Contrary to popular belief, more hardware doesn't always solve everything ;)
  - Not unlike unpopular unbelief, ... sometimes it does.
  - ...not in this case though :)
- [there's also a DoS vector here, in that scaling large images is expensive. forcing generation of lots of sizes uses CPU time and disk space during the scaling as well as space/bandwidth after.]
  - that can definitely be mitigated by other means? like banning abusers
  - thanks for volunteering to write the code to detect that :) :)
  - or, thanks for getting up at night to do that
  - How do you ban a DDoS of 1000+ machines?
  - Rate limits?
  - per?
  - general image generation, basically bound it (still DoS, but maybe not taking down everything).
  - Bounding it was the reason for this proposal, although it may need to be done in another way than client side scaling.
  - I meant server-side bounding as in 'we will only create 5x the normal rate of thumbs per time X'.
  - Won't you be denying service then to legitimate users during this process?
  - Sure, that's why it is still a DoS / no perfect solution. Maybe thumbs for pages with a lot of new thumbs could get a lower prio though.
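
Two of the ideas in this thread are easy to make concrete: serving the closest standard size, and bounding thumbnail generation to a multiple of the normal rate. The following is only a rough Python sketch for discussion, not a proposal for the actual implementation. The width list, the names `closest_standard_width` and `ThumbnailBudget`, and the fixed-window counting are all assumptions made for illustration; only the "5x the normal rate" figure comes from the notes.

```python
from bisect import bisect_left

# Hypothetical standardized widths in pixels. The RFC discussion does not
# fix a list, so these values are placeholders.
STANDARD_WIDTHS = [100, 150, 200, 300, 400, 800]


def closest_standard_width(requested: int, widths=STANDARD_WIDTHS) -> int:
    """Return the standard width nearest to the requested one.

    Ties go to the smaller width.
    """
    ordered = sorted(widths)
    i = bisect_left(ordered, requested)
    if i == 0:
        return ordered[0]
    if i == len(ordered):
        return ordered[-1]
    below, above = ordered[i - 1], ordered[i]
    return below if requested - below <= above - requested else above


class ThumbnailBudget:
    """Allow at most `factor` times the normal thumbnail rate per window.

    Uses a simple fixed window; the caller passes the current time so the
    behaviour is easy to test.
    """

    def __init__(self, normal_per_window: int, window_seconds: float, factor: int = 5):
        self.limit = normal_per_window * factor
        self.window_seconds = window_seconds
        self._window = None
        self._used = 0

    def allow(self, now: float) -> bool:
        window = int(now // self.window_seconds)
        if window != self._window:
            # A new window has started, so reset the counter.
            self._window = window
            self._used = 0
        if self._used >= self.limit:
            return False
        self._used += 1
        return True
```

As the thread notes, a bound like this still denies some legitimate requests while it is active; it limits the damage rather than removing the DoS vector.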

DROPPED DUE TO TIME
3:30-3:45 (15 minutes) - Brainstorming on RFCs that don’t exist yet, but should

HTML templating (in JS and PHP? HTML-like. mustache/handlebars-like)
Overhaul of logging infrastructure to emit machine-readable data via a configurable transport.
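
To give the logging idea some shape, here is a minimal sketch of "machine-readable data via a configurable transport". It is written in Python with the standard `logging` module purely for illustration; MediaWiki itself is PHP, and no RFC exists yet. The names `JsonFormatter` and `build_logger`, the `context` field and the field names in the JSON payload are assumptions, not anything decided in the session.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "channel": record.name,
            "level": record.levelname,
            "message": record.getMessage(),
        }
        # Structured fields passed by the caller via extra={"context": {...}}.
        payload.update(getattr(record, "context", {}))
        return json.dumps(payload, sort_keys=True)


def build_logger(name: str, transport: logging.Handler) -> logging.Logger:
    """Attach the JSON formatter to whichever transport the config selects."""
    transport.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers.clear()
    logger.addHandler(transport)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


# Usage: an in-memory stream stands in for a file, syslog, or network transport.
buffer = io.StringIO()
log = build_logger("thumbnail", logging.StreamHandler(buffer))
log.info("render failed", extra={"context": {"width": 102, "status": 500}})
```

The point of the split is that call sites only emit structured events, while the transport (here any `logging.Handler`) is chosen by configuration.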

3:45-3:55 (10 minutes) - How do we move things faster (e.g. open invite weekly meetings?)

3:55-4:05 (10 minutes) - Architecture summit in January

Two-day dedicated event with a combination of community (WMF, Wikia, volunteers ...) represented

Discuss specific RFCs
Talk about architecture guidelines
Articulate overarching description of the future of MediaWiki

How to put together a list of folks to invite?

Participants should be sufficiently diverse to allow for substantial progress to be made on as many RFCs as possible.
So: have an application process that asks would-be attendees which discussions they'd like to be involved in.

Possible plan:

Meet in SF
Use SPUR and WMFHQ

Prior to summit, we need to have some discussions on process
[Maybe we just have to start some experimental processes and see if they work.]
Backlog.... don't let it get too big. Some things end up as RFCs because they're blocked; one way of clearing the backlog is to unblock people.

Rob's experience with IETF and W3C

IETF is very inclusive. Decisions only made on mailing list. Things move slowly, but the process is egalitarian.
- You need to be quite determined / persistent to push things forward.
W3C willing to trade inclusivity for speed
- Have infrastructure (people) to have weekly calls to discuss and keep things moving forward

Front-end has a process that is emerging for their RFCs

Tim proposes organizing RFCs by area of responsibility +1

Trevor reminds us to have fun +2

Arthur proposes that architects be the designated wranglers.

People could also sign up as interested parties on the RfC
I feel it's important that this doesn't turn into "everybody's" responsibility -- ie, nobody does it. [cscott]
- If nothing else gets decided, it's gonna be Brion and Tim and Mark. :) (the triumvirate!)
  - Can we make *one* of you responsible for any given month? ie, Brion does 0-based months mod 3 == 0 (aka FizzBuzz RFC review)
    - rotation sounds good as it can help avoid burnout, yes
- It could be the responsibility of the proposers as well to solicit participants
  - i'm not worried about the participants, i'm concerned about the 'deciders' (who decides what needs to happen next, not if the RfC is good/bad)
    - I'd guess that most RfCs get no final decisions before they are done
      - most RfCs now are getting no decisions at all. :(
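
The monthly rotation suggested in this thread could be as simple as the sketch below. The notes only say that Brion would take the months whose 0-based number mod 3 is 0; the order of Tim and Mark after that, and the name `wrangler_for_month`, are assumptions for illustration.

```python
# Order after Brion is an assumption; the notes only fix Brion at
# 0-based month mod 3 == 0.
WRANGLERS = ["Brion", "Tim", "Mark"]


def wrangler_for_month(month: int) -> str:
    """Return who wrangles RFCs for a calendar month (1 = January)."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    # Convert to a 0-based month before taking the remainder.
    return WRANGLERS[(month - 1) % len(WRANGLERS)]
```

With this order, Brion would cover January, April, July and October, and each person gets two months off between turns.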

RFCs under discussion at summit should have champions present.

let's make sure we don't go crazy and do ALL THE RFCs though... keep it doable
Add a field to the RfC template for 'champion'? PEP process says every RfC should have exactly one champion.

RobLa votes for defaulting to inclusive rather than exclusive for first attempt at summit

Brion:

general idea of RFCs to cover
method to contact champion for each (Internets?)
- let's build a list on one of them wikis
subject area experts
- RL, VE, API etc deep knowledge needed for some things
- design, don't forget it!™
- product?

Tim:

Sean Pringle (DBA)
Architects
Front-end folks: Timo, Roan, Trevor

Service-level agreement for RfCs? :D

C. Scott proposes RfC for RfCs, similar to http://www.python.org/dev/peps/pep-0001/

LTS -- do we need to do similar planning on long-term releases as well? Or keep that separate...

robla prefers to keep them separate. Logistics of meetings are hard enough as it is.

4:05-4:15 (10 minutes) - Summarize next steps; assign action items

We have a good sense of what we want the next architecture summit to look like.
We're going to defer some of the details re: process and decide them then.

<notes>

</notes>

<comments>
API RFC may also benefit from some bot author representation

So announce it to them specifically ;)

</comments>