Architecture Summit 2014/Architectural value, guidelines, and process

Subpage for the discussion of what we value in our current codebase, how to reflect that in the architecture guidelines, and tweaks to our RFC process at the summit.

Overview


This session expanded into two separate conversations about the big picture. It started as a Q&A focused on Tim, Brion, and Mark, but turned into a general conversation about architectural leadership and the process of RFC/architectural review.

Notes


Below are notes from the Architecture Summit, originally taken in Etherpad: RFC_process_summit (some of which were copied there from ArchitectureSummitRetrospective).


Morning session

  • RobLa: What are we proud of?
  • Tim: Moving to OO from global functions

Chris Steipp: what do you want to see in 10 years?

  • Mark Bergsma: interested in the service architecture RFC, could bring interesting things
  • Tim:
  • Brion: more multimedia features... but these are plugs to our architecture, and therefore Core needs to be flexible and able to accommodate all these features. Keeping Core clean, scalable, and being able to go to the next level.

What do you think about recent changes like Parsoid that do not directly integrate with the software?

  • Brion: we will definitely see more like that, going to service-oriented software. Better security, because we can replace things like the math renderer and thumb generation. Like the idea of several little pieces that sit on their own and can be debugged on their own. Small hosts should run virtual hosts that can have all this software. But also see that there is the need for just setting up a MediaWiki with a PHP/MySQL stack.
  • Tim: We have to look for our small installations

MaxSem: Anything you regret and what can we learn from this?

  • Brion: CentralAuth. We should have designed it better in the first place.
  • Tim: Extension registration with require, because code is executed on every call
  • Mark: NFS

Chad: What piece of architecture do you want to work on

  • Tim: Configuration, there's still plenty of chunks we have to go. Improve handling of context and global state
  • Brion: Completely redoing our talk and notification systems. What is the nature of putting something on the wiki and publishing it vs. still working on it? Example: draft namespace. We need something like that with multimedia files. Maybe we want a separate concept of "in draft" we can move across the system. Kill the concept of titles and names being the primary keys of our entities. This also means it's not so easy to have an upload conflict. Need to rethink the concept of namespace.
  • Mark: Security over network and services. Has been neglected. When I joined, everyone had access to everything. But there were only a few people who had access. It's getting better. Passed a security penetration test. Now with more people, security needs to be worked on.

Sumana: RFC process compared to that of other projects: some of them had not written our architectural guidelines. Do we need this?

  • Mark: For operations, there are no architectural guidelines yet
  • Tim: The architecture document was not completed because there were some controversies. We should have guidelines that are uncontroversial. It's difficult in a diverse setting as ours
  • Chad: Should they be more descriptive instead of prescriptive?
  • Tim: yes (??)
  • Brion: Small set of guidelines: it scales, it's secure, it's multilingual, it's pluggable and composable. When is it OK to update something in production? That's not so much about architecture, though.

Architecture process as a way of managing change. MW is a living software -> we don't expect it to have a crystalline architecture. Does signal openness to changes. The risk is having parts of the stack falling into neglect. We should tolerate some degree of disagreement, if someone wants to own some part of the software. Human element.

  • Brion agrees
  • Tim: change is not an aim in itself. There's a spectrum of change tolerance. Values backwards compatibility. MediaWiki will be ugly and not conforming to a specific ideal. That will be true, even if we'd be rewriting it from scratch
  • Mark: Areas of change that we are not interested in, do you find we block that process or just not interested?
  • ?: It might be good to identify habits that you do think are too conservative
  • Brion: A big problem is that there are areas that don't get enough attention, code review. Even if requested, it never quite happens. Needs to be fixed. Maybe there's a graph where you can see what areas are neglected. We are still having some kind of overload, though. You still need to be in a group to get your changes easily through.

RobLa: Is our current process sufficient for the amount of change we are facing in the next couple of years? (Many people disagree, some neutral.)

  • Roan: We should expand the pools of architects and delegate authority more. We have more people around that fulfill architectural roles. Architect title vs. architect role. We are about to fix that, but there's still a lot of attention to the official architects here
  • Tim: The system was not decided by architects but by managers and they should comment on it
  • Brion: tends to agree, we should have a larger group.
  • RobLa: as one of the people who imposed this process: my only insistence is that there is a process. You guys need to agree on this
  • Erik: I've been around a long time, but no one wants me as an architect (laughter): The architects do have a lot of merits. But agree with Roan that there needs to be a discussion. We're introducing this level of principal engineers. That's different from the technical leadership alone. Adopt the process and add the possibility of having an appointed delegate decide on an RFC. But need a more formal position on technical decisions than just being involved.
  • Roan: There should be a not so implicit tie between role and title.
  • Erik: principal engineers would just do that
  • Trevor: How did we get so top-down? It used to be a flat process. We don't have any evidence that all this structure is actually improving our performance
  • Brion: We want a less narrow point of review. Want to make it easier not harder. Processes should make it easier. If they are making it harder, let us know
  • Trevor: Is there a deeper problem here?
  • RobLa: which part of the process seems superfluous to you
  • Trevor: anything that's not been proven to be successful
  • Nik: rather than having an RFC approved by the architects, could that not be approved by three people
  • Sumana: Suggested that the default is that RFCs pass. Other projects have a voting process, or a pet ??
  • Mark: is the current process preventing anyone from participating in the process?
  • Sumana: Part of it is expectations. If I don't have a clear view of expectations, it's harder to participate
  • Trevor: Think about the most innovative things you have done. Would the process have helped you?
  • Mark: A long time ago, be bold was the motto
  • Brion: We should definitely be bold. There is the danger of putting up some notes and letting it sit in the queue for some time. Things have gone into the queue and they don't get addressed for a long time. This meeting is helping with this somewhat. But we can't talk about anything. We have IRC meetings. Need to get more regularity in this process. We need to actually be pushing, be bold.
  • Mark: We don't need to try to actually make everything in the RFC perfect. Maybe we should move forward with it and work out the details later. It's good that we have this process, but make it a little bit less formal.
  • Brion: There's a lot of things going on outside the RFC world, e.g. Flow.
  • Tim: Do file an RFC, but it does not need to fulfill all the details of the process. How about features? Is the RFC process good for features? As for hierarchy: we come from a consensus-based process. This is still how it is in practice (?)
  • Mike Schwartz: I'd like to see to enable things that don't seem to happen. SOA, Templating, initiative did not come from the architects. We didn't have a statement of the problem. There should be a lot of people involved. But ultimately, there has to be a decision. And it might be unpopular. The architects should make this decision. There are some pain points discussed here that are not architecture related, like code review. These should be answered by a dedicated group of people, like the architects
  • hexmode: The RFC process does not appear to be used by the Foundation at all

Some disagree

  • hexmode: that's how it appears. So it suggests like "I have an idea, can I sell this to the architects?". Going through the RFC process is not the right way to get the resources without some code on the ground
  • Mark: It is used internally, but maybe there should be more
  • Tim: more than half of the RFCs are filed by Foundation staff members. Volunteers proposing things they have resources for (or not), staff proposing things they want resources for (or have)

Afternoon session


bawolff talked about an Openstack utility to generate a codereview dashboard: http://status.openstack.org/reviews/

TheDJ:

  • Goal of RFC type processes is to set clear goals before doing the work and to create clarity about what is happening
  • If I drop out for 2 months I need to be able to pick up on status
  • Have regular IRC meetings, rotating 3 time zone slots; keep track of progress and show it

Roan:

  • During lunch break, talked with Trevor about anarchy. Roan had thoughts about use of RFC process. RFC process can provide predictability. For ResourceLoader, when they came up with it, they solicited input. It wasn't immediately clear where to find this info. An RFC process provides a central place where people who are vaguely interested can find out what's going on. RFCs work well for large, sweeping changes that affect lots of things. Service-oriented architecture, ResourceLoader, ...

Another category is something that can be developed in a corner, and slowly expanded, then convince other people to use it. Eventually convince everyone. Example: UI library. RFC process or similar can be used to promote standardization.

Thirdly, don't over-formalize things that don't need either of the prior two. If you're not making a huge, sweeping change, and if you're not standardizing (and need buy-in), then go and do it. We should try to optimize our processes for the kind of things that fit in them. Don't grow processes to try to fit everything.

Scope summits to large things.


Owen: First commit to MW last year, took a few months to be merged (https://gerrit.wikimedia.org/r/#/c/48435/ - Tim picked up patch and fixed it)

  • 1.25 million lines of Wikia code built on top of MediaWiki. Until aware of the RFC process, no way of approaching problem. Half-dozen RFCs from Wikia. Encouraged by fact that this is happening at all. Before this, no idea about what to do about the code. This process, summit, opens doors for working more closely with the community. Additionally, I think over the next year or two or five, would help to have high-level architectural goals for what MW should be changing into. List of RFCs now are pretty great, some simple, some more complicated. Mostly narrowly focused. There's no sort of high-level vision. Question is: Is there somebody who's figuring that out (high-level vision).

Brion:

  • Glad we've started talking about arch-level stuff. Next few years, not just getting this merged. We need to talk about both. There have been a number of complaints about code not getting reviewed. Separate from RFC and arch problems. It ties together in that we're not always as a group making sure that things are happening and moving forward. Easy to get stuck on little details. Yesterday, there was a long long discussion about HTML templating engines. Fascinating subject, ultimately it does not matter to the architecture of MW. That is an implementation detail.

Trevor: Matters if we use it

Gabriel: If we use it for content

Mike: It is an architectural issue. Specific choice may not be architectural. Could be that different choices are more or less consistent.

Brion: Need to tackle bigger questions as well. Need to tackle them in a context that is not just developers. What do our users want/need? Don't just want a scalable thing. What is the purpose of MediaWiki? What are we going to create from there?

Roan: I was going to respond to Owen. It scares me that it was last year that you made your first commit to MediaWiki. Advice I would give to Wikia: Recently, VE has been trying to identify what parts of what they've written are reusable or can be made reusable. Move it out, cut the cord. Make it a separate thing and possibly upstream it. Would highly encourage you to upstream at least the things that cause friction when upgrading. I know that's easier said than done. But I see very few commits from Wikia people in our repositories. The fact that you're a lead in the platform team at Wikia and only have 1 line of code in MW core is concerning. We should make some of your changes upstream MediaWiki core.

Ori: There seems to have been agreement at various points that architects should appoint or deputize individuals to make decisions about particular areas of MediaWiki. Impression was that architects agree to it, but it wasn't clear what's next. You might want to do that because a lot of people are gathered here. Other thing I wanted to point out is that agreeing on big directions isn't a big problem. Quite a large number of areas with large agreement. Blocked on one-line changes that never get reviewed. When this topic is brought up, we decide to do more. Something naive about this position. We should be more realistic. What is missing in our assessment is that our metrics are skill and track-record blind. Oldest patch, total number. Very misleading since if you look at actuality of review, people look at track record of developer. Not necessarily unreasonable bias. The alternative to a perfect architecture that matches what you want is often neglect. If you see someone taking initiative, take a chance on the person. We can fix bugs. I want to especially say this not just to the architects, but to the people who have recently gained +2. People think they're only recent contributors, reluctant to +2. If you have +2, take chances on other people's code.

MZM: All foundation have +2

Antoine: Only in theory, for emergencies. Only about 10 use it. Volunteers are often better at MW, a good thing. Code review in core way better than before. A lot of backlog is things we (literally everyone, WMF, volunteers, Wikia) don't care about.

Bawolff: People lump lost good changes in backlog with bad changes.

Ori: If we start acknowledging better changes, worse changes. We should encourage people to bang on the kettle.

DJ: Don't know who to point it out to. Made a Bugzilla/Gerrit request to have a way to ask people for DB knowledge, etc.

Sumanah: Is the maintainers wiki page useful for this?

DJ: Gerrit is repo-based review. That's nice, but our repos contain so much code. Knowledge-domain based review. If you don't know who has that knowledge, it simmers. It's a real problem. (bug 35534)

Trevor: Basically agree with Ori, but one thing concerned me. This idea that great ideas are first ridiculed, then accepted, then obvious. If all of our long-term goals were just generally accepted, outdated goals. We need wild ideas that incite vigorous discussion. Service-oriented architecture: 90's man. Maybe we should have more radical set of RFCs. We need more lively discussion.

Andrew: Your job. :)

Quim: "This is an RFC system for the future of MW". We sometimes have different opinions about that future. Roadmap page: Scrum meets a month ahead? Deployment calendar: Planned releases (two weeks/a month) ahead. WMF Engineers yearly goals. First problem is that there's no long-term vision. If there is, it's dictated by WMF plans. Underlying problem in many of these bigger topics. If there's an arch team we all defer to, this team should be more active at pushing a vision, not just reading through RFCs and discussion. Or move to system of whoever swims faster, wins the race.

Aude: I'm new with +2 and all that. Busy on Wikidata. Not much time to review, but I try. If I submit it about job queue or API, know who to ask. Parts of core (EditPage). Who maintains it? Parts of core where nobody yet maintains. Lot of tech debt. So I know the platform team has some of those people. People off doing future, there could really be more people to volunteer or more staff (to work on tech debt)

RobLa: Directed at Erik, myself. Great to do features, but some areas neglected, maybe no one wants to volunteer, maybe we need staff working on unpopular code. If you look at our responsibilities, we feel like we have full plates.

Aude: So do we. Only so much we can do on the weekends, stuff I do for core is on weekends.

RobLa: Is the point more realistic workflow for scaling down and focus?

Aude: More staff on technical debt?

Daniel Kinzler: Code review not as satisfying as writing code. Want to talk about something else. Two reasons: Enough criticism can get something stuck if there's no apparent way forward. RFC or change in Gerrit. Pretty nice, could be better. But then how are you going to figure out what to do to fix it. Within Wikidata dev, started to ignore criticism that does not show a forward path. What am I supposed to do? Have to wait, nothing will come of it. Other point is that Gerrit dashboard is just horrible. Things that are not touched for a week drop off. Unless someone rebases it (and no internal service error. :), disappears.

James F: Change the sort order. Sort by age descending. Oldest commit is at the top. See e.g. https://gerrit.wikimedia.org/r/#/projects/mediawiki/extensions/VisualEditor,dashboards/default for the kind of dashboard you can make.
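
A minimal sketch of the "oldest at the top" ordering James F describes, so that untouched changes stay visible instead of dropping off. The `Change` record and its fields are hypothetical and only illustrate the sort; this is not Gerrit's data model or API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Change:
    """Hypothetical open change; field names are assumptions, not Gerrit's schema."""
    number: int
    subject: str
    last_updated: datetime


def oldest_first(changes):
    """Return the changes ordered so the longest-untouched one comes first."""
    return sorted(changes, key=lambda change: change.last_updated)


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    queue = [
        Change(3, "Fix typo in message", datetime(2014, 1, 20, tzinfo=timezone.utc)),
        Change(1, "Refactor config loading", datetime(2013, 11, 2, tzinfo=timezone.utc)),
        Change(2, "Add unit test", datetime(2013, 12, 15, tzinfo=timezone.utc)),
    ]
    for change in oldest_first(queue):
        print(change.number, change.subject, change.last_updated.date())
```

Sorting on last update rather than creation date is a choice made here to match the concern that things not touched for a week disappear from view.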

Daniel: But then I no longer see the old stuff. Not only about the sorting. Only things where I'm added are on my dashboard. Should be topic-based or expertise-based.

Bawolff: Custom dashboards can be made.

Daniel: Can someone knowledgeable make it or show how to do it?

Aude: Pages where you can automatically be added as a reviewer. https://www.mediawiki.org/wiki/Git/Reviewers You can also do this in Gerrit's prefs: https://gerrit.wikimedia.org/r/#/settings/projects (entirely separate system)
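
A rough sketch of the topic- or expertise-based routing Daniel and DJ ask for: map file-path patterns to the people who know that area, then suggest reviewers from the files a change touches. The patterns and reviewer names are invented for illustration, and this is not how the Git/Reviewers page or Gerrit's project watches are implemented.

```python
from fnmatch import fnmatch

# Hypothetical mapping from path patterns to knowledge-domain reviewers.
DOMAIN_REVIEWERS = {
    "includes/db/*": ["db-reviewer"],
    "resources/*": ["frontend-reviewer"],
    "includes/api/*": ["api-reviewer"],
}


def suggest_reviewers(paths, rules=DOMAIN_REVIEWERS):
    """Return reviewers whose patterns match any touched path, without duplicates."""
    suggested = []
    for path in paths:
        for pattern, reviewers in rules.items():
            if fnmatch(path, pattern):
                for reviewer in reviewers:
                    if reviewer not in suggested:
                        suggested.append(reviewer)
    return suggested


if __name__ == "__main__":
    touched = ["includes/db/Database.php", "resources/src/example.js"]
    print(suggest_reviewers(touched))
```

A change that matches no pattern gets an empty suggestion list, which is exactly the case where a triage step would still be needed.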

Brion: We seem to have drifted into code review in general. We basically bring this up anytime we have a big meeting. I have to ask: Do we have anyone whose job it is to do code review, exclusively or primarily? Talking at all.

Trevor: All of us review constantly.

Brion: Have to ask, is this a problem that other teams are having?

Trevor: Used to be one person, one thing, no teams.

Matt: What is cross-review?

Trevor: People on the same team review each other's code. Culture of teaming up would help.

Christian W: Can tell you how we review at Wikia, doesn't matter.

Mike S: This is not a discussion about Architecture

Brion: This is a conversation we should have, maybe this is the right place, maybe not.

Matt: Maybe separate meeting for code review, etc.

Trevor: Code review is legislating from the bench about architecture. If it doesn't get reviewed, arch does not change. People should team up and cross-review, even volunteers.

Antoine: Whenever we have people working together, we have stuff getting merged. The code that is rotting is because it's made by just one person alone. There's no way for volunteers to necessarily team up. That's not a reasonable expectation (re above/below).

Trevor: I thought it wasn't getting done.

Gilles: Might need team working on patches no one is looking at.

Krinkle: Noticed that if something is assigned, and you're doing it on your own. Two ways: Either not work on alone (third party or volunteer dev). Can only work on if someone is maintaining. Krinkle will eventually review front-end code. Can't work if someone reviews things no one asks for. Someone needs to be responsible for component.

Gilles: If that team's job is to triage requests to the right place, could work. Send it to the right project.

Krinkle: Could work, but Gerrit has a complicated interface. Krinkle has special queries, which makes it faster. For everything that doesn't get caught, triage could help.

Gilles: Sometimes people review code but don't help you go forward. Need a small set of review guidelines. Give options on moving forward. And so, at least for staff, hopefully for others, few small principles: Always help them move forward.

Sumanah: on list too.

bawolff: People say, "Don't use wgTitle", and assume you know how to get around it, or will ask. People don't always ask for help. That's kind of what happens.

Trevor: Should be as helpful as possible, but people have a way to reply.

Sumanah: Ways to be hospitable.

Trevor: People make mistakes, but tool allows two-way street. Workshops are valuable, but it's not likely it's a desperate situation.

RobLa: Code review is important, but want to go back to architecture process. People are upset, want to focus back on architecture process.

Markus: Back to RFC process, when I look at these sessions, found that commonly agree on big goal of RFC, but we have dissent on how it's implemented. What's missing is the requirements of for example HTML templating. We said we need it, quarrelled about engine, did not specify what we want. So, this would be the perfect place to gather requirements and decide what to prioritize.

Gabriel: Want to say we started to get more common ground on longer-term ideas. Might have been something that was blocking things. Didn't know which direction we would go. Bit of limbo. No point where we all came together and at least decided on some small steps. Agree with Trevor that we need crazy ideas to get longer-term picture. Still relatively concrete steps. We could have some kind of wild idea session at some summit or such. What should MW look like, and what is a wiki actually, when we have HTML?

Nik: I got in the queue when we were talking about code review. So, if we're getting back to architects, I thought about them as people we trust. When they say, "This is good", we believe them. It's neat that we have three architects with distinct domains, overlap a bit. If we feel we need more architect bandwidth, is it okay to say there's other people we trust. If they don't feel comfortable saying it for as wide an area, I'm not sure Tim would say it's good from some ops perspective. If we wanted more, could we just decide there are more and pick two or three or one, and say. Architects know their areas, new architects would know their expertise area.

Roan: Had a conversation over lunch. I guess I should repeat it in front of room. We over lunch discussed having... In morning, discussed delegation. Proposal that someone threw along with that, which I liked, is using delegation as a way to grow the architects. Say we have some front-end RFCs; nominate Timo as delegate for those RFCs. Architects agree. We say there's a max period that someone can be a delegate. After that, architects have to make up their mind. "It worked out" and they're an architect, or "it didn't work out" and they're no longer a delegate.

Antoine: Expand over time in area of responsibility. Making, say, Krinkle a junior (or Associate) architect for anything JS makes a lot of sense. Principal architect, senior architect, responsible for long-time vision.

Roan: With too many, problem.

Antoine: Architect could do only architectural work. It is probably difficult for you to have enough bandwidth. Maybe you have enough time, not sure.

Roan: We should have less bureaucracy, but at least accessible.

Bawolff: That sounds horribly bureaucratic, senior, middle management, scary. Much rather kind of less formal, more informal, we trust him. In other cases, we trust other people.

Yurik: There is a what, and a how. Architects decide how something gets done, but who decides what needs to be done? What may be decided by community.

Quim: In other open source projects there is a fairly clear hierarchy, people who have commits, maintainers, project leaders. Straightforward and discrete. How does this differ from our system (baby architect, senior, etc.)? Can we have +2, maintainer, 1-2-3... project leaders or whatever name, I don't know.

Sumana: Essay on tyranny of structure-lessness.

Link to essay: http://www.jofreeman.com/joreen/tyranny.htm
Wikipedia page about essay: http://en.wikipedia.org/wiki/The_Tyranny_of_Structurelessness
Summary of essay: When a group resists the idea of leaders and even discards any structure, "this apparent lack of structure too often disguised an informal, unacknowledged and unaccountable leadership".

Sumana (continued): When you don't have explicit structure, end up with informal structure that people are hesitant to acknowledge, old boys' club, cliques, re-ifies dominant biases. Probably end up with informal communication channels, only talk to each other. I am in favor of some formal structures, don't want an accidental structure. Want counter-vailing forces for people who are shy in places like this. Want a better way for people who have skills that are less likely to be acknowledged.

When you think about how people grow, in any domain, engineering, teaching, whatever. Ambition and power are not dirty words, part of how we grow. Ambition is a combination of the urge to be a master at a domain, and the desire to have that mastery recognized. [Is this available stitched on a pillow?] - (from Anna Fels's book "Necessary Dreams") If we're going to continue to grow as a community and have ways to have people grow as engineers, testers, PMs, sysadmins, then we have to create accessible, discoverable channels for that ambition. We need to recognize that mastery. Sometimes that takes the form of, if not titles, acknowledgement, delegation, etc.

Roan: Really like Quim's suggestion. A lot of this symptom we're trying to solve, is the people we're looking for might not be junior architects. They may actually be de facto maintainers. More formal maintainers, should be an official maintainer role, gives us a way to contact someone about code, allows people to be involved in a certain domain. Difference between overall arch and people like Timo who think about a certain role. Also find where we're lacking skills.

Trevor: I agree a lot, but in reply to tyranny of structure-lessness. There's also a tyranny of tyranny. Structure that's as simple as it could possibly be, and no simpler. We don't totally disagree, but we don't want to go, no structure is void, but we don't need to stock up on structure so fast. We should have as much structure as we need to solve problems.

Robla: Can kind of see people fading out. We have one minute left. Would like to capture some suggestions for next time. Focus on this event rather than capital-A architecture.

Greg: Is someone going to take the lead in making sense of this? If we leave now, we'll not get anywhere.

Brion: I'm going to have to try and take over that. That probably means I'll throw some messages on the list, ask for comments, talk to people internally, see where we go. I really like the idea of maintainers, delegation. We really do need to think about more short-term things like making sure code is reviewed, people are mentored, if they're not pair-programming already. These are the building blocks of making things happen. Without that, won't get to bigger questions. We need to do little to support big. Will need to make big architectural decisions. If so, need to be able to execute. You people will do that actual work. Want to make sure you guys and gals are happy and productive, have the support from those in the ivory tower. Please approach us, please feel free to give comments.