Wikimedia Search Platform/Process

Overview

The team is working on two main products: Search and Wikidata Query Service (WDQS). Each of these products has a dedicated backlog board in Phabricator (Search, WDQS), and the team also has a Current Work board in Phabricator.

Communication

  • Mailing lists
  • IRC
    • #wikimedia-discovery
    • IRC norms:
      • Everyone on the team probably should have a bouncer (or IRCCloud, which has given us free accounts)
      • You should watch for (and get an alert when) someone mentions your nickname
      • But nobody is expected to read the scrollback from when they were away
      • IRC is great for transitory quick conversations and social chitchat
      • Anything substantive, especially related to decisions, should go on wiki or a mailing list

Recurring Meetings

Meeting Name    | Frequency                 | Duration           | Purpose                                                      | Attendees
Standup         | Twice-weekly (Tues/Thurs) | async via etherpad | Detailed status                                              | The whole team
Sprint Planning | Weekly (Mon)              | 50 minutes         | Team wide communications, task estimation and prioritization | The whole team
Triaging        | Weekly (Mon)              | 40 minutes         | Triage and prep of product backlog tasks                     | The whole team

Workflow and Phabricator

Current Workboard structure

Epics: These are the epics that are currently in progress, as related to other subtasks on the board that are in progress. This should represent a high level view of the main work the team is currently focused on.

Incoming (Default): When a ticket is tagged into this board, this is where it will land. Tickets here may or may not have sufficient information to be ready for estimation.

Ready for Dev SWE or SRE/Ops: Tickets here are ready to be picked up for development, in priority order from top to bottom. There are separate columns for software engineering work vs operational work.

In Progress: Tickets here are currently in active development.

Blocked / Waiting: Tickets here are blocked by another team or other factors, or tickets that are waiting on internal, long-running tasks such as a reindex.

Needs Review: Tickets here are awaiting code review or other review.

To Be Deployed: Tickets here are ready to be deployed.

Needs Reporting: Tasks in this column are complete. They will be closed once they’ve been reported on at Scrum of Scrums or other relevant reporting venues.

Creating New Tickets

  • Epics should be created when quarterly planning is complete for each high level objective, as appropriate. Lower level tickets should also be created at this time to the extent that we know what they are.
    • Guillaume and Carly will start a doc with proposed epics based on the quarterly goals for each quarter. The team will have the opportunity to comment and add lower level tickets as appropriate. After a short review period, those tasks will be added to the respective backlogs.
    • Carly and Guillaume will review all epics monthly; the team will review them quarterly, during planning for the next quarter’s OKRs.
  • Anyone can (and should!) create new tickets as they come up. Make sure to place them under their appropriate epic and tag them into either the WDQS or Discovery-Search backlog. However, work shouldn’t begin until the ticket has been triaged, prioritized and estimated (unless urgent - see below).
  • Every task should have information about the why as well as detailed acceptance criteria. Try to format tasks using the user story format, even for backend tasks - it really helps to clarify both the purpose and value of the task, which will help with prioritization.
    • User Story format: As a <type of user>, I want to <goal of the ticket>, so that I can <why we’re doing this ticket>.
    • Bugs, test failures, and code cleanup are not well suited for the user story format; feel free to not use it in those cases. But if you don’t include a user story, you must include a sentence or two about the impact of the task/bug.
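
The ticket rules above can be sketched as a small helper. This is only an illustration, not a tool the team uses: the function names `user_story` and `task_description` and the layout of the rendered description are assumptions. The template string and the rule that a task without a user story must state its impact come from the list above.

```python
from typing import Optional, Sequence


def user_story(user_type: str, goal: str, reason: str) -> str:
    """Fill in the user story template quoted above."""
    return f"As a {user_type}, I want to {goal}, so that I can {reason}."


def task_description(
    acceptance_criteria: Sequence[str],
    story: Optional[str] = None,
    impact: Optional[str] = None,
) -> str:
    """Assemble a ticket description (hypothetical helper and layout).

    Acceptance criteria are always required, and a task without a
    user story must say a sentence or two about its impact instead.
    """
    if not acceptance_criteria:
        raise ValueError("every task needs acceptance criteria")
    if not story and not impact:
        raise ValueError("without a user story, describe the impact")

    lines = []
    if story:
        lines.append(story)
    if impact:
        lines.append(f"Impact: {impact}")
    lines += ["", "Acceptance criteria:"]
    lines += [f"- {item}" for item in acceptance_criteria]
    return "\n".join(lines)
```

For a bug, one would skip the story and pass only `impact`, for example `task_description(["test passes"], impact="CI is red")`.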

Triaging & Prioritizing

  • Weekly triage group meetings
    • We currently triage both Search and WDQS tasks in a single meeting, with the whole team.
    • The goal of these triage group meetings is to reduce the load on any one person to triage, stay in sync about priorities, and better distribute new and urgent tasks.
    • Who is attending which triage meeting should be reviewed periodically and changed based on task focus and interest.
  • At the triage meetings:
    • The triage team should review all tickets that have been added to the backlog triage column since the last meeting and move them into the appropriate column in priority order (with highest priority tickets at the top).
    • The triage team should review the top 4 tickets in each column, and any that should be worked on in the next 2 weeks should be moved to the “Incoming” column on the Current Work board.
  • If something urgent comes up that can’t wait until the next triage meeting, feel free to just complete it if it will take you personally less than 4 hours. If it will take more, set up an urgent triage meeting or discuss it over IRC/Slack with your triage group.
    • A task is likely urgent if:
      • It is a breaking bug.
      • It blocks another team’s work. Not everything that blocks another team is necessarily urgent, but blocking another team raises the urgency significantly; use your judgement. In a given week, all blocking tickets should either be in progress or have had semi-recent communication from us.
      • It’s both pressing and important.
        • Not everything that’s pressing is important.
      • We will likely need to iterate on the definition of urgent.
  • Tasks are sorted by priority:
    • HIGH: We should work on this in the next few months, even if not right now.
    • MEDIUM: Everything else - we'll hopefully work on it this year or so.
    • LOW: We'd love to work on this, but realistically won't any time soon. May increase in priority in the future.
    • LOWEST: We probably won’t use this category much, unless a low group becomes too large, or we otherwise need to further sub-divide the low group.
    • FOR LATER: Tasks that shouldn’t be closed yet, but which we don’t expect to be high enough priority to work on for the foreseeable future. May be of higher importance to another team, and others are welcome to take ownership of them, reprioritize them, and work on them.
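
As a rough sketch of the rules above, the priority buckets and the 4-hour rule for urgent work could be modeled as follows. The names `Priority`, `Task`, `urgent_task_action` and `sort_backlog` are hypothetical, and whether a task is urgent at all remains a judgement call that this code does not attempt to make.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class Priority(IntEnum):
    """Priority buckets from the list above; a lower value sorts first."""
    HIGH = 1
    MEDIUM = 2
    LOW = 3
    LOWEST = 4
    FOR_LATER = 5


@dataclass
class Task:
    title: str
    priority: Priority


# "Less than 4 hours" is taken literally here; treating exactly
# 4 hours as needing a triage discussion is an assumption.
SELF_SERVE_LIMIT_HOURS = 4


def urgent_task_action(estimated_hours: float) -> str:
    """Say what to do with a task already judged urgent."""
    if estimated_hours < SELF_SERVE_LIMIT_HOURS:
        return "complete it"
    return "urgent triage"


def sort_backlog(tasks: List[Task]) -> List[Task]:
    """Order tasks by priority bucket.

    sorted() is stable, so tasks within a bucket keep the manual
    ordering they were given during triage.
    """
    return sorted(tasks, key=lambda task: task.priority)
```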

Estimating & Prioritizing

At Monday’s sprint planning meetings:

  • The team will review all tickets in the “Incoming” column and discuss whether each ticket has sufficient information to be worked on, whether the acceptance criteria are clear, and whether the task should be broken down into smaller chunks. Necessary information will be added if we have it. If not, someone will be assigned to investigate to get the necessary information (and the ticket will stay in the Incoming column until the next week).
  • The team will then estimate as a team those tickets in the Incoming column that have all necessary information:
    • The team will use the Fibonacci sequence to estimate (using Hatjitsu), and put the estimate under Estimated Story Points for the task.
    • Anything larger than a 21 should be considered a strong candidate to be broken up.
      • Ideally, tickets won’t make it to estimation that are larger than a 21 because they will have been broken down earlier in the process. If something does get estimated higher than a 21, that doesn’t mean we should lower the estimate - it means we should consider how to break down the task into smaller chunks.
    • Bugs are sometimes hard to estimate. That's OK: our estimates can be wrong and we can refine them later (see below).
  • Once estimated, the ticket will be moved from the Incoming column to the Ready for Development column. By default, tickets are moved to the bottom of the Ready for Development column, unless they are urgent (bugs, UBN, etc.), in which case they go at the top.
  • Ready for Development tasks that are not going to be worked on in the next 2 weeks will be moved back to the backlog board.
  • Parent tasks (tasks that have subtasks but are not an Epic) are used to track related work. The parent-child relationship can indicate a dependency in either direction: either the parent task is blocked until all subtasks are completed, or the child tasks are follow-ups to the parent task.
    • When the subtasks are follow-ups, the parent task can be closed when it is done.
    • When the parent is blocked on its subtasks, the parent task should be moved to the Blocked column, then moved to "Needs Reporting" when all subtasks are completed.
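
The estimation rules above can be illustrated with a short sketch. The deck of point values is an assumption (the exact cards offered by Hatjitsu are not stated here), and `needs_split` and `add_to_ready` are hypothetical names. The threshold of 21 and the top/bottom placement rule come from the list above.

```python
from typing import List

# Assumed Fibonacci deck; the actual Hatjitsu deck may differ.
FIBONACCI_POINTS = (1, 2, 3, 5, 8, 13, 21, 34, 55)

# Anything estimated above this is a strong candidate to be broken up.
SPLIT_THRESHOLD = 21


def needs_split(points: int) -> bool:
    """Flag estimates that suggest the task should be broken down.

    The estimate itself is not lowered; it is only flagged.
    """
    if points not in FIBONACCI_POINTS:
        raise ValueError(f"{points} is not a value in the estimation deck")
    return points > SPLIT_THRESHOLD


def add_to_ready(column: List[str], ticket: str, urgent: bool = False) -> None:
    """Place an estimated ticket in the Ready for Development column.

    Index 0 is the top of the column. Urgent tickets go to the top,
    everything else goes to the bottom.
    """
    if urgent:
        column.insert(0, ticket)
    else:
        column.append(ticket)
```

For example, starting from an empty column, adding "refactor" and then an urgent "outage fix" leaves "outage fix" at the top of the column.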

Taking Work

  • Unless you have taken on something urgent (4 hours of work or less, or as agreed upon by your triage team), try not to take a new ticket until you have finished all other tickets in the In Progress column that are assigned to you and you've done what you can on others’ tickets that you are able to move forward.
    • This will reduce context-switching and improve focus, promote “stop starting, start finishing” behavior, and reduce the load on code review.
    • Helping to move other people’s tickets forward before taking a new ticket helps the team work collaboratively and have common ownership of our work. It also helps prioritize work that is most important rather than what you normally do.
    • Before moving a ticket to the In Progress column, make sure to assign it to yourself. All tickets in the In Progress column should have an owner.
    • It’s okay to have open tickets in multiple columns assigned to you. You might have one in the In Progress column and one in Waiting, for example. The goal is not to have more than two tickets assigned to you in the In Progress column at any time. This will help limit the amount of work that the team takes on at once.
    • If more work is needed after review, or if something becomes unblocked, make sure to move that ticket back to the Ready for Development column where it will be reprioritized.
  • When you’ve finished your In Progress tickets, take the top priority ticket from the Ready for Development column that is in your skill set.
    • If you don’t have anything to pick up, talk to your manager and/or program manager. This is an indication that we may not be effectively prioritizing and balancing work.
  • Make sure to periodically update each ticket. Comments with updates should be made in each ticket whenever an action is taken, such as moving the ticket to a new column, or every 3 days at a minimum. Also make sure to move the ticket to the appropriate column when it needs review, is blocked, etc.
  • If a task is taking more time than initially estimated, you should update the estimate on the ticket. Carly will monitor updated estimates and evaluate whether we need to reconsider priority, etc.
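
The rules above for taking work can be sketched in code. This is a simplified model under several assumptions: the `Ticket` fields, the single `skill` tag per ticket, and all function names are hypothetical, and "every 3 days at a minimum" is read as "an update is due once 3 days have passed". Only the hard cap of two In Progress tickets per person is modeled; the preference for finishing current work first is left to judgement.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Set

WIP_LIMIT = 2                 # at most two In Progress tickets per person
MAX_DAYS_BETWEEN_UPDATES = 3  # comment on each ticket at least this often


@dataclass
class Ticket:
    title: str
    skill: str                        # hypothetical single-skill tag
    assignee: Optional[str] = None
    last_update: Optional[date] = None


def can_start_another(in_progress: List[Ticket], person: str) -> bool:
    """True while the person is below the In Progress cap."""
    mine = [t for t in in_progress if t.assignee == person]
    return len(mine) < WIP_LIMIT


def next_ticket(ready: List[Ticket], skills: Set[str]) -> Optional[Ticket]:
    """Return the highest-priority ready ticket within the skill set.

    The ready list is assumed to be ordered top (index 0) to bottom.
    """
    for ticket in ready:
        if ticket.skill in skills:
            return ticket
    return None


def start_work(ticket: Ticket, person: str, ready: List[Ticket],
               in_progress: List[Ticket], today: date) -> None:
    """Assign the ticket before moving it to In Progress."""
    if not can_start_another(in_progress, person):
        raise RuntimeError(f"{person} is already at the In Progress limit")
    ready.remove(ticket)
    ticket.assignee = person
    ticket.last_update = today
    in_progress.append(ticket)


def needs_update(ticket: Ticket, today: date) -> bool:
    """True if the ticket is due for a status comment."""
    if ticket.last_update is None:
        return True
    return (today - ticket.last_update).days >= MAX_DAYS_BETWEEN_UPDATES
```

If `next_ticket` returns `None`, that corresponds to the case above where there is nothing to pick up and a conversation with a manager or program manager is the next step.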

Open Questions

  • When do we have time to review and update our OKRs in Betterworks?
    • Currently the team goes through these on some Mondays and adds updates.
      • This could happen in 1:1s.
    • Related, when and how do we review the quarterly goals to make sure our continual prioritization remains in line with them?
      • 1x a month at Monday meetings?
    • Also related, what is the process for continually updating and refining our quarterly goals for the next several quarters?
  • There are categories of interruptions/invisible work that we need to discuss in more detail. It is common to spend multiple hours per day doing code review for other teams. This does not seem like something we should prioritize weekly, but it is a significant chunk of work. How do we deal with this?
    • What about creating subtasks for our part of the work, putting them on our boards, and handling it through the same process?
  • We need to have a deeper discussion on async handover, code reviews, and signaling. For example, code reviews can spread over multiple days of back and forth. It isn't clear if we are optimizing for throughput, predictability, or delay.

Code Health / Quality

The Search Platform team believes in having high quality, clean code. Knowing what good code is comes with experience; trying to write a clear, concise and exhaustive definition of what we mean by clean code is an exercise in futility. Like many other things, I know clean code when I see it. One of the best metrics of clean code is WTFs per minute.

All that being said, we do have a few guidelines:

Good code should:

  • not get in the way of future developments
  • not add accidental complexity to the domain
  • be modular (unit tests are a good indirect measure of modularity)
  • follow the Single Responsibility Principle (SRP); concepts should be extracted as often as possible
  • have small methods
  • have good naming
  • be understandable by someone with no domain expertise

Code Reviews

Code reviews are a great tool to improve the quality of our code and to develop a common understanding of what we mean by clean code. A few guidelines for code reviews:

  • code reviews should have a high priority for everyone; moving forward someone else's code that is close to being merged is more important than moving forward your own code that isn't written yet
  • metrics are only indicators, they are useful as a discussion starter, not as absolute rules
  • linters help to have consistent code
  • code should be reviewed by someone who is not a domain or language expert; if that person can understand the code, it is simple enough
  • also comment on why you are NOT reviewing a patch (because you are unfamiliar with the project, you don't understand the code, ...); this signals that you did look at the code, even if you have no actionable comments on it
  • as a general rule, code should not be self merged, but in cases where only the owner is knowledgeable enough to take responsibility, self merging after a "+1" review is OK