Readers/Structured Content

The Structured Content team is a team in the Wikimedia Product department at the Wikimedia Foundation. It focuses on improving media quality on Wikimedia Commons, which includes improving the upload process (such as UploadWizard improvements), detecting potentially problematic uploads, and improving media metadata.

It previously focused on building features that use and allow creation of structured data associated with MediaWiki pages.

Projects

To see recent notable changes and releases by this team, visit the Release Timeline.

The team is currently focusing on improving the user experience of UploadWizard and on developing a tool that automatically detects logos when they are uploaded to Commons.

The team previously focused on Structured Data on Commons (from August 2017 to December 2019), and then on the grant-funded Structured Data Across Wikimedia (SDAW) program (from February 2021 to June 2023).

Projects in the SDAW program include:

We mostly work on the following extensions:

Inactive and semi-active projects

Some members of the team were involved in development on these projects (when we were called the Multimedia team). They are not actively involved at the moment, but might be able to help if you have a question:

Team Documentation

Team process/working agreements

  • We deliver features incrementally.
  • If the acceptance criteria of a phab ticket change (for example as a result of discussion in the ticket's comments), the ticket description is expected to be updated to reflect the change.

Phabricator Boards

  • The Structured Data Backlog board is our primary backlog. It is managed by the Product Manager and Program Manager.
  • The Current Work board is for in-progress development. It is managed by the team as a whole.
    • Each column on the Current Work board should be in order of priority, from most important at the top to least important at the bottom. Tickets should be added to the bottom of a column by default; a ticket that is urgent or high priority can be added to the top.
    • Columns on the Current Work board include:
      • Incoming: Where tickets automatically land when added to this board.
      • Epics: Reflective of the big projects the team is currently working on.
      • Needs Design: Tickets that require input from the designer.
      • Needs CL Input: Tickets that require input from the Community Liaison.
      • Ready for Estimation: Tickets that require estimation. Tickets are moved to the Ready for Estimation column when their design and acceptance criteria are complete and they are ready for work.
      • Ready for Development: Tickets that are ready to be worked.
      • Blocked: Tickets that are blocked, either by something outside the team or until other work is complete.
      • Doing: Tickets that are currently in progress.
      • Design QA: Tickets that require approval from the designer.
      • Code Review: Tickets that require code review.
      • Needs QA: Tickets that require QA in beta.
      • Verify on Production: Tickets that require QA in production.
      • Deployment/Config: Tickets that require deployment outside of the regular train schedule.

Daily Work

  • Each day, within the columns relevant to you, and for the tickets you feel qualified working on:
    • Work through columns from right to left:
      • insofar as relevant, validate things in “Needs QA”;
      • then review tickets in “Code Review”;
      • then continue whatever you already have in “Doing”;
      • then verify whether tickets in “Blocked” are, in fact, still blocked;
      • then pick up new work from the top of “Ready for Development”.
  • Within each column, work from top to bottom, picking up the first (highest priority) ticket you feel qualified to work on.
  • Update all tickets as soon as you do something that changes the state of that work, e.g.:
    • As soon as you pick up work, assign the ticket to yourself and move it into the relevant column.
    • When merging a patch, move the relevant ticket out of “Code Review”.
    • If you do anything that is not 100% self-explanatory (and actions rarely are) then a comment on the ticket should accompany every action taken.
  • When patches are big or risky, engineers should alert QA and have them review both before and right after merging.
    • If QA doesn't respond or is too busy to review, don't merge. If it's urgent, alert program management or engineer management to help find a solution.
  • Ensure there is a clear testing script methodology for engineers to run through before all stages: code review, deploy, QA.
    • Before a patch is open for review, add a QA section with acceptance criteria and tests that should be run.
  • Prioritize code review before new work.

Regular Communication

  • When appropriate, have conversations in a visible Slack channel (e.g. #sd-eng or #structured-data), so that others can follow along if they’re interested.
  • Any outcomes or decisions from conversations that impact the team, either publicly on Slack or in smaller venues, should be shared with the team publicly and documented in the relevant Phabricator ticket.
  • Standup (every day except Friday):
    • Every team member posts standup notes in the #sd-standup Slack channel. Standup notes should include updates about your work that are relevant to other team members, but do not need to include everything you did that day.
    • In addition to stating what you’re working on:
      • Share any relevant information you learned about, that can be delivered in a short update without too much context. E.g.:
        • “Team X has done Y, which should help us for Z”
        • “Deployment didn’t go through so our feature is not yet live”
        • “I attended X meeting, and Y was the outcome”
    • Post any big interruptions as soon as you are able to communicate them, and once again when they actually happen (e.g. vacation, working on another project for a good part of your time, parental leave).

Slack Channels

  • Channel Name: #sc-eng
    • Privacy: Private
    • Purpose: A place for SD engineers to discuss structured data team technical implementation.
    • Audience: Mainly SD engineers and QA. Others can watch for relevant discussions but should mainly engage with engineers in the main #structured-data channel.
  • Channel Name: #sc-search
    • Privacy: Private
    • Purpose: A channel for discussing the Search Improvements project, and all work and topics shared between the Structured Data and Search teams.
    • Audience: SD & Search engineers, QA, PM, PgM, designers, data analysts, and Community Relations Specialists; as well as directors in those areas who are actively involved with the teams.
  • Channel Name: #sc-standup
    • Privacy: Private
    • Purpose: A place for all members of the SD team to post daily updates about their work that is relevant to other team members.
    • Audience: SD team engineers, QA, PM, PgM, designer, data analyst, and CRS.
  • Channel Name: #structured-content
    • Privacy: Private
    • Purpose: General conversation related to the Structured Data team and its work.
    • Audience: SD engineers, QA, PM, PgM, designer, data analyst, and CRS; as well as directors in those areas who are actively involved with the team.
  • Channel Name: #image-suggestions-and-sections
    • Privacy: Open to WMF internally
    • Purpose: A cross team channel to discuss the image suggestions and section topics projects.
    • Audience: Anyone at WMF who is interested can join, but it should mainly be folks from the teams working on section topics and image suggestions: SD, Search, Growth, Android, and PET.

Creating New Tickets

  • All members of the team can create new tickets at any time. There are templates in the menu of the backlog to assist with this.
  • New tickets should be tagged with the #structured-data-backlog tag, which will put them in the “needs triage” column of the backlog, to be attended to at backlog grooming.
  • New tickets should contain a user story, as much description as possible, and as many acceptance criteria as possible. Anything you don’t know at the time can be added later, but should be added before estimation.
  • If a ticket is urgent, it can be added directly to the appropriate column of the Current Work board, but the team should be alerted via Slack.
  • Spin-off tickets that are essentially part of another ticket (that has already been groomed) but are tracked separately for technical reasons (e.g. requires changes across multiple extensions) can also be added directly to the current work board in the appropriate column. However, if it is new work, it should be added to the backlog and put through the process. Spin-off tickets should not require estimation, since theoretically the work is part of another ticket that already has an estimate.

Meetings

  • Backlog Grooming (every Monday):
    • Attended by the tech lead, designer, QA (optional), PgM, and PM.
    • Review each column on the current work board, from right to left, to see if anything needs to change based on circumstances or priorities. If so, move the ticket and inform the team.
    • Review the “needs triage” column on the backlog.
      • Ensure that each ticket has the necessary detail and full acceptance criteria.
      • Discuss the priority of each ticket, and move each ticket to the appropriate column in the backlog, or to “Ready for Estimation” on the Current Work board, in priority order.
    • Discussions outside of backlog grooming tasks should not be held in this meeting, because it decreases visibility to the team.
  • Project Meeting (every other Tuesday):
    • The entire team attends this meeting.
    • This meeting is used to build shared understanding, and is dedicated to catching all of us up on where we are right now and what we expect the short-term future to look like. This is a great venue to raise questions or issues before they become blockers.
      • Other topics at this meeting can include sharing designs, discussing individual decisions or ideas, or ironing out acceptance criteria on tickets that require discussion.
      • Anyone can add topics to discuss at this meeting in the agenda document linked in the calendar invite.
  • Retrospective (every other Tuesday):
    • The entire team attends this meeting.
    • This meeting is used to share kudos and facts or feelings that impact the team, and to discuss any proposed improvements or solutions. The retro should focus on actionable change, which means deeper conversations sometimes need to happen afterwards.
  • Estimation Meeting (every other Wednesday):
    • This meeting is used to estimate tickets in the “Ready for Estimation” column.
    • The team estimates the tickets in the "Ready for Estimation" column using t-shirt sizing.
    • Once estimated, tickets are moved to the Ready for Development column. By default, tickets are moved to the bottom of the Ready for Development column, unless they are urgent or high priority, in which case they go at the top.
    • If something needs to be estimated more urgently than every two weeks, we can hold an async estimation on Slack in the #structureddata channel.
  • Office Hours (every other Wednesday):
    • This meeting is attended by team engineers. Others are also welcome to join, if they want to discuss or learn about technical topics.
    • The purpose of this meeting is to have a venue to discuss technical topics.

Code review

As a general guideline, developers ought to start the day with up to an hour of code review.

When something is ready for review, add other devs as reviewers. If something is not ready for review, you ought to indicate that with a [WIP] tag or a -1.

Chores

Vagrant Development Environment Setup

Currently the structured data team uses mediawiki-vagrant for the majority of its work.

M1

git clone git@github.com:matthiasmullie/mediawiki-vagrant.git mw-vagrant
cd mw-vagrant
./setup.sh
vagrant config vagrant_ram 8192
vagrant up


# eventgate (installed below as dependency for another role) requires another node version than what mw-vagrant ships with
vagrant hiera npm::node_version 10 && vagrant provision


# this ends up pulling in eventgate which has some deps that (can) fail to compile; it all works, but the failure could impact other roles when provisioned simultaneously
vagrant roles enable cirrussearch --provision


# uploadwizard (titleblacklist, actually, one of its deps) fails to install after wikibase, so let's do this first
vagrant roles enable uploadwizard --provision


# centralauth fails to install schema, which we'll do manually before other roles start to fail...
vagrant roles enable centralauth --provision
# note, looks like that failure is no longer true, and below no longer needed
# vagrant ssh
# cat /vagrant/mediawiki/extensions/CentralAuth/schema/mysql/tables-generated.sql | sql centralauth
# exit # vagrant


# First going to install wikidata and mediainfo
vagrant roles enable commons mediainfo wikibase_repo wikidata --provision


# install wb_items_per_site for all wikis
vagrant ssh
# temporarily patch the schema file
sed -i 's/, ips_site_page/, ips_site_page(200)/' /vagrant/mediawiki/extensions/Wikibase/repo/sql/mysql/wb_items_per_site.sql
# apply the patched file on every wiki
/usr/local/bin/foreachwiki maintenance/patchSql.php /vagrant/mediawiki/extensions/Wikibase/repo/sql/mysql/wb_items_per_site.sql
# revert the temporary patch
sed -i 's/, ips_site_page(200)/, ips_site_page/' /vagrant/mediawiki/extensions/Wikibase/repo/sql/mysql/wb_items_per_site.sql
exit # vagrant


# create account on commonswiki; login to that account on wikidatawiki; then grant sysop permissions via maintenance script
vagrant ssh
mwscript createAndPromote.php --wiki=commonswiki --sysop --force USERNAME password
mwscript createAndPromote.php --wiki=wikidatawiki --sysop --force USERNAME password
exit # vagrant


# Install the remainder of the roles
vagrant roles enable commonsmetadata echo eventlogging kartographer mediasearch mobilefrontend multimediaviewer uls wikibasecirrussearch wikimediaevents --provision
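
The wb_items_per_site step above edits a schema file with sed, runs a script against it, and then reverts the edit by hand. If the script fails partway, the file can be left in its patched state. The following is a minimal sketch of one way to make that pattern safer, not part of mediawiki-vagrant: the helper name `with_patched_file` is hypothetical, and the sketch assumes bash and GNU sed (as found inside the Vagrant VM).

```shell
#!/usr/bin/env bash
# Sketch: temporarily patch a file, run a command, then always restore the file.
set -euo pipefail

# with_patched_file FILE SED_EXPRESSION COMMAND [ARGS...]
# Hypothetical helper: applies SED_EXPRESSION to FILE in place, runs COMMAND,
# restores FILE to its original contents, and returns COMMAND's exit status.
with_patched_file() {
  local file="$1" expr="$2"
  shift 2

  # Keep a copy of the original contents so we can restore them afterwards.
  local backup
  backup="$(mktemp)"
  cp -- "$file" "$backup"

  # Apply the temporary edit in place (GNU sed syntax).
  sed -i "$expr" "$file"

  # Run the command, capturing its status so `set -e` does not abort
  # before the file has been restored.
  local status=0
  "$@" || status=$?

  # Restore the original contents and clean up the backup.
  cp -- "$backup" "$file"
  rm -f -- "$backup"

  return "$status"
}

# Example invocation inside the VM (commented out so the sketch has no side effects):
# sql_file=/vagrant/mediawiki/extensions/Wikibase/repo/sql/mysql/wb_items_per_site.sql
# with_patched_file "$sql_file" 's/, ips_site_page/, ips_site_page(200)/' \
#   /usr/local/bin/foreachwiki maintenance/patchSql.php "$sql_file"
```

Restoring from a backup copy, rather than running a second sed that reverses the first, means the file returns to its exact original contents even if the substitution matched more or fewer times than expected.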

Intel Mac