Wikimedia Technical Talks

Technical talks are video presentations created by and for members of the Wikimedia technical community. Technical talks cover a wide range of technical concepts and ideas: from how a technology or process works, to how to perform a specific task, to lessons learned in a project. They make it easier to contribute to Wikimedia projects.

Technical Talks took place from 2014 to 2020.

Archive

Writing PHP unit tests for MediaWiki

December 07, 2020 at 20:00 UTC

YouTube video stream: YouTube

Slides: TBA

Speaker: Kosta Harlan

Topic Areas: Technology

Description: This talk covers tooling, tips, and best practices.

Learn more here: phab:phame/post/view/169/changes_and_improvements_to_phpunit_testing_in_mediawiki/

(Modern) Event (Data) Platform

September 23, 2020 at 17:00 UTC

YouTube video stream: YouTube

Slides: Wikimedia Commons

Speaker: Andrew Otto

Topic Areas: Technology

Description: Wikimedia's Event (Data) Platform provides a foundation for building loosely coupled event-driven software systems. This talk goes over why we built Event Platform, and gives an overview of its components and how they work.

Notes: During the talk we mentioned Architecture office hours. If you are interested in participating, send an email to architecture wikimedia.org.
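To make "loosely coupled event-driven" concrete, here is a minimal in-memory sketch. It is not the Event Platform itself or any of its components: the `EventBus` class, the `page-edit` stream name, and the `page_title` field are all hypothetical names chosen for illustration. The point it shows is that a producer only names a stream, and never needs to know which consumers are listening.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Event = dict
Handler = Callable[[Event], None]


class EventBus:
    """Toy in-memory stand-in for an event stream broker (hypothetical)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = {}

    def subscribe(self, stream: str, handler: Handler) -> None:
        # Consumers register interest in a stream by name.
        self._subscribers.setdefault(stream, []).append(handler)

    def produce(self, stream: str, event: Event) -> int:
        # The producer hands the event to the bus and is done;
        # it has no reference to any individual consumer.
        handlers = self._subscribers.get(stream, [])
        for handler in handlers:
            handler(event)
        return len(handlers)


bus = EventBus()
seen_titles: List[str] = []
edit_counts: Dict[str, int] = defaultdict(int)


def index_title(event: Event) -> None:
    """First consumer: remembers which pages were edited."""
    seen_titles.append(event["page_title"])


def count_edit(event: Event) -> None:
    """Second consumer: counts edits per page."""
    edit_counts[event["page_title"]] += 1


bus.subscribe("page-edit", index_title)
bus.subscribe("page-edit", count_edit)
delivered = bus.produce("page-edit", {"page_title": "Kiwix"})
```

Adding a third consumer here would require no change to the producing code, which is the property the description refers to as loose coupling.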

openZIM/Kiwix ETL toolchain for Wikipedia dumping

August 26, 2020 at 17:00 UTC

YouTube video stream: YouTube

Slides: Wikimedia Commons

Speaker: Emmanuel Engelhart / Kelson

Topic Areas: Technology

Description: Enjoying Wikipedia offline wherever, whenever is easy with Kiwix. But behind the scenes, a bunch of tools are needed to make it work. From article selection to dump publishing, through scraping, optimisation and packaging: here is a quick overview of how we do it.

Retargeting extensions to work with Parsoid

August 12, 2020 at 17:00 UTC

YouTube video stream: YouTube

Slides: Wikimedia Commons

Speaker: Subramanya Sastry

Topic Areas: Technology

Description: The Parsing team is aiming to replace the core wikitext parser with Parsoid for Wikimedia wikis sometime late next year. Parsoid models and processes wikitext quite differently from the core parser (all that Parsoid guarantees is that the rendering is largely identical, not the specific process of generating the rendering). This means that extensions that extend the behavior of the parser will need to adapt to work with Parsoid in order to provide similar functionality [1]. With that in mind, we have been working to more clearly specify how extensions need to adapt to the Parsoid regime. At a high level, here are the questions we needed to answer:

  • How do extensions "hook" into Parsoid?
  • When the registered hook listeners are invoked by Parsoid, how do they process any wikitext they need to process?
  • How is the extension's output assimilated into the page output?

Broadly, the (highly simplified) answers are as follows:

  • Extensions now need to think in terms of transformations (convert this to that) instead of events (at this point in the pipeline, call this listener). So, more transformation hooks and fewer parsing-event hooks.
  • Parsoid provides all registered listeners with a ParsoidExtensionAPI object, which extensions can use to process wikitext.
  • The output is treated as a "fully-processed" page/DOM fragment. It is appropriately decorated with additional markup and slotted into place in the page. Extensions need not make any special efforts (aka strip state) to protect it from the parsing pipeline.

In this talk, we go over the draft Parsoid API for extensions [2] and the kind of changes that would need to be made. While in this initial stage we are primarily targeting extensions that are deployed on the Wikimedia wikis, eventually all MediaWiki extensions that use parser hooks or use the "parser API" to process wikitext will need to change. We hope to use this talk to reach out to MediaWiki extension developers and get feedback about the draft API so we can refine it appropriately.

[1] https://phabricator.wikimedia.org/T258838

[2] https://www.mediawiki.org/wiki/Parsoid/Extension_API

Beyond Wikipedia - Knowledge that even a computer can understand

July 22, 2020 at 17:00 UTC

YouTube video stream: YouTube

Slides: Google Slides

Speaker: Zbyszko Papierski

Topic Areas: Technology

Description: Everybody knows what Wikipedia is, right? This magnificent source of knowledge has been helping countless people with their everyday lives for nearly two decades. Whether you want to know how to calculate the circumference of a circle, whether hyenas are pack animals or what really happened to the Ottoman Empire - Wikipedia’s got your back.

Well, unless you happen to be a computer.

One issue with Wikipedia is that knowledge there isn’t very well structured. There are links to other pages, sure - but unless you actually understand the text, you won’t understand what the link actually is. This is, of course, a field day for AI/ML experts - and there are a lot of people already scavenging Wikipedia for any meaningful relations. Fortunately, this is not the only way.

Enter Wikidata - Wikipedia’s younger sister. Wikidata is also a source of knowledge curated and provided by a community of volunteers, but presented in a relational graph format. Structuring the knowledge has huge ramifications - it not only makes it easier to digest by software, but also allows you to infer new knowledge.

There are different ways for developers to interact with Wikidata, but we’ll focus on Wikidata Query Service - a service my team is responsible for. It provides a queryable interface - using an RDF graph language called SPARQL (not to be confused with a hundred other things in IT with “spark” in the name).

Let’s do some discovery!
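As a small illustration of what querying the Wikidata Query Service looks like, the sketch below builds a request URL for the public SPARQL endpoint using only the Python standard library. The query itself is an illustrative choice, not one taken from the talk: it asks for items that are an instance of (P31) house cat (Q146). The helper name `build_query_url` is hypothetical. The code only constructs the URL and does not send a request.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Illustrative query: five items that are an instance of (P31) house cat (Q146),
# with English labels supplied by the label service.
QUERY = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 5
"""


def build_query_url(sparql: str, endpoint: str = WDQS_ENDPOINT) -> str:
    """Return a GET URL that asks the endpoint for JSON results."""
    params = {"query": sparql.strip(), "format": "json"}
    return endpoint + "?" + urlencode(params)


url = build_query_url(QUERY)
```

The resulting URL can be fetched with any HTTP client. Requests to the service are expected to carry a descriptive User-Agent header.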

API portal and gateway project

June 05, 2020 at 17:00 UTC

YouTube video stream: YouTube

Slides: Wikimedia Commons

Speaker: Evan Prodromou

Topic Areas: API, technology

Description: How does Wikimedia become "the essential infrastructure in the ecosystem of free knowledge"? One way is by making a platform that helps software developers become successful. In this talk, Evan Prodromou, Product Manager for APIs in the Platform Team, discusses the ongoing work to provide a Wikimedia developer platform. With this platform, app creators can include Wikimedia data and content into their software in new and emergent ways. From modernizing our API paradigm, through unified user authorization, documentation, and developer onboarding, the Platform team is working to make a developer experience that rivals those from other major Internet players.

The basics of cryptography using OpenPGP and GnuPG

April 29, 2020 at 17:00 UTC

YouTube video stream: YouTube

Slides: TBA

Speaker: Lars Wirzenius

Topic Areas: Technology, structured data

Description: OpenPGP is the prevalent standard for cryptography for secure software distribution and GnuPG is its prevalent open source implementation. This talk introduces things at a conceptual level: what cryptography is for, why it is useful, and the basic use of GnuPG for creating cryptographic keys, making digital signatures, and encrypting data. No previous experience with GnuPG or OpenPGP is needed, but all examples use the Linux command line.

Understanding Wikimedia Maps and its challenges

March 25, 2020 at 18:00 UTC

YouTube video stream: YouTube

Slides: Wikimedia Commons

Speaker: Mateus Santos, Software Engineer

Topic Areas: Maps, product, Site reliability

Description: The WMF Product Infrastructure Team has been maintaining the Wikimedia Maps service for the last year and a half with help from SRE. This talk shares the challenges and work of creating a better development environment to enhance productivity, solve technical debt and keep up with platform modernization.

Data and Decision Science at Wikimedia

February 26, 2020 at 18:00 UTC

YouTube video stream: YouTube

Slides: TBA

Speaker: Kate Zimmerman, Head of Product Analytics at Wikimedia

Topic Areas: Technology, data, data visualization

Description:

How do teams at the Foundation use data to inform decisions? Sarah Rodlund talks with Kate Zimmerman, Head of Product Analytics at Wikimedia, about what sorts of data her team uses and how insights from their analysis have shaped product decisions.

Kate Zimmerman holds an MS in Psychology & Behavioral Decision Research from Carnegie Mellon University and has over 15 years of experience in quantitative and experimental methods. Before joining Wikimedia, she built data teams from scratch at ModCloth and SmugMug, evolving their data capabilities from basic reports to strategic analysis, automated dashboards, and advanced modeling.

Structured data on Commons

December 11, 2019 at 18:00 UTC

YouTube video stream: YouTube

Slides: TBA

Speaker: Cormac Parle, Software Engineer

Topic Areas: Technology, structured data

Description:

The talk covers Structured Data on Commons:

  • what structured data is
  • the structured data we store for a media file on Commons
  • where we store it
  • how it helps with search
  • the UI and the API calls we use to manipulate it

Wikidata, behind the curtain

November 20, 2019 at 19:00 UTC, 45 Minutes

YouTube video stream: YouTube

Slides: Wikimedia Commons

Speaker: Amir Sarabadani, Software Engineer

Topic Areas: Technology, Wikidata

Description:

Wikidata is a complex and large-scale project. We all know how to use it and how to contribute to it, but it is harder to understand how it actually works, how it scales, and which parts of it are tricky. To lots of developers, it's a black box, and this is not good. This talk explains the internals of Wikidata to other developers and describes future changes to its technical layer.

ResourceLoader tips and tricks

October 23, 2019, 45 Minutes

YouTube video stream: YouTube

Slides: TBA

Speaker: Roan Kattouw, Principal Software Engineer

Topic Areas: Technology

Description:

Did you know that you could require() files in JavaScript? That you could make your own icon modules with 10 lines of code? That there's a new way to export configuration variables to JavaScript?

Learn about new ResourceLoader features introduced this year, and how you can use them to improve your code. We'll start with a quick introduction to ResourceLoader, then dive into some of the advanced features like require(), config var bundling, generated JSON files and icon modules.

How to compare text across multiple languages

September 25, 2019 at 18:00 UTC, 45 Minutes

YouTube video stream: YouTube

Slides: TBA

Speaker: Diego Saez-Trumper, Research Scientist

Topic Areas: Technology, languages

Description:

This talk explains how cross-lingual word embeddings work and how they can be used to measure the semantic distance between words and documents across different languages, and shows some use cases in our section and template alignment work.
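The idea of measuring semantic distance in a shared embedding space can be sketched as follows. This is a minimal illustration, not the method used in the talk: the three-dimensional vectors are made-up toy values standing in for real aligned embeddings, and representing a document by the average of its word vectors is one simple assumption among several possible choices. The function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def document_vector(words: Sequence[str], embeddings: Dict[str, Vector]) -> Vector:
    """Average the vectors of the known words in a document."""
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        raise ValueError("no word in the document has an embedding")
    dim = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


# Toy values only: in an aligned cross-lingual space, translations of the
# same word are expected to end up close to each other.
EMBEDDINGS: Dict[str, Vector] = {
    "cat": [0.9, 0.1, 0.0],     # English
    "gato": [0.85, 0.15, 0.05], # Spanish
    "house": [0.1, 0.9, 0.2],   # English
    "casa": [0.12, 0.88, 0.25], # Spanish
}

sim_translation = cosine_similarity(EMBEDDINGS["cat"], EMBEDDINGS["gato"])
sim_unrelated = cosine_similarity(EMBEDDINGS["cat"], EMBEDDINGS["casa"])

doc_en = document_vector(["cat", "house"], EMBEDDINGS)
doc_es = document_vector(["gato", "casa"], EMBEDDINGS)
sim_documents = cosine_similarity(doc_en, doc_es)
```

With these toy values, the translation pair scores higher than the unrelated pair, and the two small documents come out as near neighbours. Semantic distance can then be taken as one minus the similarity.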

Documenting Wikimedia technical projects

September 04, 2019 at 18:00 UTC, 45 Minutes

YouTube video stream: YouTube

Slides: TBA

Speaker: Sarah R. Rodlund

Topic Areas: Technology, technical writing, technical documentation, Toolforge, Wikimedia Cloud Services

Description:

This talk discusses what technical writers do, and why they are critical members of our technical community. You will learn more about the skills needed to be a technical writer and how to build these skills by participating in Wikimedia and other open source projects.

The talk also covers some ongoing initiatives to improve technical documentation for Wikimedia projects.

A Deployment Pipeline Overview

July 10, 2019 at 16:00 UTC, 45 Minutes

YouTube video stream: YouTube

Slides: Wikimedia Commons

Speaker: Alexandros Kosiaris

Topic Areas: Technology, Deployment, MediaWiki

Description:

The deployment pipeline project has been ongoing for a while, sometimes with more resources poured into it, sometimes less, but it's finally in a state that is ready to be used (it's already being used!). This tech talk presents the project to a wider technical audience, discussing its goals, the implementation decisions, and how it's meant to be used and adopted by the deployers of services (and eventually MediaWiki) in the coming months.

Just what is Analytics doing back there?

June 25, 2019 at 18:00 UTC, 45 Minutes

YouTube video stream: YouTube

Slides: Wikimedia Commons

Speaker: Dan Andreescu

Topic Area: Data flow, Analytics Infrastructure

Description:

We take care of twelve systems. Data flows through them to answer the many questions that our community and staff have about our piece of the open knowledge movement. Let's take a look at how these systems fit together to answer questions. Let's also look at an example trick we use to join big data in a distributed world.

Wikimedia and W3C

May 23, 2019 at 15:00 UTC, 45 Minutes

YouTube video stream: YouTube

Slides: Wikimedia Commons

Speaker: Evan Prodromou and Gilles Dubuc

Topic Area: Standards

Description:

The Wikimedia Foundation is now a member of the W3C, as of April. We walk you through how you can join working groups, what to expect of W3C participation, what we hope Wikimedia staff can achieve through W3C and we share our own experiences as W3C members.

Sharing global opportunities for new developers in the Wikipedia community

April 24, 2019 at 18:00 UTC, 45 Minutes

YouTube video stream: YouTube

Slides: Wikimedia Commons

Speaker: Srishti Sethi, Developer Advocate, Wikimedia Foundation

Topic Area: Developer Advocacy, onboarding new technical contributors

Description:

Wikimedia offers a plethora of opportunities for newcomers to get involved; however, as with many other free software projects, getting involved with the Wikimedia technical community can be a daunting prospect for newcomers. This talk is a gentle introduction to the Wikimedia ecosystem, and gives pointers on how to get involved as a volunteer. We delve into the various ways newcomers can make successful contributions in areas ranging from design to documentation, from programming to testing, and much more.

Ouch, I have an OOUI: using OOUI without pain

March 27, 2019 at 18:00 UTC, 45 Minutes

YouTube video stream: YouTube

Slides: TBA

Speaker: Moriel Schottlender

Topic area: OOUI

Description: OOUI is the interface widget library we are using for UI in the Wikimedia projects. The library is meant to allow implementers to create useful interfaces that automatically answer internationalization needs that are unique to the global nature of our projects. Right-to-left support, support for old browsers, accessibility, etc., are things that OOUI handles in the background for you. This tech talk presents OOUI’s history and its basic and advanced usage, and demonstrates how to create great interfaces without (much) pain within our wiki ecosystem.

The long and winding road to making Parsoid the default MediaWiki parser

February 27, 2019 at 19:00 UTC, 45 Minutes

YouTube video stream: YouTube

Slides: Wikimedia Commons

Speaker: Subbu Sastry, Principal Software Engineer

Topic area: Parsoid, Wikitext Parsing

Description:

This talk has two parts: The first part provides a bunch of background to make sense of the roadmap presented in part 2. The second part has 3 components: (a) Parsoid history (b) Porting Parsoid to PHP: the whys and wherefores (c) From here to Parsoid as the default.

Parsoid started in 2012 as a project to support Visual Editing and since then has gone on to support a number of products (Flow, Content Translation, Kiwix, and the Android app). Given that (a) Parsoid's annotated HTML output enables clients to infer things about wikitext without having to parse wikitext, (b) the PHP parser cannot support Visual Editor and other products, and (c) we cannot continue to have two parsers, it is inevitable that Parsoid will be the default parser for MediaWiki. This has been known since at least 2015, but while we are nearer to that goalpost, we are not quite there yet.

In this talk, we'll talk about what else needs to be completed, and what the porting of Parsoid to PHP means for this goal.

Older tech talks

You can browse through past tech talk recordings in the Commons category and on the MediaWiki YouTube channel.

Other showcases and presentations