Wikimedia Developer Summit/2017/Bringing "Enterprise MediaWiki" to the Wikimedia Foundation

Session recording

Session Overview

Title: Bringing "Enterprise MediaWiki" to the Wikimedia Foundation

Day & Time: Monday, 4:20 PM-5:30 PM

Room: Hawthorne

Phabricator Task Link: https://phabricator.wikimedia.org/T149612

Facilitator(s): Yaron Koren

Note-Taker(s): Juan Lara, Tom Fellows

Remote Moderator:

Advocate: 

Session Summary

Detailed Summary

Detailed Session Notes

Purpose

  • Describe the extensions that make up Enterprise MediaWiki
  • Show how Wikimedia Foundation wikis can make use of Enterprise MediaWiki
  • Discuss how to increase adoption of Enterprise MediaWiki by the WMF

Agenda

  • 10 minutes: Introduction
  • 25 minutes: Discuss problems
  • 30 minutes: Discuss potential solutions to problems
  • 15 minutes: Conclusion

Style

Education: teaching people about an agreed solution

Discussion Topics

Enterprise MediaWiki extensions are widely used and well supported, and can improve WMF wikis by replacing existing extensions.

  • Introduction by Yaron



    * What is Enterprise MediaWiki?

    The term refers not to corporate usage but to the software itself: a group of useful extensions.

    * Idea: Enterprise MediaWiki would benefit MediaWiki. It is already in use in some places, for example WikiApiary.


    History of Enterprise MediaWiki:

    2007: Semantic Forms: template-based tagging

    2010: First SMW conference, new path for SMW. It becomes more of its own project.

    2012: Wikidata launched

    * Good: Single point of data entry as opposed to SMW's naive tagging.

    2015: Cargo, a lightweight alternative to SMW

    2016: First EMWCon. These Template Data extensions are at the heart of Enterprise MediaWiki.

    * Semantic Forms renamed to Page Forms.


    Semantic MediaWiki vs. Cargo

    * SMW is 10 years older and in wide use

    * Nothing wrong with SMW, but Cargo is superior:

    - Easier to install

    - Easier to set up data structures

    - Easier for developers because the code base is smaller

    - More powerful

    * Cargo's approach:

    * Instead of SMW's triples, Cargo stores data in standard database tables.

    * Cargo uses its own parser functions.

    * You put everything into one template to declare a table. Every time the template is called, a row is added to the table.

    * Cargo queries are similar to MySQL queries.

    * Like SMW, various ways to display data: calendars, timelines, maps.

    * Drill-down browsing, exportable data

    * Security? What about SELECT call in Cargo Queries?

    - Cargo has code to prevent SQL injection. You can also use a separate database for Cargo data.


    * Page Forms Overview

    * Framework for defining forms to edit template calls.

    * Lots of functionality for creating forms: auto-completion, date-picker

    * Form syntax is wikitext-like. You can set parameters to control each form field.


    DEMOS:

    * Cargo Examples

    * Tables generated from Cargo Queries.

    * Outline of data where different levels have different relationships

    * Dynamic Table: Sort, In-table Search, Pagination

    * Gallery of user images

    * Calendar

    * Events Timeline

    * No good map examples for Cargo yet. Cargo can use Google Maps or OpenLayers. Example: cities that match given criteria.

    * Template format based on Cargo

    * Drilldown: Filter based on full-text search or other predetermined values. Can be useful for any data set.


    * Page Forms Examples:

    * "Edit with Form" brings up a form with additional editing functionality.

    * Add additional rows to a table. You can configure fields so that selecting an option in one menu narrows down the options in the next menu.

    * Free text fields.

    * Same inputs as the standard wiki edit page: Preview, Save

    * Example of what form definitions look like.

    * Looks like wiki text with additional tagging.

    * Page Forms parses wikitext to create forms on the fly.


    Question: I searched for "sugar" in different boxes and got different results.

    A: The searches may have been looking through different categories. Queries use basic MySQL search. The demo wiki uses the same basic search.


    What could these extensions be used for by the Wikimedia Foundation?


    * Storing information for infoboxes would be greatly improved by Page Forms

    * Easier to store and edit


    * Low Findability on mediawiki.org

    * Example: What are the anti-spam extensions that work with MediaWiki 1.26?

    Using Cargo with Drilldown would improve this search.

    * In turn better findability will encourage people to expand information

    * Storing events in calendars

    * Currently no calendar of all MediaWiki events

    * Would be fairly trivial with Page Forms + Cargo: One form, one template, and one or more calendar pages

    * Event and Conference management

    * Example: submitting an event for Wikimania. Currently this involves lots of copy and paste.

    * Could also be used for a summit like this one. Phabricator is kind of clunky.


    * Replacing Custom Development

    * Example: Library Card Platform, for requesting temporary access to information from places with limited viewership

    * The project was supposed to be finished a year ago but is still not fully done.

    * Creating and maintaining usable software from scratch is very difficult.

    * Yaron has worked on Page Forms for 10 years.

    * With Cargo, the feature could have been implemented in a few months.


    Other Useful Extensions:

    * Widgets

    * Approved Revs

    * Lightweight alternative to FlaggedRevs.

    * Simpler interface and no quality assessment.

    * External Data

    * Can get data from various sources like APIs on the web and local files.

    * Display that data on the screen.


    Why Not Use Wikidata?

    * Wikidata is great but it is supposed to be a repository.

    * No guarantee Wikidata will accept your information.

    * Not as user friendly as Cargo.

    * Need to know SPARQL to query data.


    Why not use Wikibase?

    * Data acceptance is solved, but the other problems remain.


    Question: I used SMW for ISO-9000 documentation. Worked really well but ran into some problems:

    * Security to protect individual pages; maintaining separate wikis is a pain. People did not like that every page was editable and viewable.

    * Reviewing and controlling viewership

    * Multiple Wikis: External data can be used to read information from other wikis


    Answer: For example, Approved Revs can control which content gets seen.


    Question: Do you have any examples of Workflow uses? Example: I want to see all the steps in a process.


    Answer: Workflow could use attention. There are extensions with basic workflow support via notifications, but there is no generic workflow solution right now. It is a big pain on Wikipedia too, for example when nominating a page for deletion. It's hard to have a generic workflow solution that's not complicated to develop and configure.


    Requests for workflow have also been very specific. There may not be enough requests for generic workflows that require their own scripting.

    Question: Is the next logical step to take this to the WMF wikis?


    Answer: That's a question I have for the audience. What is the request process? Who are the decision makers? Side note: Cargo's performance is superior to Semantic MediaWiki's.


    Comment: How many people are interested in having one of these extensions on Wikipedia? *Hands raised by several*

    Next step is to put something up on Phabricator.

    Comment: Better to have specific improvements in mind, for example improving information. I want to know what the improvements will be, and not install something just because it's cool.

    Comment: The barrier to installation on some wikis is very low.

    Comment: Cargo could've been used for wishlist.

    Comment: My experience: extensions get installed because people try them out and see what happens. Then other people install the extension. It's compelling that you can replace other extensions and get more use cases.


    Comment: So who's going to start the Phabricator task?


    Yaron: For people who didn't raise their hands, what are the objections? I know there are objections but I rarely hear them.

    Answers: Reluctance to mix data and content.

    Response: It depends on the use case of the wiki. For example, the information about extensions is crying out for structured data. The use case for Semantic MediaWiki is that you want to build up a corpus of information that you want to display on a page.

    Objection: How should the data on a page be stored? Do we want separation of data and content?

    Response: In Page Forms, you can make sure non-admins can only edit with the form.


    Example: ProcessWiki uses Page Forms to manage processes. Easy to create within a few hours.


    Yaron, Final Point: These extensions may not be useful for workflow on Wikipedia, but there are bad setups on wikis, and Enterprise MediaWiki would be a good solution to improve data entry and findability.



    Chronology:

    *SMW - Extension developed in 2005, original vision was a way to tag data in Wikipedia to export to the semantic web

    *Semantically tag data, and then query it using wikitext

    *Semantic Forms released in 2007 - philosophy of it caught on, instead of storing data with freeform tagging, you should put all of the data within templates

    *SMW + SF used on hundreds of active wikis

    *SMW shifted from Wikipedia usage focus to Enterprise focus

    *2012: Wikidata launched, again founded by Denny and Markus

    **Single point of data entry, query across all languages vs. SMW of single-wiki data storage/querying

    *2015: Cargo - lightweight alternative to SMW

    *2016: Enterprise MediaWiki conference (EMWCon)

    **March 8-10 in Washington DC

    *Renamed Semantic Forms to Page Forms - works with other extensions or stand alone


    Cargo's approach - data is not stored as triples but in standard tables

    Cargo parser functions - declaring tables, store data (insert into), query (select)

    Put everything in one template, and every time template is called add one row to this table.


    Queries resemble an SQL SELECT statement: specify the tables, conditions, etc. Joins are supported, e.g. cities in Europe, where the city's country is in turn defined as being in Europe.
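
The table-per-template idea and the join described above can be sketched in Python with the standard sqlite3 module. This is only an illustration of the approach, not Cargo's implementation: the helper names (declare_table, store_row, cities_in), the table layout, and the sample rows are all assumptions made up for this example.

```python
import sqlite3

def declare_table(conn, name, fields):
    """Create one ordinary SQL table per template (hypothetical stand-in
    for a template declaring its table). Names are trusted in this sketch."""
    columns = ", ".join(f"{field} {sql_type}" for field, sql_type in fields.items())
    conn.execute(f"CREATE TABLE {name} (page TEXT, {columns})")

def store_row(conn, name, page, values):
    """Add one row, as would happen each time the template is called on a page."""
    names = ", ".join(["page", *values])
    marks = ", ".join("?" for _ in range(len(values) + 1))
    conn.execute(f"INSERT INTO {name} ({names}) VALUES ({marks})",
                 [page, *values.values()])

def cities_in(conn, continent):
    """Join cities to countries: a city matches if its country is on the continent."""
    rows = conn.execute(
        "SELECT Cities.page FROM Cities "
        "JOIN Countries ON Cities.country = Countries.page "
        "WHERE Countries.continent = ? ORDER BY Cities.page",
        (continent,),
    )
    return [page for (page,) in rows]

conn = sqlite3.connect(":memory:")
declare_table(conn, "Countries", {"continent": "TEXT"})
declare_table(conn, "Cities", {"country": "TEXT"})

# Made-up sample data: one row per template call.
store_row(conn, "Countries", "France", {"continent": "Europe"})
store_row(conn, "Countries", "Japan", {"continent": "Asia"})
store_row(conn, "Cities", "Paris", {"country": "France"})
store_row(conn, "Cities", "Lyon", {"country": "France"})
store_row(conn, "Cities", "Osaka", {"country": "Japan"})

european_cities = cities_in(conn, "Europe")
```

Because each template maps to a plain table, the "cities in Europe" question becomes an ordinary two-table join rather than a traversal of triples.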


    There are also a number of ways to display the data.  Date-based visualization, calendars, maps, etc.


    Drill-down browsing (at Special:Drilldown) - interface is generated automatically from the data
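
One way to picture an interface "generated automatically from the data" is to derive the filter options from the stored rows themselves. The sketch below is a guess at the general idea, not the code behind Special:Drilldown; build_filters, drill_down, the max_values cutoff, and the sample rows are hypothetical.

```python
from collections import Counter

def build_filters(rows, max_values=10):
    """For each field, count its distinct values. Fields with a small number
    of distinct values become filters; the rest (e.g. names) are skipped."""
    filters = {}
    for field in sorted({key for row in rows for key in row}):
        counts = Counter(row[field] for row in rows if field in row)
        if 1 < len(counts) <= max_values:
            filters[field] = dict(counts)
    return filters

def drill_down(rows, **selected):
    """Keep only the rows matching every selected filter value."""
    return [row for row in rows
            if all(row.get(field) == value for field, value in selected.items())]

# Made-up sample rows standing in for extension infobox data.
extensions = [
    {"name": "ExampleA", "type": "anti-spam", "status": "stable"},
    {"name": "ExampleB", "type": "anti-spam", "status": "beta"},
    {"name": "ExampleC", "type": "forms", "status": "stable"},
]

filters = build_filters(extensions, max_values=2)
matches = drill_down(extensions, type="anti-spam", status="stable")
```

Here the "name" field has too many distinct values to be a useful filter, so only "type" and "status" are offered, each with a count of matching rows.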


    Export the data via CSV, JSON, or XML
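
As a rough illustration of exporting the same rows in the three formats mentioned, here is a sketch using only the Python standard library. The function names, the XML element names, and the sample events are assumptions for this example; Cargo's actual export output may be structured differently.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

def to_csv(rows):
    """Serialize rows as CSV with a header line taken from the first row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

def to_json(rows):
    """Serialize rows as a JSON array of objects."""
    return json.dumps(rows, sort_keys=True)

def to_xml(rows, root_tag="rows", row_tag="row"):
    """Serialize rows as XML, one child element per field."""
    root = ET.Element(root_tag)
    for row in rows:
        element = ET.SubElement(root, row_tag)
        for field, value in row.items():
            ET.SubElement(element, field).text = str(value)
    return ET.tostring(root, encoding="unicode")

# Made-up sample rows.
events = [
    {"title": "Example hackathon", "date": "2017-05-19"},
    {"title": "Example meetup", "date": "2017-08-09"},
]

csv_text = to_csv(events)
json_text = to_json(events)
xml_text = to_xml(events)
```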


    Security? Cargo has a lot of code to prevent SQL injection, but you could also set Cargo to use a separate DB
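
The notes do not say how Cargo guards against injection, so the following is only a generic sketch of two common defenses, not a description of Cargo's code: values are bound as parameters, and table and field names are checked before being placed into the query. The function safe_select, the allowed_tables argument, and the sample data are hypothetical.

```python
import re
import sqlite3

# Identifiers may only contain letters, digits, and underscores.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def safe_select(conn, table, field, value, allowed_tables):
    """Run a simple filtered SELECT built from user-supplied pieces."""
    if table not in allowed_tables:
        raise ValueError(f"unknown table: {table!r}")
    if not IDENTIFIER.fullmatch(field):
        raise ValueError(f"invalid field name: {field!r}")
    # The value is never spliced into the SQL text; it is bound as a parameter.
    query = f"SELECT page FROM {table} WHERE {field} = ? ORDER BY page"
    return [page for (page,) in conn.execute(query, (value,))]

# A dedicated connection stands in for the "separate DB" option.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Cities (page TEXT, country TEXT)")
conn.executemany("INSERT INTO Cities VALUES (?, ?)",
                 [("Paris", "France"), ("Osaka", "Japan")])

french_cities = safe_select(conn, "Cities", "country", "France", {"Cities"})
```

Keeping the queried data in its own database adds a second layer: even a query that slipped through such checks could not reach the wiki's main tables.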


    Page Forms: Yaron as the main author, but a ton of other contributors.

    A framework for defining forms to edit template calls.


    Edit with Form tab next to the Edit tab on every page, lots of input types.


    Wikitext-like form syntax to define forms.


    DEMOS:

    *Examples of queries with table result formats

    *Outline of data, different levels have different relationships

    *Dynamic tables - sort, show different amounts of values per page, from one page to the next, search within

    *Gallery example - user images displayed

    *Calendar example - showing opinion items from newspapers and magazines

    *Timeline of events

    *Map example - can use Google Maps or OpenLayers, cities that match different criteria - pulling from coordinates associated with it

    *Template format - create a custom result format for each item, because each one has a quote, uses the template to show it in a nice way

    *Drilldown - full text search, and filter on various values. Once I click on a year, it automatically sets time period to Months.

    *Page Forms: Edit with Form - autocomplete combobox, Date/Time, text, textarea, and for more complex data you can have multiple tables of data within the pages, and Free Text at the end that supports wikitext


    WMF context - what can these extensions be used for?

    *Storing Infobox data - for extensions, skins, and more on Mediawiki.org

    *What are the anti-spam extensions that work with MediaWiki 1.26?

    **This is hard to find, having Cargo installed would make it much faster

    *Would encourage people to expand and maintain the infobox data.

    *As of a year ago, there was no single calendar display of all Wikimedia or MW related events

    *This would be fairly trivial to accomplish with these extensions - create the form, have a template for events, as people add in the data - have one or more pages to display that information

    *Can have calendars displaying different criteria


    Replacing custom development

    *Wikipedia Library Card Platform

    **Create a site where users can request temp access to certain private online info sources, and where admins can accept/reject the requests

    *1 year behind schedule

    *MW+Cargo+Page Forms+Custom code could have done it in a few months


    Widgets

    *Bits of HTML and JavaScript - a lot like Lua modules but only editable by Admins

    *Predefined, potentially MW sites, or YouTube videos, or Like buttons, etc.


    Approved Revs

    *Lightweight alt to FlaggedRevs (AKA Pending Changes)

    *Much simpler interface - no quality assessment

    *May be useful for smaller wikis like mediawiki.org


    External Data

    *Can get data from web-based APIs, databases, local files, etc.

    *Get data from Phabricator, Gerrit, other wikis, etc -- reduce data redundancy, easier to view at a glance


    Why not use Wikidata?

    *Wikidata is fantastic

    *But it is a repository for general knowledge, not for random info for a site

    *Editing data, querying data (SPARQL), browse the data via drilldown


    Q/A:

    *Pushback from managers about the security of putting data on a wiki in the first place? Especially the need to review it before publication.

    **[Yaron] I believe that Approved Revs is a good solution to that - shows the most recent vs. most recently approved revision - a nice way to allow editing but then also have some confidence that the data is correct


    *Is it possible to display data across wikis?

    **There is a way with External Data to be able to query data on other wikis if they are using SMW or Cargo.  It is doable, depending on the situation.


    *Do you have any examples of Workflow uses?

    **Workflow is probably the thing that could use the most attention and additional improvement. There are extensions that send email notifications to specific people if certain fields take on certain values. There are no automated kickoffs of workflows based on changes, however.


    *In past lives, I've used many of the extensions you've mentioned here. The idea to bring some of these onto the WMF wikis is an interesting proposal. Is the next logical step then to take these to the communities on these wikis?

    **I don't know what the right steps are, or who the decision makers are for any of these wikis, and I don't know what the request process is. Also, I didn't mention that performance is fine. Cargo takes about two-thirds of the time if you use the alternate database.


    *How many people are interested in installing these extensions on MW.org?

    *Improve how extensions are searched.

    *There are always things that people could customize to organize all of this extension information. This could be done NOW.

    *Events on Meta.

    *Wikimania

    *Wishlist

    *Replace something else with this - if you could show that it replaces 15 extensions we already have installed, and has a better use case.


    Mixing data and content

    I think that for the Extension namespace, the data doesn't get reused outside of that wiki. The use case for our customers' wikis is information on a topic where you want to have the data on the page (see the examples of graphing in the previous talk in the other room, with data off of Commons). That is a very different discussion from this becoming a NoSQL database in and of itself, with the tagging done automatically when a naive user fills in the form and saves the page.


    Alternative: having JSON associated with the wiki page instead of the infobox. There is a difference of opinion about where the data lives: human-readable first, which is historically how SMW started, or enforcing a strict separation.


    I don't think that question is settled. There is a growing phase: as I get more data, initially my data is stored inline and ad hoc, then maybe I structure it with Cargo, then, if it is used in more places, maybe I move it to Wikidata. I think there is room for a spectrum there.