Wikimedia Technical Conference/2018/Session notes/Making curation and contribution mechanisms equitable and consistent

Theme: Defining our products, users and use cases
Type: Evaluating Use Cases
Session Leader: Derk-Jan, Danny
Facilitator: Greg
Scribe: Nick, Aaron

Description: A key hurdle to providing an equitable experience to new and emerging communities is unlocking the powerful tools developed on older projects. This session examines the ways we can provide access to these tools “out of the box” for new projects, and remove the gap in toolsets between larger and smaller projects which have fewer users.


Question Significance:

Why is this question important? What is blocked by it remaining unanswered?

What MediaWiki/Extension/Service/Client features are globally applicable across different projects and languages? In order to support equity, we need to understand what features seem to be applicable to a wide array of projects.
What do we need to do to support global curation and anti-spam tools/bots/gadgets… how do we productionize what is on Tool Labs? Most anti-spam tools and bots are hosted in Tool Labs and are not language agnostic (en only). Many of these tools are needed in other projects. Many of them could also benefit from running in a production environment and/or being first-class features of MediaWiki. This question should reveal hurdles in making this possible.
How do we make localizable and consistent templates and gadgets globally available across all projects? This is a key question blocking us from supporting projects that are new and have small communities. Finding a way to create templates that are localizable but shared would enable us to better support these small communities.
What global features are insufficient or nonexistent? We have some features which are “global” (like CentralAuth). Are there any problems with such features that we could learn from when making other features global? Some other features may be unimplemented but are most useful in a global context. These are useful when planning equal access to features.

Questions and answers

Q: What MediaWiki/Extension/Service/Client features are globally applicable across different projects and languages?

• Translations/Internationalisation

• Transparency (logging/history/attribution)

Q: What do we need to do to support global curation and anti-spam tools/bots/gadgets… how do we productionize what is on tool labs?
A (If you were unable to answer the question, why not?):

Language issues.  

Internationalization/localization of strings.  

Some are.  Huggle is.

Thinking about Tool Labs. It lives in the United States. There’s a long latency link from India. Any bot or tool or thing that you are going to deploy on Tool Labs, there’s no reason that you shouldn’t host it wherever. Tool Labs offers a certain set of advantages for bots. But all of those things are public. Change the level at which Tool Labs is abstracted to make it re-usable. There are good options locally, so we’re never going to spin up more datacenters.

If we could productize that part of Translations, it would make it easier for tool users -- one recognizable point, and make ingestion of translations easier. (e.g. we have [?] and a tool that Magnus built)

We're making tools for tools

We can't force people to use it. The point is defining best practices, that might include what to use for tests

If you had PHP test cases in a centralized directory -- how awesome would that be?

Requires 5.4, Toolforge has 5.3

The idea of best practices -- productionizing as a process. At one extreme, you have experimental bot code that only works on one wiki in Tool Labs somewhere. At the other extreme, there's stuff in infrastructure. Having a graduated scale with specific stages and standards: experimental, a second middle label, and by the end of that row it meets the quality standards and is taken into core.

Productionizing.  Tools are "in tool labs somewhere".  Having a graduated scale for getting from "experimental" to "production".  Outline what the stages and standards are.  Could be used as a metric for the quality of a tool.
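As a rough illustration of the graduated scale discussed here, the sketch below models stages as an ordered enum and derives a tool's stage from a few checks. The stage names, the `Tool` fields and the criteria are all assumptions made for the example; the session only named the two ends of the scale and did not agree on standards.

```python
from dataclasses import dataclass
from enum import IntEnum


class Stage(IntEnum):
    """Hypothetical maturity stages, ordered from least to most mature."""
    EXPERIMENTAL = 0
    MAINTAINED = 1        # assumed name for the "second middle label"
    PRODUCTION_READY = 2


@dataclass
class Tool:
    name: str
    has_tests: bool = False
    has_docs: bool = False
    has_i18n: bool = False
    wikis_supported: int = 1


def stage_of(tool: Tool) -> Stage:
    """Return the highest stage whose (illustrative) standards the tool meets."""
    if not (tool.has_tests and tool.has_docs):
        return Stage.EXPERIMENTAL
    if tool.has_i18n and tool.wikis_supported > 1:
        return Stage.PRODUCTION_READY
    return Stage.MAINTAINED
```

Because the stages are ordered, a listing could sort or filter on them, which is one way the scale could double as a quality metric.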

Foundation needs to say these are the tools we think are important, make an official statement --

But it's up to the community, what tools do they need?

That assumes WMF use as the ultimate end goal of the tool. As opposed to this should operate generically on any MW instance.

We could give it a Toolinfo score, recognize well-developed and maintained tools -- and important for discoverability.

So take WMF out, but keep that quality level on some remote cloud platform, as long as they meet that standard it's good.

Many tool developers are working on this part time.  

On one hand you have stages of standards, the other is this seems important and we want to get it to that stage, and community can't do it themselves, so they're asking for a move to wmf control.

We’re making an assumption that if it moves to WMF it equals equity. What are other things within tool infrastructure (e.g. ClueBot) that would make it easier for them to scale to other wikis? (Orthogonal questions, WMF vs. cross-wiki)

Is there a usage level that is required in order for something to be cross-wiki? Not necessarily a hard requirement for tools to be cross-wiki.

Eliminate tools as a special environment. Your tool has to be able to be used everywhere.

Ideal if everyone could spin an Amazon instance with MW layout.

Just changing the abstraction level that we're offering. Clean interfaces to the data you need, bundling that can be taken anywhere. A set of standard packages, a toolset you can use elsewhere.

All for exposing this on various platforms and what not, but the idea of paying for my instances.  

We can use WMF $ to buy AWS credits.

If you get rid of the environment, the tools won't go away -- there's a cloud for community contributions -- but the difference here is it's no longer special, distinct from other things. As tool dev, it gives you more responsibility, but the necessary outcome is that tools have to be using publicly available services and APIs, interacting with all wikis.

Thinking about WMF responsibilities, [focusing on infrastructure packages rather than maintained servers] would free the Cloud team from managing that and let them focus on building portable tools.

Agreeing, but need templates -- best practices, if someone starts a new tool it could use templates that have intern tools built in. We can prioritize which tools need to move to no-maintenance.

Metrics are really missing in all the tools right now; they would be beneficial for devs and for users.

Coming from a perspective of ignorance -- the discoverability of tools as they exist right now.  How do you find extensions for WordPress?  They have a store.  They list the number of installs.  The idea of having something similar to that.  You can look at a library of tools and see feedback about them.  

If we have no collab aspect for listing and evaluation of tools, we're missing something.

Someone started adding Wikidata items for tools.  That might actually be a good community maintained data storage.  Could include usage statistics.

Having a tool to collect usage statistics for tools -- very meta. Doesn't feel that hard, and would contribute.

If this is all run in a public cloud, you need a framework that does the counting for us. So the tool will need to self-report (if you want to meet that standard). How do you do that for JavaScript? JavaScript speaks HTTP. What to do with tools that cheat?

User-side JS would still report stats. Have to figure out what counts as a usage.

There isn’t a profit motive for being at the top.  

We have to write it in a defensive way.  You ra

There are common mechanisms, the use of a registration type system -- the app has a key, the user decides to instantiate.
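A minimal sketch of the registration idea, assuming a central registry that hands each tool a key and only counts usage reports that carry a known key. The class and method names are hypothetical, and the sketch does not answer the open question of tools that cheat: a tool holding a valid key can still over-report.

```python
import secrets
from collections import Counter


class UsageRegistry:
    """Hypothetical central counter for self-reported tool usage."""

    def __init__(self) -> None:
        self._tool_by_key = {}      # registration key -> tool name
        self._counts = Counter()    # tool name -> reported usages

    def register(self, tool_name: str) -> str:
        """Register a tool and return the key it must send with each report."""
        key = secrets.token_hex(16)
        self._tool_by_key[key] = tool_name
        return key

    def report(self, key: str, events: int = 1) -> bool:
        """Count a usage report; reject unknown keys and non-positive counts."""
        tool = self._tool_by_key.get(key)
        if tool is None or events < 1:
            return False
        self._counts[tool] += events
        return True

    def count(self, tool_name: str) -> int:
        return self._counts[tool_name]
```

A user-side script would send the same kind of report over HTTP; what counts as one "usage" is still left to be defined.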

Levels of tool quality:

How do you indicate the quality of a tool?

How far the tool is currently useable vs tools that should be used everywhere

As a user, you should be able to find relevant tools for your wiki - might be able to look on Dutch WP, what content tools can I use? Query brings back results.

Can that be a mailing list with people who know a lot about the tools?

The tool directory could be extended to do that.

Maybe a q/a forum next to that?

Or a fully automated search engine. This is my access level,

So we'd internationalize the description and tags

Tag it for this is for admins, this is anti-spam
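The directory query described above could look roughly like the following sketch. The record fields, the ordered list of access levels and the example entries are assumptions for illustration, not the schema of any existing tool directory.

```python
from dataclasses import dataclass, field

# Hypothetical access levels, ordered from least to most privileged.
RIGHTS = ["user", "rollbacker", "admin"]


@dataclass
class ToolEntry:
    name: str
    wikis: set                      # wiki IDs the tool works on; "*" means any wiki
    tags: set = field(default_factory=set)
    required_right: str = "user"
    descriptions: dict = field(default_factory=dict)   # language code -> description


def find_tools(directory, wiki, access_level, tag=None, lang="en"):
    """Return (name, description) for tools usable on `wiki` at `access_level`."""
    rank = RIGHTS.index(access_level)
    results = []
    for entry in directory:
        if wiki not in entry.wikis and "*" not in entry.wikis:
            continue
        if RIGHTS.index(entry.required_right) > rank:
            continue
        if tag is not None and tag not in entry.tags:
            continue
        # Fall back to English when no description exists in the requested language.
        desc = entry.descriptions.get(lang) or entry.descriptions.get("en", "")
        results.append((entry.name, desc))
    return results


# Made-up entries, only to show the shape of the data.
DIRECTORY = [
    ToolEntry("ExampleSpamFilter", {"*"}, {"anti-spam"}, "admin",
              {"en": "Flags likely spam", "nl": "Markeert waarschijnlijke spam"}),
    ToolEntry("ExampleCiteHelper", {"enwiki"}, {"content"}, "user",
              {"en": "Helps format citations"}),
]
```

Calling `find_tools(DIRECTORY, "nlwiki", "admin", tag="anti-spam", lang="nl")` would return the one matching entry with its Dutch description, which is the "what tools can I use on Dutch WP?" query from the discussion.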

Twinkle does everything -- it's a modular gadget, this is what a module looks like based on some identifiers.

Twinkle as a tool, the software can be run on any wiki, but it needs configuration. So how is that managed? If I'm an admin, what does that mean?

I don't know. Some tools do a cross-wiki resource call. Is that a best practice? It seems to work for me.

Implementation is the issue -- how as an admin do I find out that's what I'm supposed to do?

I want a marketplace for gadgets -- go to my prefs, find gadgets that aren't installed yet, and I can ask an admin to install it.

If I want to do i18n for my templates -- how do people get that info?

Gadgets 3.0!

Config problem is separate from i18n. Post warning message -- on enwiki, that's one edit on one page. On Persian Wikipedia, that's 2 edits on 2 pages under certain circumstances. Communities operate differently, so trying to generalize a single tool is hard.

Difference between translation and configuration.

I think this works towards: tools need a common language of action that can be used for this kind of configuration. Given a tool, you want to be able to do a thing; how do you express that?
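One way to picture a "common language of action" is a per-wiki table that maps an abstract action to the concrete edits it requires, kept separate from the translated message text. This is only a sketch: the page titles, message keys and the `plan_action` helper are invented, and only the one-edit versus two-edit contrast comes from the discussion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EditStep:
    page_pattern: str   # page to edit, with placeholders such as {user}
    message_key: str    # key into a separate i18n catalogue (translation, not config)


# Hypothetical configuration: the same abstract action expands differently per wiki.
ACTION_CONFIG = {
    "enwiki": {
        "warn_user": [EditStep("User talk:{user}", "warning-text")],
    },
    "fawiki": {
        "warn_user": [
            EditStep("User talk:{user}", "warning-text"),
            EditStep("Project:Warning log", "warning-log-entry"),
        ],
    },
}


def plan_action(wiki, action, **params):
    """Expand an abstract action into the (page, message key) edits for one wiki."""
    steps = ACTION_CONFIG[wiki][action]
    return [(step.page_pattern.format(**params), step.message_key) for step in steps]
```

A tool would then call `plan_action("fawiki", "warn_user", user="Example")` and carry out whatever steps come back, without hard-coding any one community's process.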

Did Brandon Harris talk about that? Flow workflows? Was there a proposal for that?

That's part of the context problem. Not just configuring where the noticeboard is, but what should you do?

Multiple ways to address that. One is to standardize these workflows; that approach didn't go well. Another is a standard interface that looks the same, but is different behind the scenes. Another is a ticket system, something that diverts away from discussions to proposing a user action: pending actions that can be approved or rejected by an admin. It would attribute the action to the person who asked for it.

Suppose every wiki by default had a workflow for "block this user": it's up to the community to define the steps, but there's also a single entry point. So any tool would know how to start that, just call the API. It could be an automated workflow, or someone with admin privileges has to step in and approve.
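The pending-action idea can be sketched as a small queue with one entry point: anyone can request an action, an admin approves or rejects it, and the record keeps who asked. All names here are hypothetical, and the rule that an admin's own request is approved immediately is an assumption standing in for the "automated workflow" case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PendingAction:
    action: str
    target: str
    requested_by: str               # the action stays attributed to the requester
    status: str = "pending"         # "pending", "approved" or "rejected"
    decided_by: Optional[str] = None


class WorkflowQueue:
    """Hypothetical single entry point for actions that need admin approval."""

    def __init__(self, admins):
        self.admins = set(admins)
        self.items = []

    def request(self, action, target, requested_by):
        item = PendingAction(action, target, requested_by)
        # Assumed rule: requests from admins need no second approval.
        if requested_by in self.admins:
            item.status = "approved"
            item.decided_by = requested_by
        self.items.append(item)
        return item

    def decide(self, item, admin, approve):
        if admin not in self.admins:
            raise PermissionError(f"{admin} cannot decide on requests")
        if item.status != "pending":
            raise ValueError("request was already decided")
        item.status = "approved" if approve else "rejected"
        item.decided_by = admin
        return item
```

The steps a community attaches to an approved request would live behind this interface, so tools only need to know how to call `request`.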

This is outside tools & gadgets? It's one of the problems tools run into. One of the barriers is for it to be adaptable to all wikis. Every tool ends up solving that problem again. Or you could avoid that kind of problem by having standardized workflows.

One other problem: the social infrastructure might not be there, a different strategy is needed. For smaller wikis, counter-vandalism is different from any other wiki. That wiki can't handle its own process. Huggle might work in that context? Wasn't designed to be.

So if you're in a small wiki monitoring team, the block request goes to Metawiki, where those admins gather. So there's no local noticeboard? It's on another wiki entirely.

That's something a workflow could handle; the workflow could be: go edit on Metawiki. It's an API request. All these interactions are just APIs.
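A small sketch of that routing, assuming a lookup of local noticeboards with a fallback page on Metawiki for wikis that have none. The wiki IDs are real conventions, but the page titles and the function name are placeholders.

```python
# Hypothetical configuration: wikis with their own noticeboard for block requests.
LOCAL_NOTICEBOARDS = {
    "enwiki": "Project:Example admin noticeboard",
}

# Hypothetical fallback for wikis without local admins or a local process.
GLOBAL_FALLBACK = ("metawiki", "Example small-wiki block requests")


def resolve_block_request_target(wiki):
    """Return (wiki, page) where a block request for `wiki` should be posted."""
    page = LOCAL_NOTICEBOARDS.get(wiki)
    if page is not None:
        return (wiki, page)
    return GLOBAL_FALLBACK
```

The calling tool does not need to know which case applies; it posts to whatever target the workflow returns.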

One other thing: we talked about making the tool work, but the underlying assets need translation too. Modeling needs to be translated for ORES to work. We ask: what is bad language in your context? How do you train and label in that context? I can't just repurpose an old model. (What happens if you use GTranslate? :) )

Getting language data or label data to build the data science tools/AIs.

There are tools that can't be universally applicable. I don't see that as fundamentally a bad thing. The idea is that ORES is not a complete thing, it's an almost complete thing, and the model is separate. There isn't one for (pick arbitrary language). From a product perspective, that's fine, it's easy for a user to comprehend.

But the next q: let's say I want one, what does that take? Label these, translate that, let us know, we'll flip the switch. You could imagine that for other tools --
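The "what does that take?" question could be answered by a checklist a community can query. The sketch below is purely illustrative: the fields, the label threshold and the step wording are assumptions and do not describe actual ORES onboarding requirements.

```python
from dataclasses import dataclass


@dataclass
class ModelRequest:
    """Hypothetical record of a community's request for a new language model."""
    wiki: str
    labeled_edits: int = 0
    labels_needed: int = 1000          # made-up threshold for the example
    bad_words_listed: bool = False
    ui_translated: bool = False

    def missing_steps(self) -> list:
        """List what the community still has to do before the switch is flipped."""
        missing = []
        if self.labeled_edits < self.labels_needed:
            missing.append(f"label {self.labels_needed - self.labeled_edits} more edits")
        if not self.bad_words_listed:
            missing.append("provide a list of bad words for this language")
        if not self.ui_translated:
            missing.append("translate the interface strings")
        return missing

    def ready_to_enable(self) -> bool:
        return not self.missing_steps()
```

The same pattern could apply to other tools that need per-language assets before they can be switched on.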

Would it make sense to build that into the tools? I could start a new model from the user interface. It's possible, maybe not a good idea. People sometimes pretend that they don't know swear words or slurs.

If there's an interface that makes a request -- some of it could be put into a staging area, in production.

Levels of tool quality -- notion of templates and recommendations. Structure it like this, this is what we expect for "production-ready". Opt-in standardization, best practices that gets your tool listed.

I18n and localization -- exploded into configuration

Discovery of tools - what do I do to make this work on my wiki?

Global tool usage concern -- not being able to access the tool because of network latency

Q: How do we make localizable and consistent templates and gadgets globally available across all projects?
A (If you were unable to answer the question, why not?):

See Detailed notes

Q: What global features are insufficient or nonexistent?
What is missing?

Change propagation across all projects - You have an article, there's been substantial changes. How do you propagate the projects that use that data?

Shared talk pages across projects - Or a way to more directly connect. On Wikidata there are 50 million talk pages, no one watches all. But with one central GeneWiki talk page it could work. (a ping/notification could possibly work).

When local projects include from a central project, need an easy way to open a discussion that happens in one place.  Aggregation of talk pages; ability to discuss content across projects. Pings on different projects are a problem.

Side note: Global communication is in general a big problem on MediaWiki.

Central marketplace for gadgets/extensions/software - Is a huge missing thing for MediaWiki. For gadgets it's an ongoing problem for toolhub.

Stats view to be more integrated across wikis - As an admin for a particular wiki, the average # of readers on my wiki doesn't tell me much because it's not in the context of other Wikis.  Need stat info from other Wikis to understand how my numbers compare
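To make the last point concrete, a comparison view could report a wiki's readership relative to its peers rather than as a bare number. This is a sketch with an assumed input format (a mapping of wiki ID to average daily readers); it does not reflect any existing stats API.

```python
from statistics import median


def compare_to_peers(wiki, daily_readers):
    """Put one wiki's average daily readers in the context of the other wikis."""
    own = daily_readers[wiki]
    peers = [count for name, count in daily_readers.items() if name != wiki]
    peer_median = median(peers)
    below = sum(1 for count in peers if count < own)
    return {
        "readers": own,
        "peer_median": peer_median,
        "ratio_to_median": own / peer_median,
        # Share of peer wikis with fewer readers than this one.
        "percentile": 100 * below / len(peers),
    }
```

The choice of peer group matters: comparing against wikis of similar size or language family would likely be more informative than comparing against all wikis.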

What is broken?

The Global user page isn’t easy to find - currently insufficient. Hard to discover.

CentralAuth is generally insufficient - Currently not well-integrated; bolted on. Needs better integration (particularly with rights-management)

Recent changes/watchlists need work - Better integration of recent changes is needed. If we want collaboration across projects, we need to fix these.

More general brokenness

CentralBanners is also terrible. Security is an issue.

CentralPermissions, CentralPreferences - not sure if they currently work well.

GlobalBlocks - people generally don't love it. If you try to block someone who already has a local block then your block fails and the existing block can expire (generally nastiness).

Features and goals

For Use Case and Product Sessions:

Given your discussion of the topic and answering questions for this session, list one or more user stories or user facing features that we should strive to deliver

1. Better support translations and documentation for tools and scripts.

Maybe a translation API service, and centralized auto publishing of documentation

Why should we do this?
  • Increases discoverability
  • Reusability
What is blocking it?
  • Getting standards and templates in place for the community
  • JS/PHP libs to actually use this information.
Who is responsible?
2.1 Provide central storage for gadgets and Lua modules so they can be shared between projects, with support for development workflows/features
Why should we do this?
  • Gadgets have always been one of the ways that volunteers add immense value. It often goes very slow.
  • Allows sharing code between wikis
  • Supports developer workflows that MediaWiki doesn’t (code review, automated tests etc)
  • Insecure gadgets are a major security risk, better support for code review helps.
What is blocking it?
  • Needs community buy-in
  • Needs MediaWiki support
Who is responsible?
2.2 Make existing libraries (esp. OOUI) easier for volunteers to use, provide more central libraries (e.g. a responsive grid)
Why should we do this?
  • Improves sharing code between gadgets
  • More uniform user experience
  • Eat your own dogfood
What is blocking it? Who is responsible?
3. Globalize modules
Why should we do this?
  • Modules have functionality
  • can have translations
  • data and config
  • Allows making them opinionated (per wiki).
  • Easier than Templates
What is blocking it?
  • Cross wiki code reuse
  • lessons learned from the past
  • translations etc.
Who is responsible?

Important decisions to make

What are the most important decisions that need to be made regarding this topic?
1. Should we start tracking quality and popularity of tools?
Why is this important?
  • Discoverability -> reuse
  • A radar for WMF
What is it blocking?
  • Helps identify where we can add value or provide more sustainable support for tools
  • What are candidates for production
Who is responsible?
2.1 Decide rules about acceptable VCSes
Why is this important?
  • Value issues (e.g. GitHub not FLOSS)
  • Usability issues (Gerrit not user-friendly)
What is it blocking?
  • Migration of gadgets
Who is responsible?
2.2 Decide security properties (e.g. who needs to be involved in deploying a Javascript change?)
Why is this important?
  • User-contributed JavaScript is a major potential security risk
What is it blocking?
Who is responsible?

Action items

What action items should be taken next for this topic? For any unanswered questions, be sure to include an action item to move the process forward.
1. Create opt-in standardization, best practices and/or a template that gets your tool listed, scoring higher, etc.
Why is this important?
  • Best practices encourage consistency
  • Makes it easier to productize tools, makes them usable, recognizable, translatable
What is it blocking? Who is responsible?
2. Provide MediaWiki integration for using gadget / Lua module code stored in a version control system (GitHub etc)
Why is this important?

See above.

What is it blocking? Who is responsible?

New Questions

What new questions did you uncover while discussing this topic?
1. CI for tools?
Why is this important?
  • Makes it easier for toolbuilders to test their code
  • We have the infrastructure
What is it blocking? Who is responsible?
2. Do we want to keep using gadgets or move towards something more integrated (web extensions)?
Why is this important?
  • Both more and less powerful
  • Possibly more secure
What is it blocking? Who is responsible?
3. Should tools even need WM cloud services? Maybe
Why is this important?
  • Frees up resources for WMF teams
  • More independent tools should mean better code?
What is it blocking? Who is responsible?

Detailed notes

Place detailed ongoing notes here. The secondary note-taker should focus on filling any [?] gaps the primary scribe misses, and writing the highlights into the structured sections below. This allows the topic-leader to check on missing items/answers, and thus steer the discussion.

  • [Insert slides here]
  • Focus on Global aspects. Lua and similar
  • We're not going to be able to solve everything!

Breaking into 3 groups to discuss:

What do we need to do to support global tools/bots/gadgets?

Why are many of these tools NOT transferable/usable by multiple communities?

Global gadgets and templates.

  • Do we need it at all?
  • How do we improve code reuse?
  • What is the priority?
    • Lack of ability to see changes
  • What can we do this year?
    • Take gadgets and put them somewhere with proper code review system. E.g. git. Make them globally re-usable somehow.
      • Put things in Version Control.
      • Package management
  • NOTES:
    • Some things can be reused with 3rd party wikis
    • Sharing things from enwiki requires so many precursors
    • Lots of value in providing them
    • Something like instantcommons?
    • Lacking support for javascript changes
      • Copying, testing, etc
  • If we provide support to overcome technical hurdles, how do we scale?
  • Some content should be in actual code-version system
    • Gadgets probably
    • Lua modules probably
    • Templates unlikely
  • Current support for Lua?
    • We have CI for javascript, so backend requirements are known
  • Challenge - Integration: How to show things in Recentchanges? - probably not primary, but as a mirror
    • Ideally CI would support some level of debugging
    • Via wiki-page updating, publishing a change on github could push a comment to a wikipage
  • Is it possible to do now?
    • As a prototype, yes
  • French Wikipedia community has already moved their gadgets to github
    • Wider community needs to make a decision about hosting locations -
      • Maybe have Per-gadget decisions on where to host? E.g. [foo] goes on github, and [bar] goes in gerrit
    • Gerrit too complicated for most non-developers and some newcomers
  • WMDE has some experiences and libraries with doing CI on github
  • Hosting our own gitlab? Similar to gerrit, a relevant question is "how much does free cost?"
  • Con to gerrit: existing backlogs and unreviewed patches
  • Access controls - if the code resides on github, we need different permission systems for the threat of account takeover.
  • Is this something we should focus on? Every major decision that leads to more responsibility for us, requires more ongoing resources.
  • Need even more details on costs/benefits…
  • Templatewiki as first priority? More complex caching challenges - for gadgets updates would be done manually, existing cache invalidation system can deal with that.
  • Need to understand all the social challenges and options
  • How can we re-use code more efficiently - more libraries?
    • Would we also enforce code-reuse?
    • How to make libraries discoverable? Lack of advertising is part of why existing libraries aren't well used.
    • Provide more docs for OOUI
  • Community dev wishlist - wanted better dialogue-widgets.
  • Related: Frameworks and evolution. React, Fountain, etc etc etc.
  • Incentivizing by providing new libraries at hosted locations
  • Documentation could be automated
  • Advertising could be automated
  • Toolhub -
  • Something like npm tags for [?]
  • Point to github raw?
  • Documentation standards?
  • Package management? Npm publish? Yarn?
  • Access controls? Smaller problem.
  • Are we recommending the Community does this? Or offering to help? Or leading by example? - Lead by example, provide standard pathway,
  • Gadgets have always been one of the ways that volunteers add immense value. It often goes very slow.
  • We could get away from changing core with gadgets, and move to more standalone 'apps' using web extensions
  • [Created a dedicated toolkit for a framework in which gadget authors could work?]
  • E.g. tools on toolforge could benefit from a more consistent UI. OOUI is currently quite complicated. We want a bootstrap 3.0 with Wikimedia theming!
  • Onwiki gadgets benefit from avoiding OAuth complexities.
  • Contributors want configurable display options
  • Why do we need individual installation
  • Gadgetwiki ?
  • Templates: Only create global modules, and not global templates? Templates can often be simply localized Wikitext shells for a module.
    • Much easier to provide custom
  • If we say JS gadgets become repos, and aren't synced to a wiki, would we do the same with modules? We do have a lot of users who understand a bit of javascript. We don't necessarily want to push them towards git.
  • M
  • Package management: one click, with the confidence of a team supporting it
  • Is priority template or gadgets?  Templates are more important but harder. JS has more security implications.
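The idea noted above, that publishing a change on github could push a comment to a wiki page so it shows up in on-wiki history, could be prototyped along these lines. The sketch only builds the parameters for a MediaWiki Action API `action=edit` request that adds a new section; the function name, page title and message format are assumptions, and obtaining a CSRF token and sending the POST are left out.

```python
def build_edit_request(page_title, repo, commit_sha, commit_message, author, csrf_token):
    """Build Action API parameters that mirror one commit as a new wiki page section."""
    first_line = commit_message.splitlines()[0] if commit_message else ""
    short_sha = commit_sha[:7]
    return {
        "action": "edit",
        "format": "json",
        "title": page_title,
        "section": "new",                      # append as a new section
        "sectiontitle": f"{repo}: {short_sha}",
        "text": f"{first_line} (by {author}) ~~~~",
        "summary": f"Mirror of commit {short_sha} in {repo}",
        "token": csrf_token,                   # must be fetched from the API beforehand
    }
```

A real integration would also need to escape wikitext in commit messages and decide which account makes the edit, which ties into the access-control concerns raised in the notes.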

Global features

  • Central auth
  • OAuth
  • Translatewiki
  • Recentchanges/watchlist/eventstream
  • Global moderation
  • Global permissions


What is holding us back?

Priorities (i.e. money)