Wikimedia Developer Summit/2016/ApiUsability
Copied from https://etherpad.wikimedia.org/p/WikiDev16-ApiUsability
- Session name: MediaWiki Action API design discussion: the amazing/good/bad/ugly
- Meeting goal: Anomie has been working on the MediaWiki API; let's gather ideas
- Meeting style: Problem-solving (problem discovery?): surveying many possible solutions
- Phabricator task link: https://phabricator.wikimedia.org/T122818
Topics for discussion:
- Use cases
- Bots/tools/gadgets
- historically the primary use case
- need to query content & perform actions
- the Action API is geared towards retrieving information about many pages at once
- Google: they want to get clean Wikipedia data. They've written a wikitext parser (parsing into structured data) and access templates through the API, but template contents are still different from what's visible on the HTML page: what the user sees differs from the template. They are trying to clean up templates to unify implementations, similar to Wikidata's goal of human- and machine-readable data.
- Accessing an infobox via the template vs. via the HTML gives different results; even the number of infoboxes on the page can differ.
- Broader issue: language agnosticism. The Action API is tied to a specific installation; RESTBase is a "Cassandra-backed persistent cache layer", with modules.
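One way to probe the template-vs-HTML mismatch is to request the same page twice through action=parse, once with prop=wikitext and once with prop=text (rendered HTML), and count infoboxes in each. The sketch below only builds the request URLs and applies two crude counting heuristics. The regular expressions, the example page title, and the assumption that infoboxes are templates named "Infobox..." that render with an `infobox` CSS class (an English Wikipedia convention) are ours for illustration, not something the API guarantees.

```python
import re
from urllib.parse import urlencode

# Assumed endpoint for illustration: English Wikipedia's Action API.
API = "https://en.wikipedia.org/w/api.php"


def parse_url(page: str, prop: str) -> str:
    """Build an action=parse URL; prop="wikitext" gives source, prop="text" gives HTML."""
    params = {"action": "parse", "page": page, "prop": prop, "format": "json"}
    return API + "?" + urlencode(params)


def count_infobox_templates(wikitext: str) -> int:
    # Heuristic (assumption): count template calls whose name starts with "Infobox".
    return len(re.findall(r"\{\{\s*[Ii]nfobox", wikitext))


def count_infobox_tables(html: str) -> int:
    # Heuristic (assumption): count elements whose class list contains the word "infobox".
    return len(re.findall(r'class="[^"]*\binfobox\b', html))


wikitext_url = parse_url("Albert Einstein", "wikitext")
html_url = parse_url("Albert Einstein", "text")
```

Nested or embedded infoboxes, infoboxes produced by templates with other names, and templates that render nothing all make the two counts diverge, which is one concrete form of the mismatch noted in this session.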
- Pain points
- What is the best way to query infobox information? ...can there be better ways?
- One problem with infoboxes is that they are written by different people, with different inputs and outputs; Wikidata is one answer to standardising that
- See also content format discussions https://phabricator.wikimedia.org/T119022
- Discoverability of existing features
- For example, it is hard to understand what each API module returns
- Cirrus modules are another example; many people might not be interested in them
- automatically generated documentation: https://en.wikipedia.org/w/api.php
- human-(un)maintained documentation https://www.mediawiki.org/wiki/API:Main_page
- API sandbox https://en.wikipedia.org/wiki/Special:ApiSandbox
- currently undergoing a rewrite by anomie
- modules are hard to categorise and relate to each other (e.g. "if you are doing x on page see also module y")
- Ctrl-F (searching within the help page) stopped working with the API help redesign
- all help in a single page https://en.wikipedia.org/w/api.php?action=help&recursivesubmodules=1 (!!!!)
- The way the XML dumps, the database and the API represent deleted fields is different and poorly documented.
- Related https://phabricator.wikimedia.org/T114019
- Inconsistencies between API access and dumps (e.g. bitfields)
- A lot of the "actions" aren't actually actions. action=query and action=edit make sense, but action=flow doesn't help me "flow" something; "action" has become a top-level categorization
- YES.
- Following on from the point about best practices when writing API modules, this is an important part of the code review process (as well as clear documentation)
- "action" is really which module to send the request to
- Too many ways of doing similar but not identical tasks (e.g. fetching current page text)
- Part of the problem is fragmentation; often the solution is to ask somebody who has run into the same problem
- Versioning: let's talk about it, e.g. versioning individual modules. Brad: where possible, add a new parameter instead of versioning. Issues: complexity creep; how do we balance the two?
- Versioning could help substantially with addressing the inconsistencies between data (API/XML/Database/etc). Without versioning, we can't refactor without breaking things.
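To make "too many ways of fetching current page text" concrete, here are three request URLs that each return the current wikitext of a page: action=query with prop=revisions, action=parse with prop=wikitext, and index.php with action=raw (outside the Action API entirely). This is a minimal sketch that only builds URLs; the endpoints assume English Wikipedia, and the rvslots=main parameter reflects newer MediaWiki versions (1.32+) rather than the API as it stood in 2016.

```python
from urllib.parse import urlencode

# Assumed endpoints for illustration: English Wikipedia.
API = "https://en.wikipedia.org/w/api.php"
INDEX = "https://en.wikipedia.org/w/index.php"


def via_query_revisions(title: str) -> str:
    # action=query + prop=revisions: content of the latest revision, nested in query.pages.
    # rvslots=main is expected on newer MediaWiki versions (assumption: 1.32+).
    params = {
        "action": "query", "prop": "revisions", "titles": title,
        "rvprop": "content", "rvslots": "main",
        "format": "json", "formatversion": "2",
    }
    return API + "?" + urlencode(params)


def via_parse(title: str) -> str:
    # action=parse + prop=wikitext: the page source inside a "parse" object.
    params = {"action": "parse", "page": title, "prop": "wikitext",
              "format": "json", "formatversion": "2"}
    return API + "?" + urlencode(params)


def via_raw(title: str) -> str:
    # index.php?action=raw: plain-text wikitext, not a JSON response at all.
    return INDEX + "?" + urlencode({"title": title, "action": "raw"})


for build in (via_query_revisions, via_parse, via_raw):
    print(build("Main Page"))
```

Each route returns the text in a differently shaped response (nested JSON, a parse object, or plain text), which is part of why newcomers end up asking someone who has hit the same problem. The formatversion=2 parameter used here is itself an instance of Brad's suggestion: clients opt into newer output behaviour through a parameter instead of a new module version.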
- Design features
- Querying revisions independent of page/user (SELECT * FROM revision WHERE rev_timestamp BETWEEN "2014" and "2015")
- check out the allrevisions module (https://www.mediawiki.org/wiki/API:Allrevisions)
- an example of discoverability issues
- Useful: provide a link to the example queries in API Sandbox (in api.php module docs)
- More caching:
- Can caching work for sub-modules of the action API?
- Possible, but it needs someone willing to work on it; anomie is happy to review.
- RESTBase, being single-page-oriented, is easier to cache and purge; the Action API less so, since it operates on many pages at once
- The mobileview API module should work on more than one article at a time (it depends on the MobileFrontend extension).
- Can we query the API via PHP within MediaWiki? Internally, most queries/actions access the database directly.
- Not at the moment; going back and changing that would be a huge amount of work to properly separate things
- Would the team be interested in someone working on this with them? Yes! "I'd like to review that code." --anomie
- This could standardize how we access data, since there are some nuances in normalization etc.
- Standardizing this could provide a common language
- Unified way of accessing page properties
- [discoverability] Grouping of actions: what goes together? E.g. Cirrus-related modules could be grouped together so that only people who care about them notice them
- possible GCI/hackathon project; make a place for information to go, maybe on mw.org
- Grouping of actions would address the action=flow issue mentioned above, where that action is essentially a group of everything Flow-related
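The "SELECT * FROM revision WHERE rev_timestamp BETWEEN ..." use case maps onto list=allrevisions. The sketch below builds such a request and shows the usual continuation pattern: when the response carries a top-level "continue" object, its keys are merged into the previous parameters to fetch the next batch. The endpoint, the chosen arvprop fields, and the limit are assumptions for illustration; the exact continuation keys are whatever the server returns, so the code copies them blindly rather than naming them.

```python
from urllib.parse import urlencode

# Assumed endpoint for illustration: English Wikipedia's Action API.
API = "https://en.wikipedia.org/w/api.php"


def allrevisions_params(start: str, end: str, limit: int = 50) -> dict:
    # With arvdir=newer the enumeration runs forward in time,
    # so arvstart must be the earlier timestamp and arvend the later one.
    return {
        "action": "query", "list": "allrevisions",
        "arvstart": start, "arvend": end, "arvdir": "newer",
        "arvprop": "ids|timestamp|user", "arvlimit": str(limit),
        "format": "json", "formatversion": "2",
    }


def next_params(params: dict, response: dict):
    # More results are signalled by a top-level "continue" object;
    # merging its keys into the previous parameters requests the next batch.
    cont = response.get("continue")
    if cont is None:
        return None  # no continuation: the enumeration is complete
    return {**params, **cont}


params = allrevisions_params("2014-01-01T00:00:00Z", "2015-01-01T00:00:00Z")
url = API + "?" + urlencode(params)
```

A client would loop: send the request, process the returned revisions, call next_params on the decoded JSON, and stop when it returns None.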
General notes
- Is there a long-term plan for the action API? (Currently work is done ad-hoc)
- bd808's notion of pioneers/settlers/city planners for code (http://blog.gardeviance.org/2015/03/on-pioneers-settlers-town-planners-and.html among others)
- Is the purpose to avoid dealing with wikitext? No, not really: you can get HTML out of it, but it also handles wikitext.
- API in layers--wikitext, template, other information to allow user parsing?
- Quarry (a web interface for database queries) records queries, which can be a useful learning tool for newcomers. Replicate the same for the API sandbox?
- On the same theme, see also JupyterHub on Labs for controlling Pywikibot
Action items with owners:
- Fhocutt: suggest API use-case categorization for hackathon
- !Brad: ask Brad/anomie to review code for API modules, and set aside time to deal with resulting comments. Add anomie as a reviewer on an API-related patch, and if he's not looking at it ping him via email/IRC.
- vague, no one is assigned to it: fix up API documentation. Make a list of pages that need fixing?
Conversations to have:
Attendees:
- Aaron Halfaker
- Filippo Giunchedi
- Darian Fitzpatrick
- Niklas Laxström
- Jordan Adler (Google)
- Bryan Davis
- Zhicheng Zheng (Google)
- Yanan Qian (Google)
- Stas Malyshev
- Frances Hocutt
- Sam Smith
- Joaquin Hernandez