API/Architecture work/Planning

Deprecation process

Discussion at the Architecture Summit in January 2014 was generally favorable to deprecations of major features, as long as we give people enough time to update. Minor changes will continue to be announced to the mediawiki-api-announce mailing list.

When it is possible for the new version of the feature to coexist with the old (e.g. prop=imageinfo and prop=fileinfo):

The new feature will be implemented.
The deprecation will be announced:
- A message will be sent to the mediawiki-api-announce mailing list.
- The old feature will report deprecation warnings.
- Uses of the deprecated feature will be logged on the server (in WMF's case, currently on fluorine, where deployers have access).
After a suitable timeframe (e.g. if the deprecation was in MediaWiki 1.24, during the 1.25 development cycle), usage of the deprecated feature on WMF wikis will be evaluated and the deprecated feature may be removed.
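On the client side, noticing a deprecation mostly means not discarding the warnings in the response. The following is a minimal sketch, assuming the response was requested with the existing format=json and that warnings arrive as a "warnings" object keyed by module name with the text under "*". The helper name and the sample message text are invented for illustration.

```python
def collect_warnings(response):
    """Return (module, text) pairs for every warning in a parsed API response."""
    found = []
    for module, body in response.get("warnings", {}).items():
        # Assumption: the existing JSON format stores the warning text under "*".
        text = body.get("*", "")
        for line in text.splitlines():
            found.append((module, line))
    return found


# Hypothetical response; the message wording is made up.
sample = {
    "warnings": {"imageinfo": {"*": "prop=imageinfo has been deprecated."}},
    "query": {"pages": {}},
}

for module, line in collect_warnings(sample):
    print(f"[{module}] {line}")
```

A tool that logs these pairs somewhere its developer will actually look gets most of the benefit of the announcement step above.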

When it is not possible for the new version to coexist with the old (e.g. changing format=json):

The new feature will be implemented, but must be explicitly requested by clients via a query parameter.
The deprecation will be announced:
- A message will be sent to the mediawiki-api-announce mailing list.
- Deprecation warnings will be output when the parameter to request the new version is not given.
- Uses of the deprecated feature will be logged privately.
After a suitable timeframe, the new version will become the default and the old removed. The "request the new version" parameter will be silently ignored.
The "request the new version" parameter will at some point be removed, leading to "unrecognized parameter" warnings.
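The steps above amount to a small state machine. As a rough sketch only (the phase names, the return shape, and the warning strings are all invented here, not taken from the MediaWiki code), the behavior seen by a client could be modeled like this:

```python
from enum import Enum


class Phase(Enum):
    TRANSITION = 1     # new version exists but must be requested explicitly
    NEW_DEFAULT = 2    # old version removed; the opt-in parameter is silently ignored
    PARAM_REMOVED = 3  # the opt-in parameter itself is no longer recognized


def handle(phase, opt_in):
    """Return (version_served, warnings) for one request.

    opt_in is True when the client passed the "request the new version" parameter.
    """
    if phase is Phase.TRANSITION:
        if opt_in:
            return "new", []
        return "old", ["deprecated: request the new version explicitly"]
    if phase is Phase.NEW_DEFAULT:
        return "new", []
    # Phase.PARAM_REMOVED: the leftover parameter now only produces a warning.
    return "new", (["unrecognized parameter"] if opt_in else [])


for phase in Phase:
    for opt_in in (False, True):
        print(phase.name, opt_in, handle(phase, opt_in))
```

The point of the middle phase is that a client which adopted the opt-in parameter early sees no change at all when the default flips.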

When the default for a behavior is to be changed but the old behavior is not being removed (e.g. changing the default continuation to be the new easy-to-use style rather than the current query-continue):

If not already present, a request parameter will be added to specifically request the old behavior.
The change will be announced:
- A message will be sent to the mediawiki-api-announce mailing list.
- Deprecation warnings will be output when neither the select-new-version nor the select-old-version flags are used. Logs will also be made.
After a suitable timeframe, the new version will become the default.
Any flag to select the new version explicitly may at some point be removed, leading to "unrecognized parameter" warnings.

Gerrit changes:

Adding logging of deprecated feature hits: gerrit:154095, gerrit:154096

Comments (Deprecation process)

Items being considered

These items are being considered for implementation. Any new additions should probably go here.

Changes to PHP output format

The changes described above for the JSON output format will also be applied to the PHP output format, where applicable.

Comments (Changes to PHP output format)

From a code perspective, are we just going to have one class that handles preparing an array for formatting, and then the subclasses just do something like return FormatJson::encode( $stuff ) or return serialize( $stuff )? Legoktm (talk) 22:36, 16 July 2014 (UTC)
- That's already basically how it works, I'm not planning on changing it. Anomie (talk) 20:00, 31 July 2014 (UTC)
For the same reason the JSON changes changed to format=json2, this'll probably not happen. Or else be format=php2. Anomie (talk) 20:14, 9 August 2014 (UTC)

Items for implementation

These items are planned for implementation as time permits. Note this list is not in any particular order.

Changes to JSON output format

Tracked in Phabricator: Task T76728

The existing JSON format suffers from a number of shortcomings that make it more difficult to use than necessary. Many of these are inherited from the underlying data structure being designed for the XML format. Thus, ~~format=json2~~ formatversion=2 will be created with the following differences and the existing format=json will be deprecated and eventually removed:

The existing 'utf8' option will be the default.
- A new 'ascii' request parameter will be introduced for clients who need all non-ASCII codepoints escaped.
Anything using '*' as a key will be renamed to something more natural. In some cases this may result in something like "query.page[1].foo['*']" becoming simply "query.page[1].foo", and in others something like "query.page[1].foo.content".
Boolean result properties will use boolean true as the value, rather than the empty string. Whether a property will be present with a boolean false value or will continue to be entirely absent from the result when false will be determined on a case-by-case basis.
- Result parameters that are already being returned as booleans may accidentally change to the empty-string style in the format=json output.
Page lists will be returned as arrays rather than objects with page_ids as keys. This will make it easier for clients to iterate over the results.
- The 'indexpageids' parameter will be removed.
The JSON formatter currently has a tendency to return values that are normally objects as arrays when empty (bug 10887). This will be easily fixable.
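To make the differences concrete, here is a sketch that rewrites an old-style result into the proposed shape. It is illustrative only: the sample data is made up, the set of boolean property names is an assumption, and "content" is just one of the possible replacements for "*" mentioned above.

```python
BOOLEAN_PROPS = {"redirect", "new", "minor"}  # assumed flag names, not an official list


def modernize(node):
    """Recursively convert an old-style parsed JSON result to the proposed style."""
    if isinstance(node, list):
        return [modernize(item) for item in node]
    if not isinstance(node, dict):
        return node
    out = {}
    for key, value in node.items():
        if key == "*":
            out["content"] = value  # assumed replacement name for "*"
        elif key in BOOLEAN_PROPS and value == "":
            out[key] = True  # empty string becomes a real boolean
        elif key == "pages" and isinstance(value, dict):
            # Object keyed by page ID becomes a plain array of pages.
            out[key] = [modernize(page) for page in value.values()]
        else:
            out[key] = modernize(value)
    return out


old = {
    "query": {
        "pages": {
            "12": {
                "pageid": 12,
                "title": "Example",
                "redirect": "",
                "revisions": [{"*": "Some wikitext"}],
            }
        }
    }
}

print(modernize(old))
```

With the array form, a client can loop over the pages directly instead of first asking for the page IDs, which is why 'indexpageids' becomes unnecessary.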

On the MediaWiki code side, developers will see the following changes:

If anything is currently returning boolean values as actual booleans rather than the API standard empty-string, code will need to be changed to preserve this behavior in the non-sane output. The exact code change is yet to be decided, but will be something along the lines of having to pass such boolean values through some method on ApiResult.
There will be a way to explicitly tag a PHP array in the result as "array" or "object", much like how ApiResult::setIndexedTagName is used for the XML format.
- Ambiguous cases, such as empty arrays or some kinds of arrays with integer keys, might throw an error if not explicitly tagged. If this would bother you, comment!
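The tagging problem exists because a PHP array can stand for either a JSON array or a JSON object. The sketch below models that in Python, using a dict as a stand-in for a PHP array. The function name, the tag values, and the choice to raise an error for untagged ambiguous input are assumptions for illustration, not the eventual ApiResult interface.

```python
import json


def to_json_value(php_array, kind=None):
    """Convert a dict standing in for a PHP array into a list or dict for JSON output."""
    if kind == "array":
        return list(php_array.values())
    if kind == "object":
        return {str(key): value for key, value in php_array.items()}
    if kind is not None:
        raise ValueError(f"unknown tag: {kind!r}")
    keys = list(php_array)
    # Untagged: accept only shapes that cannot be misread.
    if keys and all(isinstance(key, str) for key in keys):
        return dict(php_array)
    if keys and keys == list(range(len(keys))):
        return list(php_array.values())
    raise ValueError("ambiguous array; tag it as 'array' or 'object'")


print(json.dumps(to_json_value({}, "object")))  # an empty object stays an object
print(json.dumps(to_json_value({}, "array")))
print(json.dumps(to_json_value({0: "a", 1: "b"})))

try:
    to_json_value({})
    untagged_empty_ok = True
except ValueError:
    untagged_empty_ok = False
```

The untagged empty case is the one behind bug 10887: without a tag there is no way to tell which of the two outputs was intended.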

Comments (Changes to JSON output format)

These comments refer to a previous version of this proposal

This would seem to make the assumption that everyone can and will update their wikis and clients to the latest version within a reasonable time frame. I can still find 1.15 wikis out there, and even 1.18 or 1.19 aren't that uncommon. How do bots and such handle the varying format when they may have to deal with wikis that are on different versions? Much like the newer continue style, I think there always needs to be a parameter to indicate that the requester supports the new format. Even if you remove the old format entirely after the transition period, a bot should get an error that indicates that the old format is no longer supported, rather than having an unpredictable data format thrown at it. Rather than using "sane=1" on a temporary basis, would it not make more sense to use something like "format=json2" on a permanent basis to indicate which output format is actually in use? – RobinHood70 ^talk 16:14, 17 July 2014 (UTC)
The thing I don't like about that suggestion is that it requires up-to-date clients on up-to-date wikis to be specifying useless parameters forever. Anomie (talk) 10:41, 19 July 2014 (UTC)
You already have to specify the format parameter anyway. What's the difference between specifying "format=json" and "format=json2"? It allows transitional wikis/clients to support both while past and future wikis can fail or fallback gracefully. – RobinHood70 ^talk 15:12, 19 July 2014 (UTC)

Another benefit is that it makes the coding easier both during and after transition. Instead of having a bunch of "if sane" checks that later have to be removed, you're using a whole new ApiFormatJson2.php module that doesn't need to be checked for backwards compatibility at all because clients only get it if they ask for it. – RobinHood70 ^talk 18:06, 19 July 2014 (UTC)
Looking at the code, I suspect something like what I suggested wouldn't work, but I'm only passingly familiar with the MW code, so I'll leave that to you. My concern here is that a bot that works across multiple versions should know how to behave without having to do any guesswork. While the wiki version number should never be changed, it conceivably could be. Even a well-intentioned change, like "$wgVersion = MyWiki running on MW 1.23" would throw many version parsers for a loop. To that end, what about adding two new outputs to the siteinfo data: apiversion (which I would see being a plain integer to indicate major changes only) and apisane? A bot would then easily know what output format to expect and whether a sane check is required to get the new functionality. This assumes, of course, that API version and JSON changes go hand-in-hand. If we assume those might change independently, then other outputs could be added to indicate JSON version and sane requirements. – RobinHood70 ^talk 15:32, 23 July 2014 (UTC)
Just a thought on the topic of boolean values: while I agree that case-by-case is the way to go, as a general guideline, I'd like to suggest that false boolean values only be emitted if false isn't the default state. – RobinHood70 ^talk 14:12, 20 July 2014 (UTC)
+1 to all of these. Overdue imo. I'll gladly rewrite some of my client library code for these sanity features. -FASTILY 09:53, 31 July 2014 (UTC)
Possible alternative: "format=json2" instead of a temporary "sane=1". Pro: Avoids someone's 10-year-old script breaking when they try to run it 10 years from now with format=json. Con: BC oddities forever. Anomie (talk) 16:56, 6 August 2014 (UTC)
I'm not necessarily saying you should keep original json forever, but at least during the transition period, either can be requested depending what the requester supports and once it's removed, requesting format=json will simply break rather than giving older clients results in an unexpected format, which could conceivably cause improper decision-making leading to bad edits. – RobinHood70 ^talk 03:20, 7 August 2014 (UTC)
That's the plan; I didn't have time to write a long description during the discussion at Wikimania. Anomie (talk) 07:00, 7 August 2014 (UTC)

Note this proposal has been modified slightly: due to the fact that the changes here would subtly break clients that weren't updated during the whole transition period, it seems better to make them break cleanly by having "format=json" simply fail after the transition period. I'm not terribly fond of the name "json2", but I can't think of anything better. Anomie (talk) 11:27, 7 August 2014 (UTC)

As a suggestion - how about "newjson", "betterjson", or maybe even "sanejson"? :P -FASTILY 20:25, 9 August 2014 (UTC)

The problem with those is: what happens when the next version of JSON rolls around? The naming starts to get silly when you've got "newerjson", "newestjson", "evennewerjson", "latestandgreatestjson", etc. :) – RobinHood70 ^talk 23:58, 9 August 2014 (UTC)

Changes to XML output format

Changes here will mostly be on the back-end; the actual data output to clients is intended to remain the same wherever possible. However, clients should be prepared for the following:

Result structure may no longer match the JSON format.
Tag and attribute names may be encoded when not conforming to XML requirements.
Result structure may change depending on the specific query. For example, passing both rvprop=content and rvdiffto=prev to prop=revisions will currently omit the diff from the result (bug 55371) (it should be throwing an error, but that's another bug). In the future, it's likely that this will return the content as the value of the <rev> node when rvdiffto is not supplied and as the value of a <content> subnode of the <rev> node when it is.

For example, bug 43221 was fixed by changing the names of attributes such as "4::foo" to fit XML's restrictions. In the future, this would be fixed by either encoding the name (e.g. "_4.3A..3A.foo") or by changing the structure of output in only the XML format (e.g. <attribute name="4::foo">).
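No encoding scheme is specified here, so the following is only a guess that happens to reproduce the single example above ("4::foo" becoming "_4.3A..3A.foo"). It assumes every character outside a conservative safe set is replaced by its hexadecimal code point wrapped in dots, and that an underscore is prepended when the result would not start with a letter or underscore. The function name is hypothetical.

```python
import re


def encode_xml_name(name):
    """Encode an arbitrary string into something usable as an XML tag or attribute name."""
    # Replace each unsafe character with ".XX." where XX is its hex code point.
    # "." itself is treated as unsafe so that encoded output stays unambiguous.
    encoded = re.sub(
        r"[^A-Za-z0-9_-]",
        lambda match: ".%02X." % ord(match.group()),
        name,
    )
    # XML names may not begin with a digit, "-" or "."; prefix an underscore.
    if not re.match(r"[A-Za-z_]", encoded):
        encoded = "_" + encoded
    return encoded


print(encode_xml_name("4::foo"))
print(encode_xml_name("foo"))
```

Names that are already valid pass through unchanged, so existing clients would only see a difference for attributes that were problematic in the first place.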

On the MediaWiki code side, developers will see the following changes:

The XML formatter will no longer die if ApiResult::setIndexedTagName() is forgotten. Instead, it will act as if that were called with something generic (e.g. ApiResult::setIndexedTagName( $array, 'item' )).
The XML formatter will no longer (be supposed to) raise an error when a node has both node content (ApiResult::setContent) and non-scalar attributes. Instead, it will simply shove the intended node content into a subnode.
Anything that's hard-coding '*' instead of using ApiResult::setContent is going to break.
There may be additional ApiResult calls required in some cases.

Comments (Changes to XML output format)

Internationalizing API warnings and errors

API warnings and errors are currently returned in English (bug 35074). Further, multiple warnings are concatenated into a single text string.

The error codes will generally not change; this option will only control the human-readable messages.

The plan is for an error-language option with the following possibilities:

'none' returns the message key and parameters, no human-readable message
'user' uses the language in $wgLang to generate a human-readable message
A language code uses the specified language

The non-'none' options would have an additional option to specify whether the message should be returned as HTML ($msg->parse()), wikitext ($msg->text()), or wikitext ignoring site-local customizations ($msg->useDatabase( false )->text()).

Errors and warnings will both be returned as arrays of objects, each object having a code, the source module (maybe not for errors?), and message data as above.

During the transition period, omitting the error-language option will produce backwards-compatible output. After, 'user' will likely be the default.
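As a sketch of what one entry in the proposed array might look like under the different option values: everything concrete below (the message key, the tiny message catalogue, the field names "key", "params" and "text", and the function names) is invented for illustration, and the HTML/wikitext choice is left out.

```python
# Hypothetical message catalogue: key -> {language code: template with $1, $2, ...}
MESSAGES = {
    "example-badtitle": {
        "en": "Bad title: $1",
        "de": "Ungültiger Titel: $1",
    },
}


def render(key, params, lang):
    """Substitute $1, $2, ... in the template for the given language."""
    template = MESSAGES[key][lang]
    # Go from the highest index down so "$1" never clobbers part of "$10".
    for index in range(len(params), 0, -1):
        template = template.replace(f"${index}", str(params[index - 1]))
    return template


def format_error(code, module, key, params, errorlang, userlang="en"):
    """Build one error/warning object for the given error-language option."""
    entry = {"code": code, "module": module}
    if errorlang == "none":
        # Machine-readable only: message key and parameters, no text.
        entry["key"] = key
        entry["params"] = list(params)
    else:
        lang = userlang if errorlang == "user" else errorlang
        entry["text"] = render(key, params, lang)
    return entry


print(format_error("badtitle", "query", "example-badtitle", ["Foo|Bar"], "none"))
print(format_error("badtitle", "query", "example-badtitle", ["Foo|Bar"], "de"))
```

Note that the "code" field is the same in every mode, which matches the statement above that error codes are not affected.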

On the code side, this will entail a major reworking of the various error and warning methods in ApiBase.

Comments (Internationalizing API warnings and errors)

Is there a reason we can't just provide both html and wikitext error messages, and let the user pick whichever one they want? Legoktm (talk) 23:12, 16 July 2014 (UTC)
I'd rather not clutter the response with useless repetition of every message in three different formats, when the client already knows which one it wants when making the request. Anomie (talk) 10:59, 19 July 2014 (UTC)
Makes sense. +1 Legoktm (talk) 19:38, 30 July 2014 (UTC)
Wikibase has already started adding i18n support for some of its api error messages, I am sure the team would appreciate some sort of response in core :) Addshore (talk) 07:55, 1 August 2014 (UTC)

Query item count

People sometimes request count(*) functionality for various modules. Even though there is plenty of justification for it, a fundamental database limitation has always stopped us: counting all items is an O(N) table traversal. As a result, clients can only iterate over all the data and count it locally, which wastes both server resources and bandwidth.

It would be relatively simple [citation needed] to allow modules to return an integer from 0 to the relevant limit. For example, if foolimit=100 then the result in "count" mode would be a number 0 to 100 or "101+".

In code, for performance this would need some run-mode passed into the module so the module can know not to bother producing results beyond a total count. Consideration is also needed for what happens if someone does something like "action=query&count=backlinks&generator=backlinks&list=backlinks"; likely parameters will need to be prefixed as is done with generators.
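The capped count itself is simple to express. A minimal sketch, with a plain Python iterable standing in for the module's database result and a hypothetical function name:

```python
from itertools import islice


def capped_count(rows, limit):
    """Count rows, but never look at more than limit + 1 of them.

    Returns an int from 0 to limit, or the string "<limit + 1>+" when there
    are more rows than the limit.
    """
    seen = sum(1 for _ in islice(rows, limit + 1))
    return seen if seen <= limit else f"{limit + 1}+"


print(capped_count(range(42), 100))
print(capped_count(range(5000), 100))
```

Because iteration stops after limit + 1 rows, the cost is bounded by the limit rather than by the size of the table, which is the property that makes this acceptable where a full count is not.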

Comments (Query item count)

Support -- great idea! ☠MarkAHershberger☢(talk)☣ 18:15, 21 October 2013 (UTC)
Support Definitely will be useful. Cyberpower678 (talk) 19:33, 29 July 2014 (UTC)
+1 -FASTILY 09:53, 31 July 2014 (UTC)
Support. Also MySQL has some support for doing only index scans when count query has where clause --Ilya (talk) 11:18, 24 February 2016 (UTC)

Rewrite prop=imageinfo from scratch as prop=fileinfo

The code is a mess, the limit semantics make no sense, and we have several other options that don't really fit non-images.

The best thing to do here is probably to just write a prop=fileinfo module from scratch so we don't have to worry about backwards compatibility, and then deprecate prop=imageinfo.

Current plans:

Going to ask Flow to re-prefix their prop=flowinfo module, so fileinfo can have "fi".
Right now, iilimit specifies the max number of revisions to return per file, which is inconsistent with the rest of the API and isn't particularly sane. For fileinfo, filimit will limit the number of file-info-objects returned per result, and a separate "fioldversions" property (default 0, values integers or 'all') will specify the max number of revisions to be returned per file.
fistart/fiend may result in the info for the current revision not being returned.
iiprops has three different metadata properties. There really should be only one, and if possible it should be key-value pairs rather than a list of objects with key and value properties.
There will be no equivalent to iiurlwidth or iiurlheight. Instead there will only be fiparams which will be roughly equivalent to iiurlparam (but multi-valued).
prop=stashimageinfo is very odd: it's a prop module but doesn't use any titles. It would make sense to me for prop=fileinfo to have a fifilekeys parameter instead of having a whole separate module for this.
prop=videoinfo really isn't needed either. Instead we should make it possible for extensions to add additional info to the fileinfo response.
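For the metadata point in the list above, the difference between "a list of objects" and "key-value pairs" can be sketched as follows. This assumes each metadata entry is an object with "name" and "value" properties; the field names are parameters precisely because they are an assumption, and the sample data is made up.

```python
def metadata_to_mapping(items, key_field="name", value_field="value"):
    """Turn [{"name": k, "value": v}, ...] into {k: v, ...}, recursing into nested lists."""
    mapping = {}
    for item in items:
        value = item[value_field]
        nested = (
            isinstance(value, list)
            and value
            and all(isinstance(v, dict) and key_field in v for v in value)
        )
        if nested:
            value = metadata_to_mapping(value, key_field, value_field)
        mapping[item[key_field]] = value
    return mapping


# Hypothetical metadata in the list-of-objects style.
items = [
    {"name": "Make", "value": "ExampleCam"},
    {"name": "GPS", "value": [{"name": "Lat", "value": 1.5}]},
]

print(metadata_to_mapping(items))
```

With the mapping form, a client can read a single field directly instead of scanning the list for a matching name.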

Comments (Rewrite prop=imageinfo from scratch as prop=fileinfo)

Having just implemented the client side of this for my bot, you have my absolute support! If there's anything you can do to convert the iiprop=metadata|commonmetadata|extmetadata all into something a bit more consistent, that would be ideal. – RobinHood70 ^talk 01:49, 30 July 2014 (UTC)
+1 Yes please. -FASTILY 09:53, 31 July 2014 (UTC)
Support --Ricordi samoa 08:28, 24 October 2014 (UTC)

Clean up log event parameter handling in action=logevents

The current method is a big switch in ApiQueryLogEvents that specially formats certain log event types. This is ugly, and won't work at all for extensions. IMO, this logic should go in the LogFormatter subclasses, as after all it's a matter of formatting.

On the client side, we should probably regularize all the parameters under a "params" node, instead of having some dumped into the main object (possibly conflicting with other properties!) and some in a subarray named for the type. This'll allow us to clean up some of the legacy param naming at the same time.

We should also probably explicitly note that BC breaks may occur for log events currently using the "legacy" format, and maybe add a parameter to explicitly request the legacy format for all events from action=logevents.

Comments (Clean up log event parameter handling in action=logevents)

See bugzilla:33235, I'm not the first to think of this. Anomie (talk) 13:54, 15 September 2014 (UTC)
Also bugzilla:71020, that's what we get for just dumping the details in the main object. Anomie (talk) 21:01, 18 September 2014 (UTC)
One small item while you're cleaning this up anyway: I just noticed that the logevent for action "merge/merge" spits out a MediaWiki-formatted timestamp rather than an ISO 8601 format. – RobinHood70 ^talk 02:37, 30 September 2014 (UTC)

Items in progress

These items have changes submitted for code review, or are waiting for analysis as to whether the feature usage has dropped far enough that they can be formally removed.

Remove obsolete output formats

The following output formats will be deprecated and removed:

wddx / wddxfm
yaml / yamlfm - it's identical to json anyway
txt / txtfm
dbg / dbgfm
dump / dumpfm

The following output formats will remain: json / jsonfm, xml / xmlfm, php / phpfm, rawfm, none.

JSON will be the preferred output format.

Gerrit changes:

gerrit:154098 - Mark formats as deprecated

Comments (Remove obsolete output formats)

+1 Legoktm (talk) 22:29, 16 July 2014 (UTC)
+1 Addshore (talk) 10:59, 20 July 2014 (UTC)
-1 because txt/txtfm is very useful for me to the readable PHP format. If txt/txtfm is kept, I will +1.Cyberpower678 (talk) 18:59, 29 July 2014 (UTC)
-1 I remember submitting a patch to add txt format, as debugging the output from the serialized PHP format was extremely irritating and time consuming. I can see getting rid of dbg and dump, as they're mostly extensions of that, though. As Cyberpower said, consider this a +1 if txt is kept. Soxred93 (talk) 01:09, 30 July 2014 (UTC)
- @Cyberpower678 and X!: What advantage does txt/txtfm have over jsonfm, besides that you personally might be more familiar with PHP's print_r output than JSON? Anomie (talk) 11:37, 30 July 2014 (UTC)
- Indeed, Jsonfm is extremely easy to debug with :) Addshore (talk) 07:46, 1 August 2014 (UTC)
  - I end up misreading JSON, because of the quotes and semicolons bunched together like that. It's an issue I have in general when looking for something in a piece of text. TXT far easier for me to read, and search through than JSON, because the array keys are surrounded by brackets which stand out more as well as the value of an index gets pointed to with "=>" which also stands out more. It's more of a readability issue, rather than not understanding how to read it. Also I don't see why we need to remove it. It's not like you have a sloppy custom TXT generator. It simply PHP's print_r. But if you guys are still going to remove it, I suppose I could get used to it, but I would really prefer to use PHP's print_r because it's easier for me to read.Cyberpower678 (talk) 11:44, 2 August 2014 (UTC)
    - jsonfm seems as legible as txtfm. I use http://jsonlint.com/ to reformat "smashed":{"together":["JSON",output]}. -- SPage (WMF) (talk) 21:23, 1 October 2014 (UTC)
+1 -FASTILY 09:53, 31 July 2014 (UTC)
+0 An advantage of txt/txtfm (and xml/xmlfm) is its human readability. Also, most browsers will attempt to display xml. Sidenote: in php 5.3, at least, json has a slight performance advantage over php's own serialization. - Amgine (talk) 01:27, 2 August 2014 (UTC)
JSON with appropriate whitespace is as human-readable, IMO. And all the "fm" formats are served as an HTML page, so browser display isn't much of an issue. Anomie (talk) 16:07, 6 August 2014 (UTC)
Support --Ricordi samoa 08:29, 24 October 2014 (UTC)

Removal of long-deprecated parameters

Analysis will be done to determine whether anyone is still using the following:

"watch" and "unwatch" parameters that have been replaced with "watchlist".
"sessionkey" parameter to action=upload and prop=stashimageinfo that was replaced with "filekey".
"toponly" parameter to list=usercontributions.
"querymodules" parameter to action=help, replaced with extended syntax for the "modules" parameter.
- action=paraminfo will get the same treatment.
"title" parameter to action=watch.
"url" parameter to prop=langlinks and prop=iwlinks

Additional deprecated parameters may also be considered.

Gerrit changes:

Logging added in gerrit:154101
prop=iwlinks iwurl deprecated in gerrit:155259

Comments (Removal of long-deprecated parameters)

+1 seems reasonable -FASTILY 09:53, 31 July 2014 (UTC)
-1 for llurl of prop=langlinks. This should be always implemented consistent with prop=iwlinks which currently only uses iwurl=1 and still has no iwprop=url replacement. The input/output structure is currently the same which should be kept until also iwlinks implements to new prop=url feature. Merlissimo (talk) 11:44, 20 August 2014 (UTC)
I note that llurl is already deprecated, and has been since February 2014. But I have no problem with deprecating prop=iwlinks&iwurl too. In both cases the result format is not changing. Anomie (talk) 13:31, 20 August 2014 (UTC)

Simplified continuation as default for action=query

Currently, this must be requested by passing an empty 'continue' parameter in the initial request. This will be changed to be the default, and the raw query-continue may be requested with a 'rawcontinue' parameter.
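For reference, a client loop using the simplified style looks roughly like the sketch below. The HTTP call is injected as a function so the example runs without a network; fake_fetch, the "example" list module, and the "excontinue" parameter are stand-ins invented for the demonstration.

```python
def query_all(fetch, params):
    """Yield the "query" part of each response until no "continue" node is returned."""
    base = dict(params, action="query", format="json")
    last_continue = {"continue": ""}  # empty value requests the simplified style
    while True:
        # Send the original parameters plus whatever the last response told us to add.
        response = fetch({**base, **last_continue})
        if "query" in response:
            yield response["query"]
        if "continue" not in response:
            return
        last_continue = response["continue"]


def fake_fetch(request):
    """Stand-in for the real HTTP request: serves five made-up titles, two per response."""
    titles = ["A", "B", "C", "D", "E"]
    start = int(request.get("excontinue", 0))
    response = {"query": {"example": titles[start:start + 2]}}
    if start + 2 < len(titles):
        response["continue"] = {"excontinue": str(start + 2), "continue": "-||"}
    return response


collected = [
    title
    for part in query_all(fake_fetch, {"list": "example"})
    for title in part["example"]
]
print(collected)
```

Once the default changes, the only difference for such a client is that the initial empty 'continue' value becomes unnecessary; clients that still parse query-continue would need to add 'rawcontinue'.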

Gerrit changes:

Adding "rawcontinue": gerrit:154092
Deprecation warning: gerrit:160222, planned for merge during 1.25
Changing the default: gerrit:160223, planned for merge during 1.26

Comments (Simplified continuation as default for action=query)

Wasn't query-continue supposed to be phased out anyhow? -FASTILY 09:53, 31 July 2014 (UTC)
- Yuri wanted to, but I as an API user find the new method lacking in flexibility. So I'd prefer to keep it as an advanced option. Anomie (talk) 14:17, 31 July 2014 (UTC)

Deprecated API usage report on WMF wikis

At Wikimania 2014, it was suggested that developers of API-using tools would find it useful to be able to access a report detailing the hits from a user agent to deprecated API features, as the existing deprecation warnings delivered to the client may not be seen by the developer of the tool.

Input to this tool would be a User-Agent and a date range. Output would be a list of deprecated API features hit by that agent (see gerrit:154095 and gerrit:154096) with a count per day, perhaps something like the following:

Feature	Date	Hits
format=wddx	2014-10-02	5
format=wddx	2014-10-01	3
action=upload&watch	2014-10-02	12
action=upload&watch	2014-10-01	17
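The aggregation behind such a table is straightforward. A sketch, assuming each logged hit is available as a record with "agent", "feature" and "date" fields (the record layout, the function name, and the sample agents are invented; the real log format is whatever the Gerrit changes above define):

```python
from collections import Counter


def usage_report(log_entries, agent, first_day, last_day):
    """Return (feature, date, hits) rows for one user agent within a date range.

    Dates are ISO 8601 strings, so plain string comparison orders them correctly.
    """
    counts = Counter(
        (entry["feature"], entry["date"])
        for entry in log_entries
        if entry["agent"] == agent and first_day <= entry["date"] <= last_day
    )
    rows = [(feature, date, hits) for (feature, date), hits in counts.items()]
    rows.sort(key=lambda row: row[1], reverse=True)  # newest day first...
    rows.sort(key=lambda row: row[0])                # ...within each feature (stable sort)
    return rows


# Hypothetical log records.
log = [
    {"agent": "ExampleBot/1.0", "feature": "format=wddx", "date": "2014-10-01"},
    {"agent": "ExampleBot/1.0", "feature": "format=wddx", "date": "2014-10-02"},
    {"agent": "ExampleBot/1.0", "feature": "format=wddx", "date": "2014-10-02"},
    {"agent": "OtherTool/0.3", "feature": "format=wddx", "date": "2014-10-02"},
    {"agent": "ExampleBot/1.0", "feature": "action=upload&watch", "date": "2014-10-02"},
]

for row in usage_report(log, "ExampleBot/1.0", "2014-10-01", "2014-10-02"):
    print("\t".join(map(str, row)))
```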

Gerrit changes:

gerrit:174200 - Agents for in-browser JS
gerrit:174787 - Extension
gerrit:173336 - Logstash and other configuration changes

Comments (Deprecated API usage report on WMF wikis)

The idea was run by User:LuisV (WMF), who replied in part "If API features can never be tied to specific pieces of content, then this looks unproblematic." Anomie (talk) 15:01, 2 October 2014 (UTC)

Allow generators to provide data

It's a fairly common request for generators to be allowed to provide data, for example bug 14859 requesting that the ordering from list=search be somehow preserved when using generator=search. Let's do it more generically.

Notes:

It's up to the generator to make sure data for generated pages doesn't conflict with data from prop modules.
Generated data is not automatically copied to redirect targets when redirects=1 is used.
Generators should avoid going overboard with generated data, and should generally not include it unless requested (despite the only current implementations, list=search and list=prefixsearch, being exceptions to that rule).

Gerrit change: gerrit:175759

Comments (Allow generators to provide data)

Items not planned for implementation

These items are currently not planned for implementation, either because they don't seem desirable or because the necessary effort and/or disruption to users does not seem worth the benefit.

Change defaults for "prop" parameters

Many query modules take a 'prop' parameter to specify which bits of information the client actually wants. Defaults for these parameters may be cut back or eliminated entirely. Or the prop parameter may be made required with no default.

Comments (Change defaults for "prop" parameters)

I'm not sure if this one is really worth the trouble. Anomie (talk) 19:13, 16 July 2014 (UTC)
This strikes me as a case of breaking things for the sake of breaking them. I can't see any problem with leaving it the way it is. --Carnildo (talk) 01:44, 30 July 2014 (UTC)
I suspect most bot users set props equal to something, and people just use the defaults when they're playing around with the various API modules. Either cutting back the default props or making props required would probably be worse than what we have now. Leucosticte (talk) 02:51, 25 September 2014 (UTC)

Allow paging the "titles" parameter

If too many titles/pageids/revids are given to the query module (or generator), it should page through them rather than erroring out or issuing a warning and ignoring some. This way the client does not need to worry about passing too many titles; the query will simply treat it just like a generator, returning an appropriate continuation value.
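For comparison, the client-side handling that this proposal would make unnecessary is short. A sketch, assuming a per-request limit of 50 titles (the actual limit depends on the user's rights) and a hypothetical helper name:

```python
def title_batches(titles, size=50):
    """Yield pipe-joined values for the "titles" parameter, at most `size` titles each."""
    for start in range(0, len(titles), size):
        yield "|".join(titles[start:start + size])


titles = [f"Page {number}" for number in range(120)]  # made-up titles
batches = list(title_batches(titles))
print(len(batches), [batch.count("|") + 1 for batch in batches])
```

Each title is sent exactly once this way, whereas server-side paging without stored state would require resending the remaining titles with every continuation request.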

Comments (Allow paging the "titles" parameter)

Do we really want the client to be passing us 10000 titles just for us to tell them to retry 9950 of them? The client can as easily handle that on the client side, and save bandwidth in the process. Anomie (talk) 19:13, 16 July 2014 (UTC)
Anomie has a point here. I don't see a good way of submitting large numbers of titles and having them continue without having to resubmit those titles with each request, which is a waste of bandwidth. That said, it would be really nice at the client side to not have to worry about splitting page collections into smaller groupings or submitting requests every nth page or whatever approach you want to take. If a way can be found to submit a title list once and then page through it, that'd be great. Otherwise, yeah, I agree with dropping this idea. – RobinHood70 ^talk 16:45, 17 July 2014 (UTC)
Despite the API not being anywhere near "level 3 REST", I'd like to preserve the REST principle of avoiding server-stored request state (i.e. a remembered list of titles to be processed). Anomie (talk) 11:07, 19 July 2014 (UTC)
Agreed. Something that preserves state could leave the doors wide open to a DoS attack that would bring servers to their knees, so not a good idea. – RobinHood70 ^talk 17:34, 19 July 2014 (UTC)

Extension:SiteMatrix should create a query submodule

The action added by Extension:SiteMatrix, action=sitematrix, should really be a query submodule meta=sitematrix. In addition, its output structure could be improved.

Further, this action seems to serve much the same purpose as meta=siteinfo&siprop=interwikimap. They could be merged somehow.

Comments (Extension:SiteMatrix should create a query submodule)

Actually replacing meta=siteinfo&siprop=interwikimap isn't really feasible unless we can make the output entirely compatible. And doing so would be facilitated by the following proposal. Anomie (talk) 04:31, 12 September 2013 (UTC)

`meta=siteinfo` should be split up

Many of the options available to meta=siteinfo's siprop should be split into their own meta submodules. This would be an interface cleanliness issue.

Comments (`meta=siteinfo` should be split up)

Support -- as long as there is some sort of versioning or backwards compatibility. -- ☠MarkAHershberger☢(talk)☣ 18:15, 21 October 2013 (UTC)
+1 -FASTILY 09:53, 31 July 2014 (UTC)
+1 Addshore (talk) 07:58, 1 August 2014 (UTC)

Module prefix limiting

Core modules should use two-letter prefixes and extension modules should use three-letter prefixes (with 'g' prohibited as the first character). The intent here is to avoid collisions between extensions and new core modules.

Comments (Module prefix limiting)

Seems unduly limiting; some core modules already use longer prefixes, and it does nothing to prevent collisions between extensions. Anomie (talk) 04:31, 12 September 2013 (UTC)

Embed the action in the URL

To facilitate directing particular actions to different API processing clusters, it would be advantageous to include the action in the URL even for POST requests. Embedding it in the PATH_INFO may make it easier to do this [citation needed], but may not be possible on all hosts. As an alternative, the API could simply require that action be present in $_GET rather than $_POST.

Comments (Embed the action in the URL)

Completed items

Items listed here have been completed.

Removal of certain data from action=paraminfo

The data returned by action=paraminfo includes two items that appear to be at best incomplete and seem to have almost no possible uses:

'props' is supposed to contain some sort of data structure indicating which result properties correspond to which request parameters. But the format of this data isn't even specified and the existing examples seem to be ad-hoc without any real consistency.
- The intended use of this data appears to be for automatically generating objects with property accessors to wrap access to the MediaWiki API. But given the lack of any specification as to the data structure, I expect this has at most one actual user.
'errors' is supposed to contain a list of possible errors that the API module can return. But the lists are incomplete, and in some cases cannot ever be complete since additional errors can be raised by extension hooks in code far removed from anything related to the API.
- I imagine the intended use of this data is again for automatically generating strongly-typed errors in some library trying to wrap the MediaWiki API. But since the data is incomplete, any such library is already going to have to have a generic fallback. It would probably be best for it to use that fallback in all cases.
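The "generic fallback" idea can be sketched as a single exception type that carries whatever error code the server reports, instead of one class per listed error. The class and function names are hypothetical, and the sketch assumes errors arrive under an "error" key with "code" and "info" fields.

```python
class ApiError(Exception):
    """Generic error for any code the API reports (hypothetical client class)."""

    def __init__(self, code: str, info: str):
        super().__init__(f"{code}: {info}")
        self.code = code
        self.info = info


def check_response(response: dict) -> dict:
    """Raise ApiError if the decoded response carries an error, else return it."""
    # Assumption: errors are reported as {"error": {"code": ..., "info": ...}}.
    error = response.get("error")
    if error is not None:
        raise ApiError(error.get("code", "unknown"), error.get("info", ""))
    return response
```

A caller that wants special handling for a particular code can still inspect the code attribute, without the library needing a complete list in advance.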

Gerrit change: gerrit:152760

Comments (Removal of certain data from action=paraminfo)

As an API user, seeing a list of possible errors that might occur is nice, so I can think about what might go wrong in a program flow. As a developer, if something has a hook, it's completely impossible to document what might happen. I think something like "This is a list of more common errors that might occur, but not a complete list" would be nice. Result properties are useless. Legoktm (talk) 23:07, 16 July 2014 (UTC)
OTOH, as an API user myself I don't much see the point to an incomplete list of errors unless there's something the program can do about the error automatically besides logging it for human attention and/or moving on to the next thing. At which point it's probably better done in the human-curated documentation (improvement of which is also planned for next quarter). But I agree that explicitly marking it as an incomplete list would be better than what we have now, which probably encourages some people to try to handle every one individually. Anomie (talk) 10:53, 19 July 2014 (UTC)
As a developer of mw apis for extensions the manually maintained list of possible errors is ugly and annoying to maintain. I know for a fact that over the past year many of the error lists returned by some of the modules would likely have been wrong. Addshore (talk) 07:53, 1 August 2014 (UTC)

Token handling

API modules that perform changes must use tokens for CSRF protection. Currently there are multiple ways to retrieve a token: action=tokens, action=query&prop=info&intoken=..., action=query&prop=revisions&rvtoken=..., action=query&list=users&ustoken=..., action=query&list=recentchanges&rctoken=.... Formerly some modules would implement their own "gettoken" parameter, although now only action=login does anything like this. Further, some modules have their own "type" of token and others use the generic "edit" token type, and which is required for a particular module is not always clear. And it's not possible to fetch both the token and the data of the page to be acted on at the same time.

The following changes will be made to token handling:

All existing methods of retrieving tokens will be deprecated.
A new meta=tokens will be added to action=query. It will work just like action=tokens does, but by virtue of being a submodule of action=query you can combine it with e.g. prop=revisions to fetch both the edit token and the page content being edited.
The help for every 'token' parameter will clearly indicate which token type is needed. The type will also be included in action=paraminfo.
Many of the existing token types will be merged into a single 'csrf' type, as they're already all the same token.
All tokens will be static, not varying based on the target of the action.
- All tokens in core and WMF-deployed extensions are already static except for action=rollback, which depends on the title and user being rolled back, and action=userrights, which depends on the user. These actions will accept both a new static token and the non-static token used in the web UI. The web UI will continue to accept only the existing tokens.
The old token-fetching methods will remain available but will return deprecation warnings.
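To illustrate the point about combining meta=tokens with prop=revisions, the sketch below builds one query that asks for both the token and the page content. The helper names are hypothetical, and the response shape (a "csrftoken" entry under query.tokens) is an assumption about how the module reports its result rather than something stated in this proposal.

```python
from urllib.parse import urlencode


def token_and_content_params(title: str) -> dict:
    """Parameters for one request that returns a token and the page text."""
    return {
        "action": "query",
        "meta": "tokens",         # the new token submodule
        "type": "csrf",           # the merged token type
        "prop": "revisions",      # fetched in the same request
        "rvprop": "content|timestamp",
        "titles": title,
        "format": "json",
    }


def extract_csrf_token(response: dict) -> str:
    """Pull the token out of a decoded response (assumed shape)."""
    return response["query"]["tokens"]["csrftoken"]


query_string = urlencode(token_and_content_params("Sandbox"))
```

The resulting query string can be appended to the wiki's api.php URL; a single round trip then replaces the separate token request that the older methods required.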

On the code side, the token-related methods in ApiBase will be changing. For most extensions, it's just a matter of changing needsToken() from returning true to returning 'csrf'; extensions using custom salts will either need to add those salts (using a new hook) or convert to 'csrf'. Provision is made for extensions maintaining BC with earlier versions of MediaWiki.

Gerrit change: gerrit:153110 (plus 153085–153109 to update various extensions)

Comments (Token handling)

Yes please. Also just noting that action=createaccount has its own token handling logic similar to action=login. Legoktm (talk) 22:28, 16 July 2014 (UTC)
Both login and account creation will continue to need a special token to avoid login csrf. Otherwise I think this is all good. — Preceding unsigned comment added by CSteipp (talk • contribs) 23:49, 16 July 2014‎
Love it! Addshore (talk) 11:03, 20 July 2014 (UTC)
What will happen with the starttimestamp currently emitted by prop=info&intoken=...? Since this needs to be updated per edit to check for page deletion since the edit started, I'd recommend just moving it to the general section of the info. – RobinHood70 (talk) 02:10, 27 July 2014 (UTC)
- Either that or a "meta=timestamp", most likely. While the needed timestamp is available from meta=siteinfo, that's a lot of extra junk to query just to get the timestamp. Anomie (talk) 13:41, 28 July 2014 (UTC)
YES +OVER 9000!!! Cyberpower678 (talk) 19:29, 29 July 2014 (UTC)
Yep. Tokens were easily one of the worst parts when designing Peachy. Soxred93 (talk) 01:09, 30 July 2014 (UTC)
I can write a new token function into Peachy 2, @X!. Cyberpower678 (talk) 11:48, 2 August 2014 (UTC)
+1 -FASTILY 09:53, 31 July 2014 (UTC)

JSON output as default

The API currently defaults to xmlfm when no format parameter is given. This will be changed to jsonfm.

Note this will not affect modules that use their own custom output formatters. Also, action=help will be getting its own custom output formatter (see below).

As no client should be trying to parse the *fm formats, this probably won't follow a deprecation process. It'll just be done once action=help is rewritten.

Gerrit change: gerrit:160819

Comments (JSON output as default)

No brainer Legoktm (talk) 22:34, 16 July 2014 (UTC)
Strong support on this one Ladsgroup (talk) 08:21, 17 July 2014 (UTC)
+1 Addshore (talk) 10:59, 20 July 2014 (UTC)
+1 Protonk (talk) 22:04, 22 July 2014 (UTC)
+1 Soxred93 (talk) 01:09, 30 July 2014 (UTC)
+1 NicoV (talk) 11:51, 30 July 2014 (UTC)
+1 -FASTILY 09:53, 31 July 2014 (UTC)
Get ru.wikipedia.org/w/api.php?action=query&list=categorymembers&cmtitle=Категория:Википедия and read ""title": "\u0412\u0438\u043a\u0438\u043f\u0435\u0434\u0438\u044f"" (title="Википедия" if you use "format=xmlfm"). Awesome! So simple to read! Is there a life beyond EnWiki? Nobody cares. 95.37.16.54 00:45, 7 February 2015 (UTC)
- Try adding "&utf8=1" to that query. Unfortunately backwards-compatibility requires the extra parameter. Anomie (talk) 12:37, 9 February 2015 (UTC)
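The two renderings in the exchange above are the same JSON data, differing only in whether non-ASCII characters are escaped. As an analogy only (this is Python's json module, not MediaWiki's formatter), the difference can be reproduced like this:

```python
import json

title = {"title": "Википедия"}

# Default behaviour: non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(title)

# With escaping turned off, the characters are written as-is (UTF-8 friendly).
readable = json.dumps(title, ensure_ascii=False)

# Both strings decode to exactly the same data.
same_data = json.loads(escaped) == json.loads(readable) == title
```

A JSON parser sees no difference between the two; the escaped form is only harder for a human to read.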

Changes to pretty-printed HTML formats

The pretty-printed HTML formats (jsonfm, xmlfm, phpfm, rawfm) will likely lose the automatic linking of links and various other bits of fanciness. They will gain a hook to allow for syntax highlighting via extensions such as Extension:SyntaxHighlight_GeSHi.

Gerrit change: gerrit:161093

Comments (Changes to pretty-printed HTML formats)

The geshi extension uses CSS provided by ResourceLoader to style highlighted syntax. Are you thinking of adding ResourceLoader support to api.php? TBH, I don't really see the point of adding syntax highlighting... Legoktm (talk) 23:04, 16 July 2014 (UTC)

Just as an alternative idea.... Add an index.php Special page, where you can post the api.php output to, then just add a link in the api.php introduction paragraph to this "syntax highlighted" output. Avoids mixing api.php and index.php more than is desired. TheDJ (talk) 13:36, 17 July 2014 (UTC)

Why remove the auto-linking feature? --Ricordi samoa 23:37, 25 July 2014 (UTC)

Because it would probably get in the way of proper syntax highlighting, seems like it's only useful for action=help which is going to be redone, and has been the source of bugs like bug 61362. Anomie (talk) 13:34, 28 July 2014 (UTC)

HTMLizing action=help

The output from action=help is currently a plain-text document wrapped in the usual API output formatting, with a few links and bolding added in post-processing when viewed via xmlfm. This will be changed to output an HTML document intended for viewing in a browser.

The default view of api.php will provide only general information and documentation of the main module (i.e. the bits of the current page above the "*** Modules ***" line and the credits at the bottom), with the various module names in the documentation for the 'format' and 'action' parameters linking to documentation for those modules. The documentation for action=query will similarly document only the query module itself, with links to documentation for the various 'prop', 'list', and 'meta' modules. There will be an option to output documentation for all modules on one page, likely 'all=1'.

At the same time, the various bits of text in the API help should be made localizable.

The possibility of including a version of Special:ApiSandbox on the help pages is also under consideration, although that may be left for a later iteration.

If anyone is actually using action=help from a client, please comment about your use cases if they wouldn't be satisfied by this proposal.

Gerrit change: gerrit:160798

Comments (HTMLizing action=help)

Why don't we just turn the help into a special page? Legoktm (talk) 22:34, 16 July 2014 (UTC)
Forgive me if I'm wrong (I haven't looked at MW code in a while), but it seems like it'd be good practice to keep the API specific code out of the main software, i.e. in the main api.php page. I can't remember how much mixing and mashing there is, though. Soxred93 (talk) 01:09, 30 July 2014 (UTC)
Well, we already have Special:ApiSandbox...I was just thinking of a static version of that page. Legoktm (talk) 21:21, 30 July 2014 (UTC)

Support uselang

Some bits of the API support 'uselang' since the underlying MediaWiki methods support it. Once errors and help can be localized, it would make sense for it to officially be listed.

Gerrit change: gerrit:160798

Comments (Support uselang)

I'm not sure I agree with this. What's an example use case aside from localized error messages which was covered above? Legoktm (talk) 23:13, 16 July 2014 (UTC)
- The parsing-related actions, mostly. And silly things like action=watch returning a UI message instead of letting the UI handle it (which reminds me, I should put that on the "to be deprecated" list). Anomie (talk) 11:02, 19 July 2014 (UTC)

Simplified continuation should indicate "pause points"

With the hard-to-understand query-continue continuation, it's easy for the client to know when it has the full set of results for the current batch of pages from the generator, so it can pause and process that batch before continuing the continuation.

The simplified continuation should support this sort of batching without having to parse the 'continue' parameter; a 'batchcomplete' boolean property in the result should suffice.
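A client loop using such a flag might look like the sketch below. It assumes the server is already returning simplified continuation, that 'batchcomplete' appears as a key in the decoded result when a batch is complete, and that fetch is a caller-supplied function performing the actual request; the function name query_batches is hypothetical.

```python
from typing import Callable, Iterator


def query_batches(fetch: Callable[[dict], dict], params: dict) -> Iterator[list]:
    """Yield lists of partial 'query' results, one list per completed batch."""
    request = dict(params)
    pending = []
    while True:
        result = fetch(request)
        pending.append(result.get("query", {}))
        # The flag marks a pause point: everything for this batch has arrived.
        if "batchcomplete" in result:
            yield pending
            pending = []
        if "continue" not in result:
            # No more data; flush anything left over and stop.
            if pending:
                yield pending
            return
        # Merge the returned key-value pairs with the original query,
        # without inspecting what they mean.
        request = {**params, **result["continue"]}
```

The client never looks inside the 'continue' values; it only checks for the flag to decide when a batch can be processed.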

Gerrit change: gerrit:152359

Comments (Simplified continuation should indicate "pause points")

+1 Protonk (talk) 22:06, 22 July 2014 (UTC)
Not sure about this. The continue parsing code is needed anyway. Adding this boolean seems redundant. Rich Farmbrough 00:18, 5 August 2014 (UTC).

How is continue parsing code needed with the simplified continuation? The point of that method is the client should just merge the key-value pairs returned in the "continue" result property with the original query. Anomie (talk) 09:18, 6 August 2014 (UTC)

Fix list=deletedrevs

The API list=deletedrevs module is somewhat of an odd thing: in mode 1 it pretends to be a prop module, taking titles (but not revids, bug 71396) from the pageset, while in modes 2 and 3 it's a traditional list module. And since it's not really a prop module, it doesn't work right with the new-style continuation (bug 71389).

One solution is to split "mode 1" to a new prop=deletedrevisions, and move modes 2 and 3 to a new list=alldeletedrevisions. Handling of "badrevids" and "badpageids" in the pageset module may still be problematic, however.

Another solution is to deprecate list=deletedrevs using the pageset module (the "titles", "pageids", and "generator" parameters) and instead have it use "drtitles", "drpageids", and "drrevids".

I'm not sure which solution would be best; the first has the advantage of cleanly breaking, while the second probably more cleanly handles deleted revids.

Gerrit change: gerrit:168646

Comments (Fix list=deletedrevs)

+1. Coding something that interfaces with this module was somewhat of a mess. MER-C (talk) 06:50, 14 November 2014 (UTC)

General discussion

I Support basically everything in this. Please drop the 'may be dropped' ones. --Krenair (talk • contribs) 20:56, 16 July 2014 (UTC)
I'm excited to see this move forward. I think it would be helpful if this list was prioritized, or at least ordered. Some of these are extremely easy to implement (like dropping deprecated parameters), and could be done independently of this RfC IMO. Legoktm (talk) 23:17, 16 July 2014 (UTC)
- I feel good that I actually got things written down and will get to work on them in a focused way! ;) This RFC serves as "notify people of upcoming changes", "chance for people to object to upcoming changes", and "todo list" all at once, which is my excuse for the inclusion of easy stuff on the list. But to me it's not "seeking approval before beginning work on any of it". If you look back at the history, you'll see some of the stuff on earlier versions of this page actually has already been done (e.g. adding generator support to various actions); other easy stuff is welcome to be done similarly. Anomie (talk) 11:23, 19 July 2014 (UTC)
- As for the "dropping deprecated parameters", the main blocker there is checking how many people still use them to determine how far in the future $DATE should be in the "These long-deprecated parameters are finally going away on $DATE" announcement. Anomie (talk) 11:23, 19 July 2014 (UTC)
Another idea that just occurred to me is that pretty much anything that outputs a page set gives you both the namespace ID and the full title, including the namespace. Unless there's a good reason for this that I'm not thinking of, I think you could get rid of that redundancy. – RobinHood70 (talk) 22:33, 19 July 2014 (UTC)
I can't say whether any of the proposals are good or bad. I was flagged to this conversation after a dire warning of cat/dog cohabitation and 90% of this I do not understand. Any chance of a digested version/how to test for support in an API consumer and if patching will be needed? Hasteur (talk) 20:57, 28 July 2014 (UTC)
- Not much chance of a digested version, unless someone else volunteers to write it. But each section here will likely be individually implemented (it won't be a massive change-everything-in-one-huge-change), and there will be announcements to mediawiki-api-announce when each change is about to be deployed. Anomie (talk) 11:45, 30 July 2014 (UTC)
Point of clarification here, then: when you say "transition period" throughout this page, are you talking strictly about the period between when you start modifications and when API2 is complete? If so, I'm a heck of a lot less worried (read: couldn't care less) about being able to detect whether I should be using "sane" or not, because I'm obviously not going to program against an evolving API. I assumed it would be a big rollout, and that by "transition period", you meant a major version or two for people to adjust. If that's not the case, then all I ask is that siteinfo contain some kind of easy API version identifier once the transition is final and the new API/changes to the JSON output become the default behaviour. – RobinHood70 (talk) 03:02, 5 August 2014 (UTC)
I mean the period of time when #Deprecation process is being gone through. Anomie (talk) 09:15, 6 August 2014 (UTC)
Good and bad. But generally good. I'm waiting to make the changes to Peachy. Cyberpower678 (talk) 19:37, 29 July 2014 (UTC)
