Tuju> I would recommend doing it the way Twitter does: they version their protocols in the URLs and keep each URL working regardless of database layout changes, so they won't break applications.
Talk:Requests for comment/API roadmap
About this board
For older discussions, see Archive.
+1, sounds great!
It would be really handy if the API deprecation messages used an identifier of some sort that clients can use to 'understand' what these messages are. The following message doesn't even use the word 'deprecate' or 'deprecation':
- Formatting of continuation data will be changing soon. To continue using the current formatting, use the 'rawcontinue' parameter. To begin using the new format, pass an empty string for 'continue' in the initial query.
Using i18n codes for API warnings should be a high priority, as not everyone can understand English, and clients do not want to show uncoded English warnings to non-English users.
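To illustrate what machine-readable warning codes would buy clients, here is a minimal sketch. The warning code, the parameter names, and the message catalogue below are all hypothetical, not part of the current API:

```python
# Hypothetical sketch: if API warnings carried stable, machine-readable codes,
# a client could localize them instead of displaying raw English text.
# The code name "deprecation-continue-format" and this catalogue are invented.

LOCALIZED = {
    "deprecation-continue-format": {
        "en": "The continuation format is changing; pass continue= to opt in.",
        "de": "Das Fortsetzungsformat ändert sich; mit continue= aktivieren.",
    },
}

def render_warning(warning: dict, lang: str) -> str:
    """Pick a localized message for a coded warning, falling back to raw text."""
    messages = LOCALIZED.get(warning.get("code"))
    if messages:
        return messages.get(lang, messages["en"])
    # Unknown or uncoded warning: all we can show is the English text.
    return warning.get("text", "Unknown API warning")
```

With uncoded warnings like the one quoted above, only the fallback branch is ever reachable.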
Errors should use reasonable HTTP response codes
It would be great if API errors used the HTTP error response codes rather than returning 200.
I see that we previously WONTFIXed this request but now that we're overhauling the API I think we should give it another look.
My reasoning in that bug still applies: an HTTP error indicates that something went wrong with the HTTP request, for example that the target resource wasn't found or couldn't be executed. As far as the API is concerned, that's the transport layer. If the API request is able to be processed but the result is an API error, that's reported at the application layer instead.
Say the API did return an HTTP 400 or 500 response code for an API error. How does the client determine that this is an API error rather than a varnish timeout or the like? I don't much like "blindly try to parse the body, if it succeeds it's an API error".
Also, say the API did return an HTTP 4xx response code for an API error. People would probably expect that action=delete would return a 404 if the target page isn't found to be deleted. But then what happens with action=query, when there may be multiple titles and some might be not found and others not? Or look at action=watch, before gerrit:53964 you could have made the case for it to return 404, but now it's like action=query.
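To make the action=query case concrete: a single request can return a mix of found and missing titles, so no single HTTP status code fits. The response below is a simplified, hand-written example; real responses key pages by page ID and mark absent ones with a "missing" member:

```python
# Simplified, hand-written action=query response: one existing page and one
# missing title in the same request.

response = {
    "query": {
        "pages": {
            "736": {"pageid": 736, "title": "Albert Einstein"},
            "-1": {"title": "No Such Page", "missing": ""},
        }
    }
}

pages = response["query"]["pages"].values()
found = [p["title"] for p in pages if "missing" not in p]
missing = [p["title"] for p in pages if "missing" in p]

# One request, two outcomes: neither 200 nor 404 describes the whole response.
```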
I agree with Anomie's reasoning. An API error is not an HTTP error, and should not be reported as one.
If the error is related to application layer data, HTTP error codes are wrong, of course.
However, IIRC the MW API emits server errors with HTTP 200 and a response that includes an error code like internal_api_error_ExceptionFooBar. Those are a server error, and should/could carry an HTTP 50x code, because the application failed while attempting to complete the request, and all bets are off as to which parts of the request were performed and committed to the database.
The current approach isn't _wrong_, as 50x codes are optional, but it is worth reconsidering using them for the cases they actually apply to.
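A sketch of the layering being argued here, as a hypothetical client-side classifier (the internal_api_error_ prefix is the one mentioned above; everything else is illustrative):

```python
import json

def classify(status: int, body: str) -> str:
    """Decide what kind of failure an API response represents."""
    if status != 200:
        # Transport layer: Varnish timeout, proxy error, and so on.
        return "transport-error"
    try:
        data = json.loads(body)
    except ValueError:
        return "garbled"
    error = data.get("error")
    if error is None:
        return "ok"
    if error.get("code", "").startswith("internal_api_error_"):
        # Uncaught exception inside the API: arguably deserves HTTP 5xx.
        return "server-error"
    # Ordinary application-layer error: HTTP 200 is appropriate.
    return "api-error"
```

If the API instead returned 5xx for its own errors, the first branch could no longer distinguish a proxy failure from an API failure without parsing the body.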
Architecture Summit notes
Please see Talk:Architecture Summit 2014/Storage services#API versioning and additional notes on that page.
Follow-up action items meeting september 11
The action items from the September 11 meeting are:
- Wikia makes their RFC public, ASAP :) - Federico
- Separate RfC re RESTful API?
- Prototype Parsoid REST API - Gabriel Done
- Find motivating use case re flags versus versions - Yuri
- Restructure current RFC - Brad/Yuri ?
- Sumana to post this etherpad onwiki, email mediawiki-api & wikitech-l
I have added Done based on my understanding of the current status; please feel free to edit.
The REST storage service and public content API are now discussed in these two closely related RFCs: Storage service and Content API.
Wikia has released a REST API that covers their immediate needs. They also have an API team that might work on a more general REST API. I hope that we can collaborate with them on the REST API.
Here is a full copy of the etherpad before it disappears:
API roadmap conversation, Sept 11 2013 at WMF office

* Attendees: Yuri, Max, Yuvi, Erik B, Brad, Sumana, Subbu, Gabriel, RobLa, Roan, Federico, Tim

== REASONS / JUSTIFICATIONS ==

Current proposal: https://www.mediawiki.org/wiki/Requests_for_comment/API_roadmap

* Change output format - structured warnings / errors, localization
** Kill XML specifically :( (it's 25% of non-OpenSearch traffic but it's a mess and needs to die)
* Split traffic between server pools depending on action
** Change URL to e.g. api.php/query?...
*** Why make the URL longer?

== Discussion ==

* Module refactoring - https://www.mediawiki.org/wiki/Requests_for_comment/API_roadmap#Modules_refactoring

Drawbacks to versioning modules, versus individual flags:
* Making promises we can't keep: we say action=foo~3 isn't going to change, but then some security issue or core change comes along and we have to break it anyway.
* Code rot: "foo~3" implies an entirely separate module, the code for which will easily rot.
** Yes, the version *could* be treated as a feature flag within the module. Then you have this vaguely-named flag that doesn't indicate what it does besides "version".
* Say we make "foo~3", then "foo~4". If a client wants something introduced in ~4, they have to accept ~3 as well.
** Encouraging people to upgrade to the latest version is often a benefit
*** But forcing them to upgrade many features for one feature?

URL change won't help with caching yet -> REST content API
* query param order random, cannot be purged
* don't want to wrap HTML in JSON

https://www.mediawiki.org/wiki/Talk:Requests_for_comment/API_roadmap#Clean_up_formats_23045

Wikia's requirements: work with SDKs their ecology likes
* REST
** What kind of REST? It means something different to everyone!
** Cacheable as much as possible -> no query params, deterministic URL so purgeable
** Representations? State Transformations?
*** Content types, not everything should be wrapped in JSON
**** +1
** Discoverability - API results include URLs to possible state transformations, related resources, etc.

Yuri: How will we proceed in changing the API?
* Sumana advises: consult existing API usability research, just as we consult users & MW developers

How do we change defaults? Star versus underscore is so JS can do "foo._" instead of "foo['*']"
> Avoid underscores in JS identifiers per conventions, maybe use "content" instead of "*" / "_" (also more descriptive)

Idealism vs Pragmatism - Do you want something beautiful? Or something that continues to work? Why can't it do both?
* The argument is to find specific use cases for each individual change; an overall beautiful API is not definable as individual little pieces, but as an overarching design methodology

== NEXT ==

* Wikia makes their RFC public, ASAP :) - Federico
** Separate RfC re RESTful API?
** Prototype Parsoid REST API - Gabriel
* Find motivating use case re flags versus versions - Yuri
* Restructure current RFC - Brad/Yuri ?
* Sumana to post this etherpad onwiki, email mediawiki-api & wikitech-l
Meeting notes on etherpad
Last week we met in the office and discussed this RFC. The discussion notes are on the etherpad.
It seems to me that using PATH_INFO is going to make things more complicated for clients: instead of taking just an assoc/dict/hash/etc. of query parameters at a low level, they also have to take a PATH_INFO value. And while that's not much of a complication (if nothing else, a magic key could be extracted from the assoc/dict/hash/etc.), what is the benefit?
I agree that it will have to make "action" a special parameter (or could be extracted from the dict), but there are several benefits:
- Ability to more easily partition the server farm to create clusters dedicated to certain actions, like parsing (requested by the Parsoid team)
- webserver access log files will contain the action even for post requests
- No need to introduce api2.php just yet - we can determine new version by request style
- Future core version changes can be done in the style api.php/action/2?...
- Shorter URL
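For comparison, here is a sketch of the two URL styles under discussion. The api.php/action/2 path layout is the proposal's illustration above, not a shipped interface:

```python
from urllib.parse import urlencode

def classic_url(base, action, params):
    """Current style: everything, action included, in the query string."""
    return f"{base}?{urlencode({'action': action, **params})}"

def pathinfo_url(base, action, params, version=None):
    """Proposed style: action (and optional version) in PATH_INFO."""
    path = f"{base}/{action}" + (f"/{version}" if version is not None else "")
    return f"{path}?{urlencode(params)}"
```

The PATH_INFO form puts the action (and a future version) where webserver access logs and routing rules can see it even on POST requests.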
Partitioning, ok. Versioning could as well be done with action=foo/2. api.log already contains the action for post requests; I guess you're talking about the webserver access.log? Shorter URL and "api2.php", meh.
Anomie, the core value is #1; everything else is a side benefit :) As for logs, unless this is a very, very recent change, I don't see the action in the POST requests in the logs.
Are you looking in the api.log (on fluorine), or in webserver access logs?
I'm looking at the api log files that are rsynced to stats1.
I don't know what's in that one.
CORS and third-party web apps
- different URLs -> breaks potential shared caching with other apps that use the same queries over JSON
- harder to get progress feedback or detect errors
- can't do POST requests at all
- authentication is disabled to prevent CSRF stealing a web user's credentials
This has practical limitations for some mobile platforms as well -- for instance our Wikipedia app for Firefox OS is a web app hosted on bits.wikimedia.org. Since an XHR can't access *.wikipedia.org/w/api.php from there, it has to use either JSONP or a server-side proxy (icky, hides IPs, no load balancing, etc). Since a proxy is icky and hard to scale, we're using JSONP for now... but this won't work once we try to add login and editing features, since auth isn't available.
If we had CORS headers set up to allow non-authenticated (no cookies) access via XHR from all third-party domains, and we could auth without cookies (can we use a token for this? I .... think so) that would be helpful.
Not sure if that's doable on the current API or not. :D
There is some sort of CORS handling in the API, but I will need to look further into it to get a better understanding of how it is setup.
Basically, it's three parts:
- The client adds an "origin" parameter to the request to indicate the origin and explicitly request CORS.
- The browser adds an "Origin" HTTP header, to also indicate the origin.
- The MediaWiki configuration has $wgCrossSiteAJAXdomains and $wgCrossSiteAJAXdomainExceptions to determine whether to allow the cross-domain request.
First, the "origin" parameter must match one of the values in the "Origin" header, or the request fails.
Second, the "origin" parameter must match one of the patterns in $wgCrossSiteAJAXdomains and not match any pattern in $wgCrossSiteAJAXdomainExceptions. These are currently set to allow various WMF wikis (but bits.wikimedia.org is not in the list).
If both checks pass, then the appropriate CORS headers are returned to instruct the browser to allow the request, including cookies.
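The checks described above can be sketched roughly as follows. The domain lists are invented, and fnmatch is only an approximation of MediaWiki's actual wildcard matching:

```python
from fnmatch import fnmatch

# Invented stand-ins for $wgCrossSiteAJAXdomains / ...Exceptions.
CROSS_SITE_AJAX_DOMAINS = ["*.wikipedia.org", "*.wikimedia.org"]
CROSS_SITE_AJAX_DOMAIN_EXCEPTIONS = ["test.wikipedia.org"]

def allow_cors(origin_param: str, origin_header: str) -> bool:
    # First check: the "origin" request parameter must match the Origin header.
    if origin_param != origin_header:
        return False
    host = origin_param.split("//", 1)[-1]
    # Second check: allowed by the whitelist and not on the exception list.
    if not any(fnmatch(host, pat) for pat in CROSS_SITE_AJAX_DOMAINS):
        return False
    return not any(fnmatch(host, pat) for pat in CROSS_SITE_AJAX_DOMAIN_EXCEPTIONS)
```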
I guess the basic idea behind this proposed non-cookie authentication method would be that it works just like cookies except that it's handled by the client code rather than the browser?
Yeah, it would be nice to drop JSONP for CORS. We'll have to disable anonymous editing over CORS though so that the API can't be turned into a kind of mass spam attack that could come from absolutely any innocent IP in the world without people's knowledge.
As for auth via tokens: this is basically where OAuth (or something like OAuth ;) ) would fit in.
Technically speaking, I think anonymous editing via action=edit already allows that kind of attack. *cough*
*facepalm* right it can, and I had a private bug about fixing that.
Clean up formats
yaml format can be removed, since it's now identical to json. format=txt and format=dump seem entirely pointless, and format=dbg seems redundant to format=php for real use and format=rawfm for debugging.
Now for the controversial part: format=xml seems to be a major source of problems, since it needs special handling all over the place. If we keep it at all, it would be very nice to change it to something that doesn't need magic "_element" and "*" members and won't cause bugs like bug 43221 (for the last, if nothing else define some sort of reversible encoding for invalid names). This would also allow us to get rid of format=rawfm, since we won't have any more magic elements.
YES YES YES. Kill XML, let us just export an associative array and have it go straight to a JSON object.
Multiple formats support is awkward and, especially with XML, is just plain weird.
I'd strongly like to kill all formats except for JSON. JSON is widely supported, simple, doesn't have weird-ass attributes and text contents, and generally should be a good default.
Kill XML with fire, please please please!
The other formats are basically equivalent to JSON (YAML was actually replaced with JSON because valid JSON is valid YAML!) and there's not much benefit to their existence.
Serialized php is faster in php, and easier for those of us coding in php, and only slightly less efficient in bandwidth. There is a rather large code base of tools using php serialize.
Note that software using PHP serialization now should be able to update to JSON by simply changing the format parameter and switching from 'unserialize' to 'json_decode'. There _shouldn't_ be differences in the decoded data format, that I know of.
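For a PHP client the change is one format parameter and one decoder call; the comments below show the PHP side, and the executable part makes the same point from a Python client's perspective on a hand-written response body rather than a live request:

```python
# Before (PHP): $data = unserialize( file_get_contents( $url . '&format=php' ) );
# After  (PHP): $data = json_decode( file_get_contents( $url . '&format=json' ), true );
# The decoded structure is the same associative array either way.

import json

body = '{"query": {"pages": {"1": {"title": "Foo", "ns": 0}}}}'
data = json.loads(body)
title = data["query"]["pages"]["1"]["title"]
```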
I threw together a quick benchmark:
On 2000 items of RecentChanges data, file sizes:
138K rc.json
134K rc.xml
187K rc.phpser
and speed per iteration:
$ php test.php
Benchmarking xml... 4.436 ms
Benchmarking json-objects... 4.846 ms
Benchmarking json-assoc... 4.312 ms
Benchmarking php... 2.776 ms
So yes, on ~140-190KB of tightly-packed RC data you might save 2 milliseconds of low-level parse time. I'm not convinced this is a significant savings.
@brion: generation comparison?
That's actually a much stronger argument for JSON, alas...
$ php bench.php
Benchmarking json-objects... 7.774 ms
Benchmarking json-assoc... 7.720 ms
Benchmarking php... 12.301 ms
So, I'm wrong on the speed, and I apologize for that one.
For everyone's enjoyment, I present the format usage stats. XML gets about 500 reqs/min (down from ~1000 three months ago), JSON ~2100, PHP has grown to about 200 now, YAML dropped from 1.3/min to sporadic use, DBG (?!?) is consistently used at about 1.3/min, RAW frequently spikes up to 30!!!, TXT averages 3, but the real kicker: xmlfm gets 50 reqs per minute... FML!!! Need to track that down and kill it with a vengeance.
While the numbers are interesting, they may not tell the complete tale. The XML line is a good example: there are three very sharp downward steps, suggesting that three very high-volume but specific tools have stopped using that format. Contrariwise, there's an informal but general increasing trend in PHP, suggesting that a diversity of tools are using that format. Translated, this suggests a wider range of projects might be broken by removal of php as a format, while a smaller number of projects might be broken by removal of xml as a format.
Yes, I know you're not suggesting eliminating php as a format.
But you are suggesting the content API should shut out the more diverse community of projects which are already using the API.
Amgine, I think we definitely should keep the current multi-format API model for query/action modules, with a possible drop of WDDX and YAML, but on the content side we should make it uniform to take advantage of caching. If the difference between using PHP and JSON is simply replacing one built-in method with another, it shouldn't be that big of a deal. And yes, we can make the content data model HTTP-error-coded and possibly even non-structured-blob based, removing the need for the JSON vs PHP vs XML debate altogether :)
In other words: keep using the current API, figure out what content (e.g. HTML) you need for some task, and then download the blob with a separate content-API call. There shouldn't even be a need for an API library. At most there will be a simple JSON structure to separate TOC entries/sections, depending on the call.
xmlfm is the default, so testing or building new queries in the browser will use this format, and the help page uses it too.
In my opinion you should not drop XML, because not all programming languages have native JSON or PHP support, for example Java (at least in 1.6). Adding a new jar can be a blocker for this.
How would you feel about changing the XML format to something that would be less likely to cause issues? Something closer to an XML-format property list or WDDX. Or maybe just keeping WDDX as the XML format?
Where does the XML format cause problems? All XML-related things should be in ApiFormatXml, where nobody sees them. Regarding bug 43221: property names with :: are also bad in JSON. If there were an attribute name for content in JSON (like text or _continue), the XML wrapper could produce a text node out of it; then nobody would need ApiResult::setContent, if that is the problem.
I think the “special handling” Anomie was talking about is that you need to call setIndexedTagName every time you want to return a (numerical) array from the API. (There could be other situations that require special handling for XML that I haven't encountered yet.)
There's also how other formats have to deal with a key named "*" so XML can do its "text content with properties" thing.
foo.bar notation, but foo['bar'] still works.
Yes, * is also fine in JSON, but you must write foo['*']. In another thread some people did not want to write the string notation and wanted the object notation instead, so it makes no sense to have other params in string notation and break that. In that case you could keep * as well.
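A small sketch of the naming issue: with "*" as the content key (the revision object below is simplified and hand-written), JavaScript clients are forced into bracket notation, while a descriptive key like "content" would allow dot notation. Neither key name is a committed design here:

```python
# Simplified, hand-written revision object shaped like today's JSON output:
# the text body lives under "*".

revision = {"contentformat": "text/x-wiki", "*": "Page text here"}

# In JS: revision["*"] works, but revision.* is a syntax error.
text = revision["*"]

# Hypothetical renamed output using "content" instead of "*":
renamed = {("content" if key == "*" else key): value for key, value in revision.items()}
```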
The use of "*" is gratuitous, we can easily pick something more sensible. The use of "::" as keys in the API for things that use "::" as keys in MediaWiki core is not gratuitous.
Title is better
I like "API roadmap" much more than "API Future". Nice move. :-)