Evaluating and Improving MediaWiki web API client libraries/Status updates
This page is obsolete. It is being retained for archival purposes. It may document extensions or features that are obsolete and/or no longer supported. Do not rely on the information here being up-to-date.
Status
13 June 2014
- https://wikimania2014.wikimedia.org/wiki/Hackathon/Pywikibot
- on how to *actually* set up pywikibot: https://upload.wikimedia.org/wikipedia/mediawiki/9/9b/Bots_hackathon_2013.pdf (thanks Merlijn!)
- JS resource: http://eloquentjavascript.net/contents.html
- pywikipedia-l: http://lists.wikimedia.org/pipermail/pywikipedia-l/2014-June/thread.html
- Mozilla wiki things: https://wiki.mozilla.org/Main_Page and https://wiki.mozilla.org/Contribute/Education/Wiki_Working_Group#How_to_Participate
- https://meta.wikimedia.org/wiki/WikiTeam/Dumpgenerator_rewrite
6 June 2014
- Useful things/TODOs
- list of all libraries to evaluate
- Ruby: MediaWiki::Gateway
- Perl: MediaWiki::Bot
- Perl: MediaWiki::API
- JS: nodemw
- JS: MediaWiki
- JS: WikiJS
- Python: pywikibot
- Python: mwclient
- Python: wikitools
- Python: simplemediawiki
- Java: JWBF*
- Java: Wiki.java
- list of all API actions (at least) to search for in libraries to check which parts of the API they expose
- script that will automatically check for existence of each of these in a given library
- list of breaking changes to the API within the last year
- http://lwn.net/Articles/582065/ on https and python
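The TODO above about a script that checks each library for the API actions it exposes could begin as a plain text search. A minimal Python 3 sketch, assuming the library's source is checked out locally and that wrapped actions appear somewhere as quoted string literals such as 'edit' or "query"; `find_actions` and `missing_actions` are hypothetical helpers, and a match only suggests (does not prove) that the library exposes that action:

```python
import os
import re


def find_actions(root, actions, extensions=(".py", ".rb", ".pl", ".pm", ".js", ".java")):
    """Map each API action name to the source files that mention it as a quoted literal.

    Heuristic assumption: a library that wraps an action mentions its name as a
    quoted string somewhere in its source. This is a hint, not proof of support.
    """
    found = {action: [] for action in actions}
    # Match 'edit' or "edit" exactly, so that 'editable' does not count as 'edit'.
    patterns = {a: re.compile(r"""['"]%s['"]""" % re.escape(a)) for a in actions}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(extensions):
                continue  # skip files that are not source code
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as f:
                text = f.read()
            for action, pattern in patterns.items():
                if pattern.search(text):
                    found[action].append(path)
    return found


def missing_actions(found):
    """Return, alphabetically, the actions that no source file mentions."""
    return sorted(a for a, paths in found.items() if not paths)
```

For example, `missing_actions(find_actions("path/to/mwclient", ["query", "edit", "login", "parse"]))` would list the candidate actions never mentioned in that (hypothetical) checkout; the action list itself would come from the first TODO item.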
5 June 2014
- What does it mean to be Pythonic?
- Python style guide: http://legacy.python.org/dev/peps/pep-0008/
- Docstring conventions: http://legacy.python.org/dev/peps/pep-0257/
- Wikimedia diversity conference notes: http://adainitiative.org/2013/11/wikimedia-diversity-conference/
- Evaluating simplemediawiki
- the Wikidata API works better when you know it's http://www.wikidata.org/w/api.php, not http://wikidata.org/w/api.php <--- WHY IS THIS INCONSISTENT?
- NEVER MIND, the www. is needed for www.mediawiki.org/w/api.php too... automatic browser redirects strike again
- bug in the Wikidata API: the new continue parameter doesn't work with wbsearchentities; http://www.wikidata.org/w/api.php?action=wbsearchentities&search=abc&language=en&continue=&format=json yields
{"servedby":"mw1199","error":{"code":"internal_api_error_MWException","info":"Exception Caught: Internal error in ApiResult::setElement: Bad parameter","*":""}}
- in contrast, see https://en.wikipedia.org/w/api.php?action=query&list=allcategories&acprefix=List%20of&continue=&format=json
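The failing wbsearchentities request above still came back as ordinary JSON, with the problem reported inside the body under an "error" key, so a client has to look inside every response rather than trust that a reply means success. A minimal Python 3 sketch using only the standard library; `APIError` and `check_response` are hypothetical names, not part of any library evaluated here:

```python
import json


class APIError(Exception):
    """Raised when a MediaWiki API response contains an "error" object."""

    def __init__(self, code, info):
        super().__init__("%s: %s" % (code, info))
        self.code = code
        self.info = info


def check_response(raw):
    """Decode a JSON API response; raise APIError if it reports an error.

    Returns the decoded dict when there is no "error" key.
    """
    data = json.loads(raw)
    error = data.get("error")
    if error is not None:
        raise APIError(error.get("code", "unknown"), error.get("info", ""))
    return data


# The wbsearchentities response quoted above, verbatim.
raw = ('{"servedby":"mw1199","error":{"code":"internal_api_error_MWException",'
       '"info":"Exception Caught: Internal error in ApiResult::setElement: Bad parameter","*":""}}')
try:
    check_response(raw)
except APIError as e:
    print(e.code)  # internal_api_error_MWException
```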
3 June 2014
Notes from talking with Yuri and Sumana:
- API is written like SQL; amazing but not really cacheable.
- APIs as a shopping cart: good for bandwidth, direct optimization, one request.
- or: caching technologies, which work per request.
- Google: no per-request caching, OK.
- but MediaWiki: every person sees the same content (more or less, modulo things like individual gender).
- preferences do mean that one content doesn't fit all.
- API: more suited to non-localized/non-gendered content.
- but content/metadata is less fragmentable, *should* be highly cacheable.
- wants to make the API more "blobbable": blobs with keys
- Varnish: reverse proxy, a caching mechanism between backend and user
- Tollef used to work on Varnish.
- [read performance guidelines that Sumana's been drafting]
- blob = binary large object
- "not all blobs are created equal": some blobs can be understood by Varnish, which can look inside them for executable parts and replace those with another blob.
- "edge-side includes"
- all caching has expiration dates
- AJAX: a way of using JS to make pages more dynamic (these days minus the XML)
- if you do this on client side with AJAX there will be *two* server calls but not for
- server (center) - edge (customer-ish)
- backend - caching infrastructure/front server/"edge caching" - client
- 2 hard problems in CS: cache invalidation, naming things, and off-by-1 errors.
- caching layers
- memcached/memcache(d) <--- came from LiveJournal and
- key-value store <---- file structure
- don't put things in a cache that you want to keep permanently: put them in a "store" <--- connotation of permanence
- memcached evicts things; you can't assume that things stay there (you can say: this has a lifetime of 12 h).
- put enough stuff in there that there's a high cache-hit ratio.
- so, redoing the API:
- on the one hand, simple and cacheable.
- on the other hand, backward compatibility.
- breakaway path: one new API that is blobbable/cacheable/content-focused
- actually use HTTP error codes.
- current API: a bunch of minor bugs, inconveniences, verbosity. Cleaning up is tedious and high-risk/low-reward.
- library designed from the ground up to take advantage of the SQL-ish API structure
- Yuri: does Python, C#, PHP
- Python: use requests to minimize code and improve readability. Python 3-compatible.
- old vs. new MediaWikis: developing for old versions is inefficient; developing for new versions means you have to "fake it on the client" if the server doesn't support a feature. Suggestion: pretend everything's old.
- framework = client library (like pywikibot)
- should use the NEW continue method
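The "new" continuation method works by sending an empty continue= parameter on the first request, then copying every key of the returned "continue" object into the next request, until a response arrives with no "continue" object. A minimal Python 3 sketch of that loop; `fetch` is a hypothetical stand-in for whatever actually performs the HTTP request and decodes the JSON (for instance a requests-based function), which lets the loop be shown and tried against fake data without network access:

```python
def query_all(fetch, params):
    """Yield the "query" part of each batch, following new-style continuation.

    fetch:  hypothetical callable that sends a dict of request parameters to
            api.php and returns the decoded JSON response as a dict.
    params: the base request, e.g. {"action": "query", "list": "allcategories"}.
    """
    request = dict(params)
    request["continue"] = ""  # an empty continue= opts in on the first request
    while True:
        result = fetch(request)
        if "error" in result:
            raise RuntimeError(result["error"])
        if "query" in result:
            yield result["query"]
        if "continue" not in result:
            break  # no "continue" object means this was the last batch
        # Rebuild from the base parameters, then add every continuation value returned.
        request = dict(params)
        request.update(result["continue"])


# Illustration only: a fake two-batch server instead of a real HTTP request.
batches = [
    {"continue": {"accontinue": "List_of_b", "continue": "-||"},
     "query": {"allcategories": [{"*": "List of a"}]}},
    {"query": {"allcategories": [{"*": "List of b"}]}},
]
sent = []


def fake_fetch(request):
    sent.append(dict(request))
    return batches[len(sent) - 1]


names = [c["*"]
         for batch in query_all(fake_fetch, {"action": "query", "list": "allcategories"})
         for c in batch["allcategories"]]
print(names)  # ['List of a', 'List of b']
```

Rebuilding each request from the base parameters, rather than accumulating keys, keeps stale continuation values from one batch from leaking into the next.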
28 May 2014
Things I want to do today:
- Finish making slides.
- Write an "about APIs" introductory post (Done! http://franceshocutt.com/2014/05/28/a-beginners-definition-of-web-api/)
- Outline a "resources for the MediaWiki API" post
- See if there's anything else to do; start evaluating libraries?
- Practice talk
- Do these kindly.
27 May 2014
- Wikimedia technical search: https://www.google.com/cse/home?cx=010768530259486146519:twowe4zclqy
- First revision of library gold standard: API talk:Client code
- [1]/[2]
- Java package repository: Maven
- Rust language's conduct policy
- on bashing your head against problems: http://www.mattringel.com/2013/09/30/you-must-try-and-then-you-must-ask/
- on Ruby gems: https://github.com/radar/guides/blob/master/gem-development.md
22 May 2014
- finished initial evaluation of all client libraries
- Java's still a bit shaky, but wooooo!
- "Best library" count: 1 Ruby, 4 Python, 1 Perl, 3-4 Java (?), ~3 JavaScript
- Added links here: https://meta.wikimedia.org/wiki/Research:Resources#Research_Tools:_Statistics.2C_Visualization.2C_etc. and here: http://wikipapers.referata.com/wiki/List_of_tools
- Went through the rest of the libraries in API:Client code to check for last update date, put that info on the page
- Offered @edupunkn00b API help when she gets there on her own project; she'll make notes of roadblocks in documentation/learning
- some links with JS/API resources: User:Waldir#API + Javascript
- went over the gold standard stuff with Mithrandir on IRC: "I think having that gold standard there is good and while I'm picking at many of the individual points, the collective seems well-thought out to me."
- emailed mentors with progress/request for feedback
- TODO
- Draft slides for API talk
- Start an outline of resources for API workshop!
21 May 2014
- updated https://en.wikibooks.org/w/index.php?title=Perlwikibot&stable=0 to remove much obsolete information
- continuing to evaluate library capabilities (Python)
- discussed upcoming WikiConference2014 talk with Sumana (IRC)
- Signed up for a lightning talk on feminist hackerspaces: http://wikiconferenceusa.org/wiki/Lightning_Talks#Sign_up
- Thinking a lot about this in the context of "open knowledge": http://dawnnafus.files.wordpress.com/2008/09/patches-revised2.pdf (via @betsythemuffin on twitter) [citation: http://nms.sagepub.com/content/early/2011/11/09/1461444811422887]
- Cleaned up this a bit: Wikipedia:Creating_a_bot#Programming_languages_and_libraries (Perl and Ruby sections, Python and Java looked fine)
- Links from @edupunkn00b on REST: http://skillcrush.com/2012/07/13/rest/, http://www.infoq.com/articles/rest-introduction
- More than you ever wanted to know about git workflow possibilities: https://www.kernel.org/pub/software/scm/git/docs/user-manual.html#the-workflow
- TODO
- Email Tollef/Brad/Merlijn tonight with what I have, some questions about JS/Java, and then what developers want in a library
- Start writing up criteria/"gold standard", considering a few different developer-users (novice to expert?)
20 May 2014
- meeting
- Evaluating and Improving MediaWiki web API client libraries/Progress Reports
- Today: evaluate library capabilities, make notes on API talk:Client code
- Finished Ruby, Perl, Java, part of Python notes
- Installed Eclipse IDE for Java
- Asked for advice about Java conventions/language structure (classes, all classes, forever classes)
- filed a small documentation bug on mwclient: https://github.com/mwclient/mwclient/issues/39
- consider editing Wikipedia:Creating_a_bot#Programming_languages_and_libraries
19 May 2014
Today I officially start my OPW internship!
- Things To Do
- put Evaluating_and_Improving_MediaWiki_web_API_client_libraries/Status_updates/Search_results into API:Client Code
- begin evaluating libraries against the following criteria:
- Has it been updated in the last 12 mo?
- Does it have a lot of open bugs/pull requests, especially compared to the number closed?
- Does it have documentation, code samples, and tests provided?
- Does it, at minimum, handle logins/cookies/continuations? (Even "syntactic sugar" libraries should do these things.)
- Results, resources, misc from an IRC meeting with Sumana, Tollef, et al.
- Reminder that github graphs exist, like: https://github.com/dreamwidth/dw-free/graphs/contributors
- Data for Wikimedia traffic:
- Breaking changes to the API (and therefore a timeline of changes that API client library developers should have taken note of) are (or very much should be) mentioned in the release notes (Release_notes/1.22#API_changes), on http://lists.wikimedia.org/pipermail/mediawiki-api-announce/, and in HISTORY (https://git.wikimedia.org/blob/mediawiki%2Fcore.git/master/HISTORY).
- the support matrix on Wikia (http://api.wikia.com/wiki/Client_libraries#Notes) was last updated in 2011 (http://api.wikia.com/wiki/Client_libraries?action=history), which is before Wikidata launched in October 2012
- Python's requests library handles cookies: http://docs.python-requests.org/en/latest/user/quickstart/#cookies
- IRC:
- http://en.flossmanuals.net/GSoCStudentGuide/ch014_communication-best-practices/
- pastebin for sharing multiline code/error/results things: http://tools.wmflabs.org/paste/
- http://www.harihareswara.net/sumana/2014/02/26/0
- Wikimedia mailing lists: Mailing_lists/Overview#MediaWiki_and_technical
- commentary on localization: http://aharoni.wordpress.com/2011/08/24/the-software-localization-paradox/ (came up when discussing the [lack of] API localization)
- from commentary on pywikibot, Python 2 vs. 3 as another slow deprecation process:
- http://python-notes.curiousefficiency.org/en/latest/python3/questions_and_answers.html
- https://wiki.python.org/moin/Python2orPython3
- http://python-notes.curiousefficiency.org/en/latest/python3/questions_and_answers.html#why-is-python-3-considered-a-better-language-to-teach-beginning-programmers
- http://python-notes.curiousefficiency.org/en/latest/python3/questions_and_answers.html#slow-uptake
Explanation of various JavaScript variants: http://organicdonut.com/?p=479
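The first three evaluation criteria listed on 19 May lend themselves to a rough automated pre-check before reading any code by hand. A minimal Python 3 sketch; the `Library` record, its field names, and the "fewer open than closed bugs/pull requests" threshold are hypothetical choices, and the fourth criterion (logins/cookies/continuations) still needs a person to read the library:

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Library:
    """Facts gathered by hand (or from a repository host) about one client library."""
    name: str
    last_update: date
    open_issues: int    # open bugs plus open pull requests
    closed_issues: int  # closed bugs plus closed/merged pull requests
    has_docs: bool
    has_examples: bool
    has_tests: bool


def evaluate(lib, today):
    """Return a dict mapping each automatable criterion to True (pass) or False (fail)."""
    return {
        "updated in last 12 months": today - lib.last_update <= timedelta(days=365),
        # Hypothetical threshold: fewer open than closed bugs/pull requests.
        "more closed than open issues": lib.open_issues < lib.closed_issues,
        "docs, examples, and tests": lib.has_docs and lib.has_examples and lib.has_tests,
    }
```

A made-up example: `evaluate(Library("example", date(2013, 9, 1), 3, 40, True, True, False), date(2014, 5, 19))` passes the first two checks and fails the third because no tests are provided.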
2 April 2014
Currently reading: http://aosabook.org/en/index.html.
Evaluating and Improving MediaWiki web API client libraries/Status updates/Search results
http://journal.code4lib.org/articles/8962
Wikimedia research hub: m:Research:Resources. List of tools: http://wikipapers.referata.com/wiki/List_of_cross-platform_tools.
12-19 March 2014
- Starting out
- Learned what APIs are and discussed with Sumana what the point of an API library is
- Ideally, it provides affordances that let you access the deeper wiki structure in an intuitive and functional manner
- Asked around for well-documented APIs; other people suggested:
- Ruby/S3 SDK
- Google Drive
- Google Android
- Mailchimp
- Looked at the code and the documentation for the Python libraries listed on API:Client Code
- Noticed that some of the libraries created layers of abstraction around the MediaWiki API, and others were very simple wrappers over the MediaWiki API
- Compared the three simple libraries on whether they are maintained, the quality of their documentation, and whether they include unit tests (early revision).
- Attempted to start testing the simplemediawiki library...
- ...but flailed very hard at setting up my tools for it. My portable computer only has Windows working on it right now, so, lessons learned:
- I already had Python 2.7 installed, but it turned out that I didn't have a package manager. It additionally turned out that pip is ironically difficult to install on Windows.
- I tried installing setuptools with the installer it came with and then installing pip with setuptools. When I then tried to use pip to install the simplemediawiki library, I got error messages referencing "egg_info failed", usually associated with a bad package installer.
- A recommended Windows .tar.gz unzipper is http://www.7-zip.org/. Note that you have to run it twice, once for the .gz and once for the .tar. This is apparently a WONTFIX.
- setup.py to install setuptools, setuptools to install pip, pip to install simplemediawiki, mwclient, and requests. Success!
- Writing test scripts
- Started trying to use simplemediawiki to make API calls, initially trying those suggested in the API sandbox.
- Problems along the way:
- Figured out that the call() function was very close to the actual API calls. I wasn't totally clear that 'action' wasn't to be replaced by e.g. 'wbsearchentities', but once I looked at the API documentation I could see that the same arguments that the API normally took were simply passed in as a dict.
- Figured out not to try Wikidata API calls against the MediaWiki.org API!
- Tested queries of various sorts, including ones that returned data on missing pages
- See representative tests here, with their results
- API calls with GET seemed to be working OK, so I started testing page-editing capabilities
- Created an account for User:fhocutt bot
- Tokens were confusing (remembering Python syntax helps; they're not fetched as JSON; also see: http://stackoverflow.com/questions/17730144/getting-a-python-error-attributeerror-dict-object-has-no-attribute-read-t)
- The documentation on tokens and bots was somewhat helpful: Manual:Edit token, API:Tokens, Meta:User-Agent policy, Wikipedia:Bot_policy
- but: http://www.mediawiki.org/w/api.php?action=tokens and http://www.mediawiki.org/w/api.php?action=tokens&type=edit both give me an empty string for tokens, and I can't get sandbox API calls with &action=edit to work because I don't have a token. Trying to use the ones that the script gives User:fhocutt bot yields a badtoken error.
- I got tokens and 'edit' working with simplemediawiki! See this pastebin and API:Client Code/Access Library Comparison#Testing login, tokens, editing for details.
- Conclusion based on current work
- Simplemediawiki makes it easy to make calls pretty directly to the API interface in a simple python bot. If I pass it the arguments it expects, it works so far.
- To do: haven't tested any POST calls besides edit, so I don't know if login/cookies/tokens work with those.
- Started mwclient tests
- Once installed (also fine once I had pip), I looked at the documentation and pretty easily got it working for GET calls (though you have to take care with capitalization or you get errors similar to this); having the variable names in the sample code distinct from the methods available would help users new to Python avoid this. (See: https://wiki.python.org/moin/BeginnerErrorsWithPythonProgramming.)
- See API:Client Code/Access Library Comparison#Tests for mwclient for details
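The observation that simplemediawiki's call() takes the same arguments the API takes in a URL, just packed into a dict, can be illustrated with the Python 3 standard library alone. A minimal sketch; `api_url` is a hypothetical helper, not part of simplemediawiki or mwclient, and it only builds the GET URL that corresponds to such a dict (it does not log in, keep cookies, or fetch tokens, and edits would still need to be POSTed with a token):

```python
from urllib.parse import urlencode


def api_url(endpoint, params):
    """Return the api.php GET URL equivalent to the given dict of parameters.

    Adds format=json unless the caller already chose a format; the caller's
    dict is copied, not modified.
    """
    query = dict(params)
    query.setdefault("format", "json")
    return endpoint + "?" + urlencode(query)


# 'action' stays the key; the action name (here wbsearchentities) is its value.
params = {"action": "wbsearchentities", "search": "abc", "language": "en"}
print(api_url("https://www.wikidata.org/w/api.php", params))
# https://www.wikidata.org/w/api.php?action=wbsearchentities&search=abc&language=en&format=json
```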
Resources
- MediaWiki collaboration tools
- Wikimedia pastebin
- Example, shared on IRC with Sumana: https://tools.wmflabs.org/paste/view/1394197e
- MediaWiki code
- Bugzilla list of open API bugs
- Using this search page and searching for "API" yielded no results, but the search box in the upper right corner does find them
- Submit a bug
- Learning styles resources for engineers/scientists
- Learning styles as used at Hacker School
- I love that Mel addresses the "but I don't fit into either of these options!" objection, because I thought precisely that at several points on the quiz
- Quiz to figure your own out
- Description of 4 learning-style spectra
- MediaWiki API resources
- Special:APISandbox not Special:API Sandbox
- API:Client code
- Project:Sandbox
- API#A simple example
- API:Tutorial
- the Wikidata API sandbox
- Extension:Wikibase/API#wbsearchentities
- Other MediaWiki resources
- Other API resources
- Google, Ruby, S3 APIs
- Ch. 1-2 of RESTful Web Services
- Beginner's guide for journalists who want to understand API documentation: a short guide to the idea of APIs and typical documentation; assumes no previous experience with them
- Test pages/wikis, ok to use for trial edits