Evaluating and Improving MediaWiki web API client libraries/Status updates

Status

13 June 2014

6 June 2014

  • Useful things/TODOs
    • list of all libraries to evaluate
      • Ruby: Mediawiki::Gateway
      • Perl: Mediawiki::Bot
      • Perl: Mediawiki-API
      • JS: nodemw
      • JS: MediaWiki
      • JS: WikiJS
      • Python: pywikibot
      • Python: mwclient
      • Python: wikitools
      • Python: simplemediawiki
      • Java: JWBF*
      • Java: Wiki.java
    • list of all API actions (at least) to search for in libraries to check which parts of the API they expose
    • script that will automatically check for existence of each of these in a given library
    • list of breaking changes to the API within the last year
  • http://lwn.net/Articles/582065/ on HTTPS and Python
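
The checking script from the TODO list could start out as a plain text search. This is only a minimal sketch, assuming the library's source sits in a local directory and that a whole-word occurrence of an action name counts as "exposed". `ACTIONS` is an illustrative subset (the full list is still a TODO) and `find_actions` is a hypothetical helper name.

```python
import os
import re

# Illustrative subset of API action names; the real list is still a TODO.
ACTIONS = ["query", "edit", "login", "parse", "upload"]


def find_actions(source_dir, actions=ACTIONS,
                 extensions=(".py", ".rb", ".js", ".pm", ".java")):
    """Return {action: bool}: does each action name appear as a whole
    word anywhere in the library's source files?"""
    found = {action: False for action in actions}
    patterns = {a: re.compile(r"\b" + re.escape(a) + r"\b") for a in actions}
    for root, _dirs, files in os.walk(source_dir):
        for name in files:
            if not name.endswith(extensions):
                continue
            path = os.path.join(root, name)
            with open(path, encoding="utf-8", errors="ignore") as handle:
                text = handle.read()
            for action, pattern in patterns.items():
                if pattern.search(text):
                    found[action] = True
    return found
```

A match here only says the word shows up somewhere (possibly in a comment), so a hit still needs a manual look; a miss is the more useful signal.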

5 June 2014

Evaluating simplemediawiki

3 June 2014

Notes from talking with Yuri and Sumana:

  • API is written like SQL; amazing but not really cacheable.
  • APIs--shopping cart. Good for bandwidth, direct optimization, one request.
  • or: caching technologies, work per request.
  • google: no per-request caching, ok.
  • but MediaWiki: every person sees the same content (ish--modulo stuff like individual gender).
    • preferences do mean that not-one-content-fits-all.
    • API: more suited to non-localized/non-gendered.
    • but content/metadata is less fragmentable, *should* be highly cacheable.
  • wants to make more "blobbable": blobs with keys
  • Varnish: reverse proxy, caching mechanism between backend and user
  • Tollef used to work on Varnish.
  • [read performance guidelines that Sumana's been drafting]
  • blob = binary large object
  • "not all blobs are created equal": some blobs could be understandable by Varnish, which can look inside to find executables and replace them with another blob.
    • "edge-side includes"
  • all caching has expiration dates
  • AJAX--way of using JS to make pages more dynamic (now minus XML)
  • if you do this on client side with AJAX there will be *two* server calls but not for
  • server(center)-edge(customerish)
  • backend-caching infrastructure/front server/"edge caching"-client
  • 2 hard problems in CS: cache invalidation, naming things, and off-by-1 errors.
  • caching layers
  • memcached/memcache(d) <---came from LJ and
  • key-value store <---- file structure
  • don't put things there that you permanently want in your life: put them in a "store" <----connotation of permanence
  • memcached evicts things, you can't assume that things stay there. (can say: has a life of 12 h)
  • put enough stuff in there that there's a high cache-hit ratio.
  • so, redoing API:
    • on the one hand, simple and cacheable.
    • otoh, backward compatibility.
    • breakaway path: one new one blobbable/cacheable/content-focused
    • actually use http error codes.
  • current API: bunch of minor bugs, inconveniences, verbosity. Cleaning up is tedious and high-risk/low-reward.
  • library designed from ground-up to take advantage of SQL-ish API structure
  • Yuri: does Python, C#, PHP
  • Python: use requests to minimize code and improve readability. Python 3-compatible.
  • old vs. new MediaWikis--developing to old is inefficient, developing to new means you have to "fake it on the client" if the server does not support the feature

suggests: pretend everything's old
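
To make the memcached notes above concrete for myself: a minimal sketch of a key-value cache whose entries have a lifetime, plus a hit-ratio counter. It is not how memcached is implemented (memcached also evicts under memory pressure, which this ignores); `TTLCache` and its method names are made up for illustration.

```python
import time


class TTLCache:
    """Tiny in-memory key-value cache with per-entry lifetimes.

    Like memcached, it makes no promise that a key is still present:
    entries vanish once their lifetime expires.
    """

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so tests can control time
        self._entries = {}       # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def set(self, key, value, ttl_seconds):
        self._entries[key] = (self._clock() + ttl_seconds, value)

    def get(self, key, default=None):
        entry = self._entries.get(key)
        if entry is not None and entry[0] > self._clock():
            self.hits += 1
            return entry[1]
        # Expired or never stored: drop any stale entry and count a miss.
        self._entries.pop(key, None)
        self.misses += 1
        return default

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

The caller always has to handle the `default` case, which is the "don't put things there that you permanently want" point: anything in the cache must be recomputable from the real store.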

  • framework = client library (like pywikibot)
  • should use the NEW continue method
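
A sketch of what following the new continue method looks like on the client, as I understand it: send an empty `continue` parameter to opt in, then merge whatever `continue` object the server returns into the next request until none comes back. `fetch` is a hypothetical callable (request parameters in, decoded JSON dict out) so the loop can be tried without a network; `query_all` is likewise just an illustrative name.

```python
def query_all(fetch, params):
    """Yield the 'query' part of each response, following new-style
    continuation until the server stops returning 'continue'.

    `fetch` is a hypothetical callable: it takes a dict of request
    parameters and returns the decoded JSON response as a dict.
    """
    base = dict(params, action="query", format="json")
    last_continue = {"continue": ""}   # empty value opts in to new-style continuation
    while True:
        request = dict(base)
        request.update(last_continue)  # carry over the server's continuation values
        result = fetch(request)
        if "error" in result:
            raise RuntimeError(result["error"])
        if "query" in result:
            yield result["query"]
        if "continue" not in result:
            break
        last_continue = result["continue"]
```

With the requests library, `fetch` could be something like `lambda p: requests.get(api_url, params=p).json()`, where `api_url` is the wiki's api.php endpoint.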

28 May 2014

Things I want to do today:

27 May 2014

22 May 2014

  • finished initial evaluation of all client libraries
  • Java's still a bit shaky, but wooooo!
  • Offered @edupunkn00b API help when she gets there on her own project; she'll make notes of roadblocks in documentation/learning
  • went over the gold standard stuff with Mithrandir on IRC: "I think having that gold standard there is good and while I'm picking at many of the individual points, the collective seems well-thought out to me."
  • emailed mentors with progress/request for feedback
TODO
  • Draft slides for API talk
  • Start an outline of resources for API workshop!

21 May 2014

TODO
  • Email Tollef/Brad/Merlijn tonight with what I have and some questions about JS/Java and then what developers want in a library
  • Start writing up criteria/"gold standard", considering a few different developer-users (novice to expert?)

20 May 2014

19 May 2014

Today I officially start my OPW internship!

Things To Do
Results, resources, misc from an IRC meeting with Sumana, Tollef, et al.

Explanation of various JavaScript variants: http://organicdonut.com/?p=479

2 April 2014

Currently reading: http://aosabook.org/en/index.html.

Evaluating and Improving MediaWiki web API client libraries/Status updates/Search results

http://wikiconferenceusa.org/wiki/Submissions:Using_web_API_client_libraries_to_play_with_and_learn_from_our_%28meta%29data

http://notabilia.net/

http://journal.code4lib.org/articles/8962

http://blog.hatnote.com/

http://seealso.hatnote.com/

Wikimedia research hub: m:Research:Resources. List of tools: http://wikipapers.referata.com/wiki/List_of_cross-platform_tools.

12-19 March 2014

Starting out
  • Learned what APIs are and discussed with Sumana what the point of an API library is
    • Ideally, it provides affordances that let you access the deeper wiki structure in an intuitive and functional manner
  • Asked around for well-documented APIs; other people suggested:
    • Ruby/S3 SDK
    • Google Drive
    • Google Android
    • Mailchimp
  • Looked at the code and the documentation for the Python libraries listed on API:Client Code
    • Noticed that some of the libraries created layers of abstraction around the MediaWiki API, and others were very simple wrappers over the MediaWiki API
    • Compared the three simple libraries on whether they are maintained, documentation quality, and whether the library includes unit tests (early revision).
Attempted to start testing the simplemediawiki library...
  • ...but flailed very hard at setting up my tools for it. My portable computer only has Windows working on it right now, so, lessons learned:
    • I already had Python 2.7 installed, but it turned out that I didn't have a package manager. It additionally turned out that pip is ironically difficult to install on Windows.
    • I tried installing setuptools with the installer it came with and then installing pip with setuptools. When I then tried to use pip to install the simplemediawiki library I got error messages referencing "egg_info failed", usually associated with a bad package installer.
    • A recommended Windows .tar.gz unzipper is http://www.7-zip.org/. Note that you have to run it twice, once for the .gz and once for the .tar. This is apparently a WONTFIX.
    • What worked: setup.py to install setuptools, setuptools to install pip, pip to install simplemediawiki, mwclient, and requests. Success!
Writing test scripts
Conclusion based on current work
  • simplemediawiki makes it easy to make calls pretty directly to the API in a simple Python bot. If I pass it the arguments it expects, it works so far.
  • To do: haven't tested any POST calls besides edit, so I don't know whether login/cookies/tokens work with those.
Started mwclient tests

Resources

MediaWiki collaboration tools
Learning styles resources for engineers/scientists
MediaWiki API resources
Other MediaWiki resources
Other API resources
Test pages/wikis, ok to use for trial edits