User:QEDK/GSoC 2020/Working with APIs and wrappers

*insert witty xkcd comic*

While I expected this to be the most fun part of the internship, and in some ways it was, it was also the part I did not see coming. As someone who has used GitHub for about ten years, I find that the relationship between Git and GitHub is made to be quite slick, or at least not very noticeable to an end-user who does not want to delve into it, except for the first-time configuration required with Git. In fact, if GitHub Desktop set up Git for you, most people would probably not need to know Git at all (let me add that it's quite an important skill to have: you will inevitably get stuck doing tricky rebases or merges, and you want to know how to handle your git rebase -i).

It's always (read: mostly) useful to use some kind of API wrapper, especially for authentication and for writing concise code. The bad thing about API wrappers is that they have more than their fair share of endpoints to cover and are often accompanied by scant or annoyingly concise documentation (read my last post about this!). This is partly because the API might change every month or so, and I doubt the maintainers could afford a documentation spring cleanup that often. In that regard, it's a better approach to let developers do their own research ("no documentation is better than wrong documentation").

But having no documentation clearly has its downsides (but that's the thing with free and open-source software, innit?). Let's take a popular wrapper as an example: StackAPI (just to state: this is by no means a bad library, quite the opposite). By default, it is designed to give up to 500 results by combining five API "pages" or calls (with each page having a maximum of 100 results). If I wanted a maximum of 10 results, it would still make 5 calls, which seems like overkill when a single page can easily hold up to 100, so you'd need to tell it to make only one call. While it's not a big deal in the grand scheme of things, it's important to keep in mind the principle of "death by a thousand cuts" and, as always, general API etiquette. And while I'm sure that Stack Overflow wouldn't mind the extra four pages they would throw my way, it's always a good principle to make your software leave the smallest footprint on any hardware.
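
To make the arithmetic concrete, here is a toy sketch of a paginating client that counts its own calls. It is not StackAPI's real internals: `PagedClient`, `fetch` and `calls_made` are made-up names, and the fake "server" simply hands out integers. Only the two knobs, `page_size` and `max_pages`, are meant to mirror the settings discussed above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PagedClient:
    """Toy stand-in for a paginating wrapper (hypothetical, not StackAPI)."""
    page_size: int = 100   # results requested per call
    max_pages: int = 5     # upper bound on calls made per fetch
    calls_made: int = 0    # bookkeeping so we can see the footprint

    def _request_page(self, page: int, total_available: int) -> list[int]:
        # Pretend to hit the API: return one page of fake result ids.
        self.calls_made += 1
        start = (page - 1) * self.page_size
        end = min(start + self.page_size, total_available)
        return list(range(start, end))

    def fetch(self, total_available: int = 10_000) -> list[int]:
        results: list[int] = []
        for page in range(1, self.max_pages + 1):
            items = self._request_page(page, total_available)
            results.extend(items)
            if len(items) < self.page_size:
                break  # a short page means the server has nothing more
        return results


# Defaults: five calls, even though only ten results are kept.
default_client = PagedClient()
default_results = default_client.fetch()[:10]

# Frugal settings: one call of ten results.
frugal_client = PagedClient(page_size=10, max_pages=1)
frugal_results = frugal_client.fetch()

print(default_client.calls_made, frugal_client.calls_made)
```

With StackAPI itself, I believe the equivalent is to set the `page_size` and `max_pages` attributes on the `StackAPI` object before calling `fetch`, but do check the library's documentation rather than taking this sketch as gospel.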

Back on topic

Making synchronous API calls and expecting everything to work fine might be folly, but it's fine when you're making very few blocking calls - it's how requests avoids giving you ugly surprises. When working with GitHub's API, I chose to use PyGithub, and since my code is set to do exactly one thing with it (make a pull request), I can afford to stay far away from asyncio. On another note, if I may air out a grievance (or rather, vent): APIs which use filter codes as shorthand are… harder to get into. When the alternative to writing out an extensive list of parameters is to use an obscure code, it's an easy choice (for some). Let's take for example:

answer = stackoverflow.fetch("answers/{ids}", ids=[str(questions["items"][0]["accepted_answer_id"])], filter="!9Z(-wzftf")

Does this filter code give anyone meaningful information? Definitely not, yet many API endpoints choose to do this. Another point to their detriment is that neither StackAPI nor Stack Overflow really lets you know how to use filters. If you used requests directly, you'd spend the first ten minutes wondering which API request you were supposed to put the code in. In fact, the filters endpoint will generate the code for you from *sigh* an extensive list of parameters. And therein lies the rub of having an extensive, verbose API endpoint for your consumers. It's an unavoidable tragedy, so to say.
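
For the curious, here is a rough sketch of what asking the filters endpoint for a code might look like. It only builds the URL and makes no network call. The helper name `filter_create_url`, the API version in `API_ROOT`, and the example field names are my assumptions; the `include`, `exclude`, `base` and `unsafe` parameters are how I understand the `filters/create` method to work, so double-check them against the Stack Exchange API docs.

```python
from urllib.parse import urlencode

# Assumption: the API root and version; adjust to whatever you actually use.
API_ROOT = "https://api.stackexchange.com/2.3"


def filter_create_url(include, exclude=(), base="default", unsafe=False):
    """Build (but do not send) a request URL for the filters/create method."""
    params = {"base": base, "unsafe": str(unsafe).lower()}
    # Field lists are semicolon-delimited, e.g. "answer.body;answer.score".
    if include:
        params["include"] = ";".join(include)
    if exclude:
        params["exclude"] = ";".join(exclude)
    return f"{API_ROOT}/filters/create?{urlencode(params)}"


# Example field names are illustrative assumptions.
url = filter_create_url(include=["answer.body", "answer.score"])
print(url)
```

The response to that request should contain the opaque filter code, which you then pass as `filter=` on your real calls. In other words, the readable list of fields exists, it just lives one request away from where you need it.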

The options might be perfectly modular and arranged, but not when the end-user cannot make up their mind about what to do with them // Unsplash License

PyGithub has its own quirks, expected quirks. For example, the difference between a property and a function is probably that a function "definitely" makes a request then and there, while a property is a stored state. But some things should offer a way to check their present state, in contrast with a property which stores the last-known state. PyGithub makes fetching the current state a bit like going around your head to touch your nose instead of touching it directly. That's not to say the implementation is bad: it takes a FITB (fill in the blanks) approach, where accessing a property that doesn't already have a value from previous calls results in an API call. This is certainly efficient and helps a lot, at least when the attribute is not so generic that it's always returned anyway. Either way, the wrapper is concise enough to let you get on with your business. I would have to use something like this to get a pull request's state from a GitHub repository:

import os
from github import Github  # PyGithub

repo = Github(os.environ.get("GitPAT")).get_repo("QEDK/goodbot")
pull = repo.create_pull(title="Title here", body="hard at work 🛠️", head="our new branch", base="master", maintainer_can_modify=True)
# Re-fetch the pull request so we read its current state, not the stored one
if repo.get_pull(pull.number).state == "closed":
	pass  # do something when pull request is closed

compared to something hypothetical like:

if pull.get_state() == "closed":
	pass  # do something when pull request is closed
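
To illustrate the distinction I'm drawing between "last-known state" and "present state", here is a toy model of a fill-in-the-blanks object. It is not PyGithub's actual code: `LazyPull`, `refresh` and `fake_fetch` are hypothetical names, and the "server" is just a dictionary.

```python
class LazyPull:
    """Toy 'fill in the blanks' object (hypothetical, not PyGithub)."""

    def __init__(self, number, fetch, **known):
        self.number = number
        self._fetch = fetch        # callable standing in for the API call
        self._data = dict(known)   # whatever earlier responses already gave us

    @property
    def state(self):
        # Only make a call if no earlier response filled this value in.
        if "state" not in self._data:
            self._data.update(self._fetch(self.number))
        return self._data["state"]  # last-known state, possibly stale

    def refresh(self):
        # Explicitly ask for the present state, replacing the snapshot.
        self._data.update(self._fetch(self.number))
        return self


calls = []
server_state = {"state": "open"}


def fake_fetch(number):
    # Stand-in for an HTTP request; records that a call happened.
    calls.append(number)
    return dict(server_state)


pull = LazyPull(7, fake_fetch)
first = pull.state                # blank gets filled in: first call
server_state["state"] = "closed"  # someone closes the pull request
stale = pull.state                # still the snapshot, no new call
fresh = pull.refresh().state      # explicit refresh: second call
print(first, stale, fresh, len(calls))
```

As far as I can tell, PyGithub objects also expose an `update()` method that re-requests the object, which may serve a similar purpose to the hypothetical `refresh()` here; check its documentation before relying on that.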

In my experience, getters and setters are remnants (still very much alive) from the C/C++ days, but I feel they are absolutely integral to libraries: you should keep your end-users away from bare members, or at least not trust them enough to let them mutate attributes themselves. Python libraries tend to use properties for a lot of things, properties which are really wrappers for functions. To be quite honest, I like the usability, and the concept of storing a state (or "snapshotting") as an available property is one that every library should implement. PyGithub's approach is one of the better ones I've seen. I've also seen worse, possibly where every property access is an API call. While this is not a problem in and of itself, an end-user cannot necessarily tell how expensive their simple property "invocation" is compared to a function call on the stack, which makes a "gloves-on" approach to end-users necessary. The solution to the problem of mutable attributes was meant to be the @property decorator, of course, but it's a Pythonic solution to a Pythonic problem - one that tends to be misused or outright abused for mutation and featureful implementations instead of being used the way it was intended, which is to enforce property access and immutability.
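
A minimal sketch of what I mean by using @property for read-only access: a property without a setter rejects assignment. The class and attribute names (`PullSnapshot`, `number`, `state`) are made up for illustration.

```python
class PullSnapshot:
    """Hypothetical value object exposing read-only properties."""

    def __init__(self, number, state):
        self._number = number
        self._state = state

    @property
    def number(self):
        return self._number

    @property
    def state(self):
        return self._state


snapshot = PullSnapshot(42, "open")
try:
    snapshot.state = "closed"  # no setter defined, so Python rejects this
    mutated = True
except AttributeError:
    mutated = False
print(snapshot.state, mutated)
```

Note that this is "gloves-on" by convention only: a determined user can still reach `snapshot._state` directly, since the underscore is a hint and not an enforcement mechanism.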

Want to be a new developer? See https://www.mediawiki.org/wiki/New_Developers
Want to interact with people of the Wikimedia Outreach community? Come visit us at Zulipchat.
Want to begin learning Rust? Read The Book.

Do let me know in the comments if you have any suggestions! Next time, more about Git and GitHub. 😊

Contributions
Parse project idea pages and update templates dynamically. Add a feature to open a pull request with suggested updates and close them once dealt with. Add support to `goodbot` to parse templates and provide project data. Add commands and update help texts for project-related commands. Fix upstream `AttributeError` in `ircbot`.