Requests for comment/CheckUser requirements

In T139810 we concluded that the best way to revamp the CheckUser extension is to first compile a list of desired features and then decide how to implement them (rewrite the code, patch parts of it, etc.). This RFC is meant to help us reach consensus on what we might call CheckUser 2.0: the future of the CheckUser extension.

Request for comment (RFC)
CheckUser requirements
Component CheckUser
Creation date
Author(s) Huji
Document status in discussion
See Phabricator.

Background

Designed c. 2005, the extension is one of the critical tools that help us deal with problematic cases of abuse such as sock puppetry, vandalism, and spam (see history here). As time went by, the needs of the projects grew and, as the CheckUser work board shows, bugs accumulated without being resolved, primarily because the code is old and appears hard for developers to read and work with (refer to T132892: CheckUser UI revamp and its related tasks, as well as the work board linked above).

Problem

The lack of an active maintainer for the extension lowers development productivity in this area (i.e., bug resolution, testing, and extension development; see Phabricator). One developer describes the CheckUser extension as follows: "the backend design is dubious, and the frontend is pretty archaic."

Proposal

To resolve these issues, we think the CheckUser extension needs an overhaul. The overhaul is also an opportunity to make the extension work with the features and code that MediaWiki offers in its current state, and to gather opinions from CheckUsers on which new functions the new extension should have.

Features needed:

  • Sorting the results by IP (to make it easier to find common IP "ranges") as well as by time (currently, only the latter is supported).
  • Getting a list of distinct user agents (UAs) used by a user.
    • In fact, the "get IPs" function should be replaced by a "get summary" page which shows distinct IPs (sortable by time or by IP), distinct UAs (sortable by time or by UA), and distinct IP-UA combinations (sortable by time or by IP).
  • Providing additional information about UAs. Something like http://useragentstring.com/ or similar websites would be the minimum. Ideally, the CheckUser tool should analyze the UA and show detailed information like:
    • What browser, OS, etc. does the UA represent
    • In what year and month was that particular browser version released, and in what year and month was the next version released? (I find the date at which a user upgrades their browser a good clue for matching accounts or ruling out a match.)
    • Does it look like a valid or a forged UA?
  • Providing additional information about IPs, for example the IP's ISP, its country, and the ISP's range. Of course, this information changes over time and is not freely and publicly available. However, the CheckUser code should be modified so that it can be "extended" by providing a CSV file containing IP-to-country and/or IP-to-ISP mappings. That way, we don't need to publish such a mapping as part of the code, but major users of the CheckUser tool such as WMF can pay for proprietary mappings and use them on their wikis.
  • For the IP->users view, provide the ability to bulk block users and/or delete all their page creations.
  • Some kind of "open all in new tabs" option in various places.
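
To make the "get summary" and CSV-mapping ideas above more concrete, here is a minimal sketch in Python (chosen only for illustration; it is not the extension's code). Everything in it is an assumption: the `LogRow` shape, the function names `summarize`, `load_ranges`, and `lookup`, and the `cidr,country,isp` CSV layout are all hypothetical. It shows distinct IPs, UAs, and IP-UA combinations with their last-seen times, numeric IP sorting so that neighbouring addresses end up adjacent, and a lookup against an operator-supplied mapping.

```python
import csv
import io
import ipaddress
from collections import namedtuple

# Hypothetical row shape; the extension's real storage schema is not described here.
LogRow = namedtuple("LogRow", ["timestamp", "ip", "user_agent"])


def summarize(rows, sort_by="ip"):
    """Return distinct IPs, distinct UAs, and distinct IP-UA pairs."""
    ips, uas, pairs = {}, {}, {}
    for row in rows:
        # Remember the most recent timestamp for each distinct value.
        ips[row.ip] = max(ips.get(row.ip, row.timestamp), row.timestamp)
        uas[row.user_agent] = max(uas.get(row.user_agent, row.timestamp), row.timestamp)
        key = (row.ip, row.user_agent)
        pairs[key] = max(pairs.get(key, row.timestamp), row.timestamp)

    def ip_key(ip):
        # Numeric ordering keeps addresses from the same range next to each other.
        addr = ipaddress.ip_address(ip)
        return (addr.version, int(addr))

    if sort_by == "ip":
        ip_list = sorted(ips, key=ip_key)
        pair_list = sorted(pairs, key=lambda p: (ip_key(p[0]), p[1]))
    else:
        # Any other value means "by time", newest first.
        ip_list = sorted(ips, key=ips.get, reverse=True)
        pair_list = sorted(pairs, key=pairs.get, reverse=True)
    ua_list = sorted(uas, key=uas.get, reverse=True)
    return ip_list, ua_list, pair_list


def load_ranges(csv_text):
    """Parse 'cidr,country,isp' lines (an assumed layout) into tuples."""
    reader = csv.reader(io.StringIO(csv_text))
    return [
        (ipaddress.ip_network(r[0].strip()), r[1].strip(), r[2].strip())
        for r in reader
        if r  # skip blank lines
    ]


def lookup(ip, ranges):
    """Return (country, isp, range) for the most specific match, or None."""
    addr = ipaddress.ip_address(ip)
    matches = [r for r in ranges if r[0].version == addr.version and addr in r[0]]
    if not matches:
        return None
    net, country, isp = max(matches, key=lambda r: r[0].prefixlen)
    return country, isp, str(net)
```

With rows for `192.0.2.10` and `192.0.2.9`, `summarize(rows, "ip")` lists `192.0.2.9` first, which a plain string sort would not. A linear scan in `lookup` is enough for a sketch; a real proprietary mapping with many ranges would likely call for an indexed structure instead.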