User:Catrope/Extension review/CollabWatchlist

Base revision: r95194

CollabWatchlist.php

edit
  • Config variables should be clearly separated near the top of the file, before any setup code. They should also have clearer documentation: the comment for $wgCollabWatchlistRecursiveCatScan is very vague, and two others don't have a doc comment at all
  • The fnFoo functions should be static functions in a CollabWatchlistHooks class, so as to avoid putting functions in the global namespace
  • Support for MW 1.16 and below in fnCollabWatchlistDbSchema() is pointless because you're using ResourceLoader, which is in MW 1.17+ only
  • The noctid SQL should only be run if you detect an old schema. Is it safe to assume that you're the only one who ever had that old schema?
    • How would I detect an old schema? Is it ok to run SQL commands within that function?!? User:Flohack

CollabWatchlistTest.php

edit
  • testCategoryTree() isn't really a unit test. All it seems to do is check that printTree() doesn't throw any exceptions, fatals or warnings, and let the user visually inspect the tree. A real unit test has assertions, and tests only a small piece of code (e.g. one function)
    • It was not intended to be a real unit test. I have added a markTestSkipped call in the setUp method in r99867. User:Flohack

sql directory

edit
  • Please combine all tables into a single SQL file
    • But $updater->addExtensionUpdate does not seem to support adding multiple tables in one sql file. I have seen several extensions, only doing one addExtensionUpdate call with one addTable, when in fact the sql file contains multiple tables. I'd consider that bad practice, as the code suggests it creates only one table, when in fact it creates several. Do you agree on that? User:Flohack
  • Why does the collabwatchlist table use rl_ as its field prefix? cw_ would make more sense
    • I added an sql file which renames that in r99867. See related question above, regarding detecting an old schema. User:Flohack
  • Put UNIQUE KEY(foo) outside of the table creation for SQLite compatibility. See the coding conventions page
  • Use field prefixes consistently. In particular, don't use the name user_id for a field that is a foreign key to user.user_id, because the entire point of using field prefixes is to avoid having to use syntax like user.user_id in queries to disambiguate your field names
  • ENUMs are evil, don't use them in new code. It's much easier to define numerical constants in PHP (e.g. COLLABWATCHLISTUSER_OWNER = 1) and put those in an integer field in your DB table
  • Why does collabwatchlistuser have an auto-incremented primary key field? Can't you just drop that and add a unique index on (rl_id, user_id) ?
    • The reason for the primary key is that one user can have several roles (rlu_type) within one list.
    • Same for collabwatchlistcategory and collabwatchlisttag
      Removed the primary keys in r99867 User:Flohack
  • Comment on line 3 of collabwatchlist_noctid.sql seems to have been cut off
    • See related question above, regarding detecting an old schema. User:Flohack
  • What is the collabwatchlistrevisiontag table for? It's not well documented (nor are any of the other tables, BTW, but those were easier to figure out by just reading the field names and the comments)

SpecialCollabWatchlist.php

edit
  • Yeah, the grandparent constructor thing is nasty but it's the way things work with special pages, unfortunately
  • This file duplicates lots of code from phase3/includes/specials/SpecialWatchlist.php , or to be more precise an older version of that file. I'm not gonna bother reviewing most of this file until that duplication is cleaned up.
  • COUNT(*) queries are evil because their running time is linear in the number of rows they need to scan, and this number is unbounded (LIMIT doesn't work the way you expect it to in this case)
  • There are a few things wrong with the tag filter query
    • The logic of how $invertTags triggers EXISTS / NOT EXISTS seems backwards
    • Subqueries are not supported by MySQL 4.0, and we still have four 4.0 machines in our DB cluster. I'm not sure whether it would make sense for this subquery to be rewritten to a join, performance-wise, I'd have to ask. You're right that regexing tag_summary is a dead end: it's not just impractical, it's also a performance nightmare
    • The query is built with raw SQL, which you shouldn't be doing. Even with a subquery, you can use $dbr->selectSQLText() to build your subquery, then inject it into the main query
  • Instead of using SpecialPage::getTitleFor( 'CollabWatchlist' ) you can just use $this->getTitle()
  • Don't use empty()
  • In wlGetFilterClauseForCollabWatchlistIds, you're using $rl_ids != 0, which is a loose comparison between an array and an integer. That's just evil
  • None of your queries are limited or paged. While this is a bug in the existing watchlist code (no paging so 20,000-page watchlists cause OOMs and timeouts and such), that doesn't mean it's OK to write new code that has the same problem
  • Don't use implode( ... addQuotes( ... if you've already demonstrated you know how to use makeList()
  • $this->addQuotes() is probably unnecessary if you clean up the previous point, but it's also unnecessary because you can use array_map( array( $db, 'addQuotes' ), $array )

CollabWatchlistEditor.php

edit
  • Instead of having a big switch-case statement with nine cases, break them out into separate functions
  • Don't use !isset() to check for null
  • Code like Title::newFromText( $wgCollabWatchlistPermissionDeniedPage )->getLocalURL() will cause a fatal error if an invalid title string is fed to newFromText(). Check the return value of newFromText() for null before calling methods on it
  • Don't use create_function() it's really, really, really evil. Add a static function to your class instead
  • What's up with redirecting to a wiki page upon a permission error? That's very unconventional. There are built-in facilities like $wgOut->showErrorPage() and friends, use them
    • Throwing a permissions error or redirecting away upon an invalid token is really bad UX: a user who submits their form after their session expires wants to see a descriptive error message ('sessionfailure' message in core), plus the form as they filled it out so they can resubmit the form right away. Destroying their form data is unacceptable
  • Assignments inside if conditions such as on lines 72 and 76 are discouraged in the coding conventions
  • checkToken() calls checkPermissions(), but this is not documented and is generally a bad idea
  • Again, do not use COUNT(*), especially if you only care about whether the count is >= 1. In that case, use LIMIT 1 in your query and check $dbr->numRows()
  • explode( "\n", trim( $list ) ) is guaranteed to return an array, you don't have to check for is_array() afterwards
  • In extractCollabWatchlistCategories(), the variables $text and $titleText seem to be used interchangeably in a very confusing way
  • In extractCollabWatchlistTags() you can use explode( '|', $foo, 2 ) to split a string on a pipe character instead of rolling your own using string manipulation
  • In showTitles(), if you transform a string into a Title, cache that so you don't end up doing it twice
    • I believe you would need to i18n the parentheses in this function somehow but I'm not sure how that's supposed to be done
  • showTagList() does not escape $title and $description. This is probably an XSS vulnerability
  • Four functions in a row whose sole purpose is to execute a COUNT(*) query. Don't use COUNT(*).
  • In getCollabWatchlistCategories() you don't need to check that there were >0 rows before you foreach over the result. Looping over an empty result is perfectly safe. Likewise, checking for !$res is unnecessary. select() will either return a ResultWrapper object or bail out with an exception, you're not gonna get it to return false unless you manually disable DB error exceptions in your code
    • If your namespace and title come directly from the database, you can use makeTitle() instead of makeTitleSafe()
  • We have wrapper functions for INSERT SELECT and DELETE. You shouldn't build raw SQL there, or use prepared statements (no one does that in MW, I don't know why we even support that)
  • Line 885: do not use new Article( $title ), use new Article( $title, 0 ) instead. The former has nasty side effects if an ?oldid= query parameter is set
  • Line 1010: timestamps that you're inserting into the database must be run through $dbw->timestamp() . To obtain the current timestamp, just call it without any parameters. You cannot insert ISO 8601 timestamps into the database, that is incorrect even for MySQL (which uses TS_MW)
  • Line 1172, line 1202, line 1232: $rlName is not escaped
  • Line 1210: $description is not escaped
  • Line 1303: improper uses of isset() and empty() on the same line
  • Line 1317: $listName is not escaped

CollabWatchistChangesList.php

edit
  • $gwlSpeciaPageTitle has a typo in its name
  • $rc->getAttribute() isn't used consistently. It's used on line 62, but on line 84 $rc->mAttribs['foo'] is used instead
  • insertRollback() inserts an undo link? How is that related to the feature set of this extension at all?
  • Line 213: don't use inline JavaScript, and don't use window.onLoad. Use jQuery instead, it's so much nicer
  • Line 254: don't use onFoo attributes, use jQuery
  • In getCollabWatchlistIdAndName(): don't mess with $wgDBTablePrefix here, you really shouldn't need to. If your code breaks if you don't use it, you're doing something very wrong

CategoryTreeManip.php

edit
  • Line 37: if the cache duration is gonna be a fixed constant, make it a constant like the cache keys
  • Line 66, 74, 102, 136: indentation is off. The coding conventions recommend you put single-statement if blocks in braces too
  • Line 94: don't use isset to check for null
  • Line 208, 290: so you have a variable that doesn't exist the first time you enter the loop, and you create it halfway through the first iteration. This is confusing and easily avoided by just initializing it to something like an empty array before you start the loop
  • Line 259: loose comparison between an array and an empty string is not a good idea
  • The query in getChildCategories() looks wrong, I don't see how it limits itself to child categories rather than category members. With the new DB schema you don't need to join against page and use page_namespace=14 to get child categories anymore, you can just use cl_type='subcat'. Also, it seems the cp table is useless in this query, you can take category titles and WHERE them straight into cl_to
  • Line 292: this should probably be a strong comparison. Strings behave strangely, with things like '0' == '00' returning true
  • Line 341: you cannot use queries like SELECT stuff FROM page WHERE page_title IN ('foo', 'bar') without a condition on page_namespace, because:
    1. It will return pages like Foo as well as Talk:Foo, User:Foo, Category:Foo etc etc (I'm assuming you only want categories here, so add 'page_namespace' => NS_CATEGORY
    2. It will melt the database because the query is unindexed and the enwiki page table has over 20 million rows