User:Catrope/Extension review/CollabWatchlist
Base revision: r95194
CollabWatchlist.php
edit- Config variables should be clearly separated near the top of the file, before any setup code. They should also have clearer documentation: the comment for
$wgCollabWatchlistRecursiveCatScan
is very vague, and two others don't have a doc comment at all- Fixed in r99867 User:Flohack
- The
fnFoo
functions should be static functions in aCollabWatchlistHooks
class, so as to avoid putting functions in the global namespace- Fixed in r99867 User:Flohack
- The defined constants (
COLLABWATCHLISTUSER_OWNER
and friends) should also be in a class- Fixed in r99867 User:Flohack
- Support for MW 1.16 and below in
fnCollabWatchlistDbSchema()
is pointless because you're using ResourceLoader, which is in MW 1.17+ only- Fixed in r99867 User:Flohack
- The noctid SQL should only be run if you detect an old schema. Is it safe to assume that you're the only one who ever had that old schema?
- How would I detect an old schema? Is it ok to run SQL commands within that function?!? User:Flohack
CollabWatchlistTest.php
edittestCategoryTree()
isn't really a unit test. All it seems to do is check thatprintTree()
doesn't throw any exceptions, fatals or warnings, and let the user visually inspect the tree. A real unit test has assertions, and tests only a small piece of code (e.g. one function)- It was not intended to be a real unit test. I have added a markTestSkipped call in the setUp method in r99867. User:Flohack
sql directory
edit- Please combine all tables into a single SQL file
- But $updater->addExtensionUpdate does not seem to support adding multiple tables in one sql file. I have seen several extensions, only doing one addExtensionUpdate call with one addTable, when in fact the sql file contains multiple tables. I'd consider that bad practice, as the code suggests it creates only one table, when in fact it creates several. Do you agree on that? User:Flohack
- Why does the
collabwatchlist
table userl_
as its field prefix?cw_
would make more sense- I added an sql file which renames that in r99867. See related question above, regarding detecting an old schema. User:Flohack
- Put
UNIQUE KEY(foo)
outside of the table creation for SQLite compatibility. See the coding conventions page- Fixed in r99867 User:Flohack
- Use field prefixes consistently. In particular, don't use the name
user_id
for a field that is a foreign key touser.user_id
, because the entire point of using field prefixes is to avoid having to use syntax likeuser.user_id
in queries to disambiguate your field names - ENUMs are evil, don't use them in new code. It's much easier to define numerical constants in PHP (e.g.
COLLABWATCHLISTUSER_OWNER = 1
) and put those in an integer field in your DB table- Fixed in r99867 User:Flohack
- Why does
collabwatchlistuser
have an auto-incremented primary key field? Can't you just drop that and add a unique index on(rl_id, user_id)
?- The reason for the primary key is that one user can have several roles (rlu_type) within one list.
- Same for
collabwatchlistcategory
andcollabwatchlisttag
- Removed the primary keys in r99867 User:Flohack
- Comment on line 3 of
collabwatchlist_noctid.sql
seems to have been cut off- See related question above, regarding detecting an old schema. User:Flohack
- What is the
collabwatchlistrevisiontag
table for? It's not well documented (nor are any of the other tables, BTW, but those were easier to figure out by just reading the field names and the comments)- Fixed in r99867 User:Flohack
SpecialCollabWatchlist.php
edit- Yeah, the grandparent constructor thing is nasty but it's the way things work with special pages, unfortunately
- This file duplicates lots of code from
phase3/includes/specials/SpecialWatchlist.php
, or to be more precise an older version of that file. I'm not gonna bother reviewing most of this file until that duplication is cleaned up. COUNT(*)
queries are evil because their running time is linear in the number of rows they need to scan, and this number is unbounded (LIMIT
doesn't work the way you expect it to in this case)- There are a few things wrong with the tag filter query
- The logic of how
$invertTags
triggersEXISTS
/NOT EXISTS
seems backwards - Subqueries are not supported by MySQL 4.0, and we still have four 4.0 machines in our DB cluster. I'm not sure whether it would make sense for this subquery to be rewritten to a join, performance-wise, I'd have to ask. You're right that regexing tag_summary is a dead end: it's not just impractical, it's also a performance nightmare
- The query is built with raw SQL, which you shouldn't be doing. Even with a subquery, you can use
$dbr->selectSQLText()
to build your subquery, then inject it into the main query
- The logic of how
- Instead of using
SpecialPage::getTitleFor( 'CollabWatchlist' )
you can just use$this->getTitle()
- Don't use
empty()
- In
wlGetFilterClauseForCollabWatchlistIds
, you're using$rl_ids != 0
, which is a loose comparison between an array and an integer. That's just evil - None of your queries are limited or paged. While this is a bug in the existing watchlist code (no paging so 20,000-page watchlists cause OOMs and timeouts and such), that doesn't mean it's OK to write new code that has the same problem
- Don't use
implode( ... addQuotes( ...
if you've already demonstrated you know how to usemakeList()
$this->addQuotes()
is probably unnecessary if you clean up the previous point, but it's also unnecessary because you can usearray_map( array( $db, 'addQuotes' ), $array )
CollabWatchlistEditor.php
edit- Instead of having a big switch-case statement with nine cases, break them out into separate functions
- Don't use
!isset()
to check for null - Code like
Title::newFromText( $wgCollabWatchlistPermissionDeniedPage )->getLocalURL()
will cause a fatal error if an invalid title string is fed tonewFromText()
. Check the return value ofnewFromText()
for null before calling methods on it - Don't use
create_function()
it's really, really, really evil. Add a static function to your class instead - What's up with redirecting to a wiki page upon a permission error? That's very unconventional. There are built-in facilities like
$wgOut->showErrorPage()
and friends, use them- Throwing a permissions error or redirecting away upon an invalid token is really bad UX: a user who submits their form after their session expires wants to see a descriptive error message (
'sessionfailure'
message in core), plus the form as they filled it out so they can resubmit the form right away. Destroying their form data is unacceptable
- Throwing a permissions error or redirecting away upon an invalid token is really bad UX: a user who submits their form after their session expires wants to see a descriptive error message (
- Assignments inside if conditions such as on lines 72 and 76 are discouraged in the coding conventions
checkToken()
callscheckPermissions()
, but this is not documented and is generally a bad idea- Again, do not use COUNT(*), especially if you only care about whether the count is >= 1. In that case, use LIMIT 1 in your query and check
$dbr->numRows()
explode( "\n", trim( $list ) )
is guaranteed to return an array, you don't have to check foris_array()
afterwards- In
extractCollabWatchlistCategories()
, the variables$text
and$titleText
seem to be used interchangeably in a very confusing way - In
extractCollabWatchlistTags()
you can useexplode( '|', $foo, 2 )
to split a string on a pipe character instead of rolling your own using string manipulation - In
showTitles()
, if you transform a string into a Title, cache that so you don't end up doing it twice- I believe you would need to i18n the parentheses in this function somehow but I'm not sure how that's supposed to be done
showTagList()
does not escape$title
and$description
. This is probably an XSS vulnerability- Four functions in a row whose sole purpose is to execute a COUNT(*) query. Don't use COUNT(*).
- In
getCollabWatchlistCategories()
you don't need to check that there were >0 rows before you foreach over the result. Looping over an empty result is perfectly safe. Likewise, checking for!$res
is unnecessary.select()
will either return a ResultWrapper object or bail out with an exception, you're not gonna get it to return false unless you manually disable DB error exceptions in your code- If your namespace and title come directly from the database, you can use
makeTitle()
instead ofmakeTitleSafe()
- If your namespace and title come directly from the database, you can use
- We have wrapper functions for INSERT SELECT and DELETE. You shouldn't build raw SQL there, or use prepared statements (no one does that in MW, I don't know why we even support that)
- Line 885: do not use
new Article( $title )
, usenew Article( $title, 0 )
instead. The former has nasty side effects if an?oldid=
query parameter is set - Line 1010: timestamps that you're inserting into the database must be run through
$dbw->timestamp()
. To obtain the current timestamp, just call it without any parameters. You cannot insert ISO 8601 timestamps into the database, that is incorrect even for MySQL (which uses TS_MW) - Line 1172, line 1202, line 1232:
$rlName
is not escaped - Line 1210:
$description
is not escaped - Line 1303: improper uses of
isset()
andempty()
on the same line - Line 1317:
$listName
is not escaped
CollabWatchistChangesList.php
edit$gwlSpeciaPageTitle
has a typo in its name$rc->getAttribute()
isn't used consistently. It's used on line 62, but on line 84$rc->mAttribs['foo']
is used insteadinsertRollback()
inserts an undo link? How is that related to the feature set of this extension at all?- Line 213: don't use inline JavaScript, and don't use
window.onLoad
. Use jQuery instead, it's so much nicer - Line 254: don't use onFoo attributes, use jQuery
- In
getCollabWatchlistIdAndName()
: don't mess with$wgDBTablePrefix
here, you really shouldn't need to. If your code breaks if you don't use it, you're doing something very wrong
CategoryTreeManip.php
edit- Line 37: if the cache duration is gonna be a fixed constant, make it a constant like the cache keys
- Line 66, 74, 102, 136: indentation is off. The coding conventions recommend you put single-statement if blocks in braces too
- Line 94: don't use isset to check for null
- Line 208, 290: so you have a variable that doesn't exist the first time you enter the loop, and you create it halfway through the first iteration. This is confusing and easily avoided by just initializing it to something like an empty array before you start the loop
- Line 259: loose comparison between an array and an empty string is not a good idea
- The query in
getChildCategories()
looks wrong, I don't see how it limits itself to child categories rather than category members. With the new DB schema you don't need to join against page and usepage_namespace=14
to get child categories anymore, you can just usecl_type='subcat'
. Also, it seems the cp table is useless in this query, you can take category titles and WHERE them straight intocl_to
- Line 292: this should probably be a strong comparison. Strings behave strangely, with things like
'0' == '00'
returning true - Line 341: you cannot use queries like
SELECT stuff FROM page WHERE page_title IN ('foo', 'bar')
without a condition onpage_namespace
, because:- It will return pages like
Foo
as well asTalk:Foo
,User:Foo
,Category:Foo
etc etc (I'm assuming you only want categories here, so add'page_namespace' => NS_CATEGORY
- It will melt the database because the query is unindexed and the enwiki page table has over 20 million rows
- It will return pages like