User:Catrope/Extension review/ShortUrl

Base revision: r95304

General notes

  • You're adding a table that permanently maps an (ns, title) pair to an ID. Is this because you want the ID to keep referring to the same title through moves etc., unlike page IDs? I guess my question is: is there a specific reason why you're not using page IDs?
    • Another thing to consider is deletion of a page and re-creation of the same page at a later time, as well as disambiguation/splitting of a page, where keeping the page_id would make an assumption about which definition was intended (instead of the title, which would by then have become a disambiguation page). Krinkle 00:05, 29 September 2011 (UTC)
  • There's a layout bug in IE7; see this screenshot

ShortUrl.php

Done in r103035
  • 'dependencies' => array( 'jquery' ) is unnecessary. jQuery is guaranteed to be present
  • Configuration variables (you have only one in this case) should be near the top of the file, clearly marked as such, and have a documentation comment. Since the default is null, you should explain what that means too (see the sketch below)
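
Roughly what I mean, as a sketch (the module definition and the $wgShortUrlPrefix name here are my guesses at how the file is laid out, not necessarily the exact code):

/**
 * Base URL to prepend to short URL codes, e.g. 'http://example.org/s/'.
 * The default is null; document here what null actually means
 * (e.g. fall back to the local Special:ShortUrl page).
 */
$wgShortUrlPrefix = null;

// ... rest of the setup file ...

$wgResourceModules['ext.shortUrl'] = array(
	'scripts' => 'ext.shortUrl.js',
	'styles' => 'ext.shortUrl.css',
	'localBasePath' => dirname( __FILE__ ),
	'remoteExtPath' => 'ShortUrl',
	// no 'dependencies' => array( 'jquery' ): jQuery is always loaded
);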

ShortUrl.i18n.php

Done in r103035
  • Don't Use Title Case (Or Otherwise Use Too Many Capitals); The People At TranslateWiki Don't Like That. Specifically, "Short" in "(No Short" should not be capitalized

ShortUrl.hooks.php

Done in r103035
  • Line 24: don't do a loose comparison with null; do a strict comparison (===) instead (see the sketch after this list)
  • Line 25: you need to use getCanonicalUrl() here instead of getFullUrl() so things will work properly on wikis with protocol-relative URLs. You couldn't have known this because getCanonicalUrl() and most of the other protocol-relative URL support code didn't even exist yet when you wrote this code
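
Something along these lines, as a sketch of what lines 24–25 could look like after both fixes (the variable names and the fallback logic are placeholders, not the actual code):

if ( $wgShortUrlPrefix === null ) { // strict comparison, not == null
	// getCanonicalUrl() always returns an absolute URL with a protocol,
	// which is what you want even on wikis using protocol-relative URLs
	$url = SpecialPage::getTitleFor( 'ShortUrl', $code )->getCanonicalUrl();
} else {
	$url = $wgShortUrlPrefix . $code;
}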

ShortUrl.utils.php

Done in r103035
  • Line 28: you probably want getPrefixedText() rather than getFullText(). The latter will return different values for 'Foo' and 'Foo#Bar', even though they'll map to the same su_id and the same base-36 code.
  • Line 32: there's a convenient function called $dbr->selectRow() that returns a single row object or false. It's specifically intended for queries like these where you only want one row. It adds LIMIT 1 to the query but that's not a problem here
  • decodeURL() does not account for the possibility that the short URL ID does not occur in the database. In that case it returns an invalid (!!!) Title object with an empty string as the title. There is a check in SpecialShortUrl::execute() that shows an error page if decodeURL() returns false or null, but that code is never reached because decodeURL() always returns an object. Instead, the code will try to redirect to a page with an empty name, which happens to redirect you to the main page (see the sketch after this list).
    • The way the error message is done looks wrong anyway. $wgOut->showErrorPage() should be used for that
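
Putting the selectRow() and missing-ID points together, decodeURL() could look roughly like this (a sketch only; the table and field names are my guess at the schema, and the base-36 decoding is simplified):

public static function decodeURL( $data ) {
	$id = intval( base_convert( $data, 36, 10 ) );
	$dbr = wfGetDB( DB_SLAVE );
	$row = $dbr->selectRow(
		'shorturls',
		array( 'su_namespace', 'su_title' ),
		array( 'su_id' => $id ),
		__METHOD__
	);
	if ( $row === false ) {
		// Unknown short URL ID: return false so SpecialShortUrl::execute()
		// can show a proper error page (with $wgOut->showErrorPage())
		return false;
	}
	return Title::makeTitle( $row->su_namespace, $row->su_title );
}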

shorturls.sql

Done in r103035

ext.shortUrl.js

Done in r103035
  • Line 4: url is not escaped here. Use something like $( '<a>' ).addClass( 'foo' ).attr( 'href', url ).text( url ) so things are escaped properly

ext.shortUrl.css

Done in r103035
  • display: Block; should not be capitalized
  • Applying display: block to the link has a weird side effect where the clickable area extends all the way to the right; see this screenshot

populateSortUrlTable.php

Done in r103665.
  • Line 29: you can't just select all rows from the page table in one query. Here's why:
mysql> SELECT COUNT(*) FROM page;
+----------+
| COUNT(*) |
+----------+
| 24849480 |
+----------+
1 row in set (2 min 38.66 sec)
  • Using $insertBuffer to insert rows in batches of 100 is a good idea. However, you should also (see the sketch after this list):
    • select stuff from the page table in 100-row batches too. You can do that with something like SELECT stuff FROM page WHERE page_id > $previousPageID ORDER BY page_id LIMIT 100
    • print progress markers for each batch rather than for each row, so you get something like 100 titles done \n 200 titles done \n 223 titles done instead of 1 titles done \n 2 titles done \n 3 titles done
    • increment $rowCount before printing, not after. Currently it starts at zero and finishes one too low
    • call wfWaitForSlaves() after inserting each batch to make sure replication lag doesn't go through the roof while you populate your 25 million-row table
    • print "done" at the end so the user knows the script finished properly rather than dying in the middle or something