User:Robchurch/Interwiki existence checks
Copied from IRC:
<robchurch> Hmm.
<robchurch> One horrible problem with interwiki links is, as mentioned, the resolution problem.
<robchurch> w:en:Foo and en:w:Foo are the same but work differently.
<robchurch> It would be rather nice to have a consistent mechanism to convert w:en:Foo into http://en.wikipedia.org/wiki/Foo in a single pass without reliance upon interwiki redirection. :)
<robchurch> We could allow a specific format in interwiki.iw_url, e.g. http://$lang.wikipedia.org for 'w'
<robchurch> [[en:w:Foo]] gets split up, "en" would be detected as a language code and replaced into it.
<robchurch> For something like [[w:Foo]], since there's no language code, use the content language code.
<robchurch> The biggest problem I envision with that is making sure it's backwards-compatible.
<robchurch> The actual existence checking is quite simple; we can have something like LinkCache but for interwiki links.
<robchurch> This could have a special batch job like LinkBatch and we could introduce a simple API method to do batch existence lookups.
<robchurch> On wiki farms, this cache could be shared across the entire cluster.
<robchurch> In that particular case, updating the cache is rather straightforward, since each wiki can maintain its own entries.
<robchurch> For non-farm setups, what we could potentially do is introduce a special kind of link table which stores a URL as the "from" value - when doing a cache update, do some sort of specific XML callbackesque thing to the wiki that requested existence state in the first place.
<robchurch> Later on, if we wanted, we could extend and override bits for a custom implementation for Wikimedia using direct access to the databases to make it that bit faster.
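The single-pass resolution described in the log could be sketched roughly as below. This is only an illustration in Python, not MediaWiki code: the `INTERWIKI` dict stands in for the interwiki table, `LANGUAGE_CODES` and `CONTENT_LANGUAGE` are made-up configuration, and `resolve_interwiki` is a hypothetical helper name. The `$lang` placeholder is the format proposed above; the `/wiki/$1` suffix is an assumption based on the page-title placeholder that iw_url already uses.

```python
from urllib.parse import quote

# Hypothetical stand-in for the interwiki table: prefix -> iw_url template.
# "$lang" is the placeholder proposed in the log; "$1" is the page title.
INTERWIKI = {
    "w": "http://$lang.wikipedia.org/wiki/$1",
}

# Assumed configuration, for illustration only.
LANGUAGE_CODES = {"en", "de", "fr"}
CONTENT_LANGUAGE = "en"


def resolve_interwiki(link, content_lang=CONTENT_LANGUAGE):
    """Resolve a link such as 'w:en:Foo' or 'en:w:Foo' to a URL in one pass.

    Returns None when the link carries no known interwiki prefix.
    """
    segments = link.split(":")
    prefix = None
    lang = None

    # Consume leading segments in either order: at most one interwiki
    # prefix and at most one language code. Always leave at least one
    # segment behind to serve as the page title.
    while len(segments) > 1:
        head = segments[0].strip().lower()
        if prefix is None and head in INTERWIKI:
            prefix = head
        elif lang is None and head in LANGUAGE_CODES:
            lang = head
        else:
            break
        segments.pop(0)

    if prefix is None:
        return None

    # Whatever remains is the title; it may itself contain colons
    # (for example a namespace such as "Talk:Foo").
    title = ":".join(segments).strip().replace(" ", "_")

    # No language code in the link: fall back to the content language.
    url = INTERWIKI[prefix].replace("$lang", lang or content_lang)
    return url.replace("$1", quote(title, safe=":/"))
```

Under these assumptions, `resolve_interwiki("w:en:Foo")` and `resolve_interwiki("en:w:Foo")` both give `http://en.wikipedia.org/wiki/Foo`, and `resolve_interwiki("w:Foo", content_lang="de")` falls back to `http://de.wikipedia.org/wiki/Foo`. The backwards-compatibility concern raised in the log is not addressed here: templates without `$lang` pass through unchanged, but existing prefix chains that rely on interwiki redirection would need separate handling.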