Topic on User talk:Anomie/Abstract schema

Skizzerz (talkcontribs)

I'd rather that the data structure splits partial indexes into something more easy for a machine to parse rather than having to rely on special syntax such as "column_name(20)". Perhaps something like the following:

  • Allow tables.tname.indexes.iname.columns to be an array of 1+ "index objects", where an index object is either a string (to represent an index which covers the full column), or an unordered map with the keys "column" (required; value is the string column name), "length" (optional; value is an integer length of the column index), and "direction" (optional; value is either the string "asc" or "desc"). Example:
"columns": [
    "column_1",
    { "column": "column_2", "length": 20 },
    { "column": "column_3", "direction": "desc" }
]

This correlates to an index on column_1, column_2(20), column_3 DESC. We can add other keys to the index object should the need arise, or we can skip the standard string form and require that it be an array of objects in order to make parsing and schema validation easier.

Notwithstanding the representation, "index the full column" may be a non-starter in MSSQL, as indexes can be a max of 900 bytes long. If the full column is larger, MSSQL will need to find some other way of indexing it. I'm still running tests, but I believe that if you create a computed column (based on the prefix length you care about), and the index that computed column instead of the main column, the query optimizer will select your index should the query be able to make use of it. I'm not 100% sure on this though, and need to perform additional testing to verify. It may also be a feature in the optimizer gated by SKU (as in, it's available in Enterprise but not Standard or Express).

Anomie (talkcontribs)

Yeah, you're probably right. I'd rather keep the string shorthand since that's the most common case.

MySQL has a similar limit, although I don't know offhand the exact value. That's why indexes on "text" and "blob" type columns require the length. I'd like to prohibit them entirely, but MediaWiki's MySQL schema does have indexes on a few tinyblob columns: ipblocks.ipb_address, ipblocks.ipb_range_start, and ipblocks.ipb_range_end.

Anomie (talkcontribs)

Actually, since it's only those few tinyblob columns, let's go ahead and disallow non-fulltext indexes on "text" and "blob" columns when size is not tiny.

Reply to "Partial indexes"