User:Catrope/SQL talk
Indexes (AKA keys)
editmysql> SHOW CREATE TABLE page; [snip] | page | CREATE TABLE `page` ( `page_id` int(8) unsigned NOT NULL AUTO_INCREMENT, `page_namespace` int(11) NOT NULL DEFAULT '0', `page_title` varbinary(255) NOT NULL DEFAULT '', `page_restrictions` tinyblob NOT NULL, `page_counter` bigint(20) unsigned NOT NULL DEFAULT '0', `page_is_redirect` tinyint(1) unsigned NOT NULL DEFAULT '0', `page_is_new` tinyint(1) unsigned NOT NULL DEFAULT '0', `page_random` double unsigned NOT NULL DEFAULT '0', `page_touched` varbinary(14) NOT NULL DEFAULT '', `page_latest` int(8) unsigned NOT NULL DEFAULT '0', `page_len` int(8) unsigned NOT NULL DEFAULT '0', PRIMARY KEY (`page_id`), UNIQUE KEY `name_title` (`page_namespace`,`page_title`), KEY `page_random` (`page_random`), KEY `page_len` (`page_len`), KEY `page_redirect_namespace_len` (`page_is_redirect`,`page_namespace`,`page_len`) ) ENGINE=InnoDB AUTO_INCREMENT=35610364 DEFAULT CHARSET=binary |
Index on a single field
editORDER BY without index
editmysql> EXPLAIN SELECT * FROM page ORDER BY page_touched LIMIT 10; +----+-------------+-------+------+---------------+------+---------+------+----------+----------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------+------+---------+------+----------+----------------+ | 1 | SIMPLE | page | ALL | NULL | NULL | NULL | NULL | 26647370 | Using filesort | +----+-------------+-------+------+---------------+------+---------+------+----------+----------------+ 1 row in set (0.00 sec)
Bad signs:
- type=ALL: full table scan
- key=NULL: no index used
- Using filesort
Query execution:
- Retrieve all 26M rows
- Sort them by page_touched
- Return first 10
ORDER BY with index
editmysql> EXPLAIN SELECT * FROM page ORDER BY page_len LIMIT 10; +----+-------------+-------+-------+---------------+----------+---------+------+------+-------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+-------+---------------+----------+---------+------+------+-------+ | 1 | SIMPLE | page | index | NULL | page_len | 4 | NULL | 10 | | +----+-------------+-------+-------+---------------+----------+---------+------+------+-------+ 1 row in set (0.00 sec)
Filesort is gone, type=index, key=page_len, and rows=10
Query execution:
- Look at page_len index (pre-sorted list of row pointers ordered by page_len)
- Grab 10 row pointers off the top
- Retrieve those rows and return them
WHERE without index
editmysql> EXPLAIN SELECT * FROM page WHERE page_is_new=0 LIMIT 1; +----+-------------+-------+------+---------------+------+---------+------+----------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------+------+---------+------+----------+-------------+ | 1 | SIMPLE | page | ALL | NULL | NULL | NULL | NULL | 26647485 | Using where | +----+-------------+-------+------+---------------+------+---------+------+----------+-------------+ 1 row in set (0.00 sec)
Query execution:
- Iterate over all rows, in any order
- For each row:
- If page_namespace=0, return row and stop
- If not, continue to next row
- If no rows match, return empty result
Worst case: no rows match, entire table is scanned
WHERE with index
editmysql> EXPLAIN SELECT * FROM page WHERE page_id=12345 LIMIT 1; +----+-------------+-------+-------+---------------+---------+---------+-------+------+-------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+-------+---------------+---------+---------+-------+------+-------+ | 1 | SIMPLE | page | const | PRIMARY | PRIMARY | 4 | const | 1 | | +----+-------------+-------+-------+---------------+---------+---------+-------+------+-------+ 1 row in set (0.00 sec)
Seeing const is usually good
Query execution:
- Use binary search to find the row with page_id=12345
- This scans at most 1+log(26M) = 27 rows
- If found, return the row, otherwise return an empty result
WHERE .. LIMIT N
editmysql> EXPLAIN SELECT * FROM page WHERE page_id >= 12345 ORDER BY page_id LIMIT 10; +----+-------------+-------+-------+---------------+---------+---------+------+----------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+-------+---------------+---------+---------+------+----------+-------------+ | 1 | SIMPLE | page | range | PRIMARY | PRIMARY | 4 | NULL | 13323844 | Using where | +----+-------------+-------+-------+---------------+---------+---------+------+----------+-------------+ 1 row in set (0.00 sec)
This is a bug in MySQL: rows=13M is wrong. Note type=range
Query execution:
- Use binary search to find the row with page_id=12345 (or the first one that's higher)
- Return that row plus the next 9
Examples of indexed and unindexed queries
editAssuming there is an index on foo:
- Good: WHERE foo = 'abc'
- Bad: WHERE foo != 'abc'
- Good: WHERE foo > 'abc' ORDER BY foo (or >= , <= , < , BETWEEN)
- Bad: WHERE foo > 'abc' ORDER BY bar
- Good: WHERE foo LIKE 'abc%'
- Bad: WHERE foo LIKE '%abc' (or anything else that isn't a prefix match)
- Good: WHERE foo LIKE 'abc%' ORDER BY foo
- Bad: WHERE foo LIKE 'abc%' ORDER BY bar
mysql> EXPLAIN SELECT * FROM image WHERE img_name='Flag_of_Canada.svg'; +----+-------------+-------+-------+---------------+---------+---------+-------+------+-------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+-------+---------------+---------+---------+-------+------+-------+ | 1 | SIMPLE | image | const | PRIMARY | PRIMARY | 257 | const | 1 | | +----+-------------+-------+-------+---------------+---------+---------+-------+------+-------+ 1 row in set (0.00 sec) mysql> EXPLAIN SELECT * FROM image WHERE img_name != 'Flag_of_Canada.svg' +----+-------------+-------+-------+---------------+---------+---------+------+--------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+-------+---------------+---------+---------+------+--------+-------------+ | 1 | SIMPLE | image | range | PRIMARY | PRIMARY | 257 | NULL | 814940 | Using where | +----+-------------+-------+-------+---------------+---------+---------+------+--------+-------------+ 1 row in set (0.00 sec) -- rows=400K is wrong mysql> EXPLAIN SELECT * FROM image WHERE img_name > 'Flag_of_Canada.svg' ORDER BY img_name LIMIT 10; +----+-------------+-------+-------+---------------+---------+---------+------+--------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+-------+---------------+---------+---------+------+--------+-------------+ | 1 | SIMPLE | image | range | PRIMARY | PRIMARY | 257 | NULL | 407464 | Using where | +----+-------------+-------+-------+---------------+---------+---------+------+--------+-------------+ 1 row in set (0.00 sec) -- You have to explicitly trigger a worst case here (img_name > 'z' is rare, img_name > 'A' is common) mysql> EXPLAIN SELECT * FROM image WHERE img_name > 'z' ORDER BY img_sha1 LIMIT 10; +----+-------------+-------+-------+---------------+---------+---------+------+------+-----------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+-------+---------------+---------+---------+------+------+-----------------------------+ | 1 | SIMPLE | image | range | PRIMARY | PRIMARY | 257 | NULL | 1150 | Using where; Using filesort | +----+-------------+-------+-------+---------------+---------+---------+------+------+-----------------------------+ 1 row in set (0.00 sec) -- rows=1144 is wrong mysql> EXPLAIN SELECT * FROM image WHERE img_name LIKE 'Flag_of_%' ORDER BY img_name LIMIT 10; +----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+ | 1 | SIMPLE | image | range | PRIMARY | PRIMARY | 257 | NULL | 1144 | Using where | +----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+ 1 row in set (0.00 sec) mysql> EXPLAIN SELECT * FROM image WHERE img_name LIKE '%Canada.svg' ORDER BY img_name LIMIT 10; +----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+ | 1 | SIMPLE | image | index | NULL | PRIMARY | 257 | NULL | 10 | Using where | +----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+ 1 row in set (0.00 sec) mysql> EXPLAIN SELECT * FROM image WHERE img_name LIKE 'Flag%' ORDER BY img_sha1 LIMIT 10; +----+-------------+-------+-------+---------------+---------+---------+------+------+-----------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+-------+---------------+---------+---------+------+------+-----------------------------+ | 1 | SIMPLE | image | range | PRIMARY | PRIMARY | 257 | NULL | 1144 | Using where; Using filesort | +----+-------------+-------+-------+---------------+---------+---------+------+------+-----------------------------+ 1 row in set (0.00 sec)
Index on two fields
editThis works just like a phone book:
CREATE INDEX name ON phonebook (lastname, firstname);
Note that the order matters. We're sorting by lastname first, then by firstname:
lastname | firstname | ||
---|---|---|---|
Roberts | Daniel | ||
Roberts | John | ||
Robertson | Adam | ||
Robertson | John | ... | ... |
Examples of indexed and unindexed queries
edit- Good: ORDER BY lastname, firstname
- Bad: ORDER BY firstname, lastname
- Good: ORDER BY lastname DESC, firstname DESC
- Bad: ORDER BY lastname ASC, firstname DESC
- Good: WHERE lastname = 'Roberts' AND firstname = 'Daniel'
- Bad: WHERE lastname = 'Roberts' OR firstname = 'Adam'
- Good: WHERE lastname = 'Roberts' ORDER BY firstname
- Bad: WHERE firstname = 'John' ORDER BY lastname
- Good: WHERE lastname = 'Roberts' AND firstname > 'Daniel' ORDER BY firstname (or >= , <= , < , BETWEEN)
- Bad: WHERE firstname = 'John' AND lastname > 'Roberts' ORDER BY lastname
- Good: WHERE lastname = 'Roberts' AND firstname LIKE 'D%' ORDER BY firstname
- Bad: WHERE firstname = 'John' AND lastname LIKE 'R%' ORDER BY lastname
mysql> EXPLAIN SELECT * FROM page ORDER BY page_namespace, page_title LIMIT 10; +----+-------------+-------+-------+---------------+------------+---------+------+------+-------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+-------+---------------+------------+---------+------+------+-------+ | 1 | SIMPLE | page | index | NULL | name_title | 261 | NULL | 10 | | +----+-------------+-------+-------+---------------+------------+---------+------+------+-------+ 1 row in set (0.00 sec) mysql> EXPLAIN SELECT * FROM page ORDER BY page_title, page_namespace LIMIT 10; +----+-------------+-------+------+---------------+------+---------+------+----------+----------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------+------+---------+------+----------+----------------+ | 1 | SIMPLE | page | ALL | NULL | NULL | NULL | NULL | 26647899 | Using filesort | +----+-------------+-------+------+---------------+------+---------+------+----------+----------------+ 1 row in set (0.00 sec) mysql> EXPLAIN SELECT * FROM page ORDER BY page_namespace DESC, page_title DESC LIMIT 10; +----+-------------+-------+-------+---------------+------------+---------+------+------+-------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+-------+---------------+------------+---------+------+------+-------+ | 1 | SIMPLE | page | index | NULL | name_title | 261 | NULL | 10 | | +----+-------------+-------+-------+---------------+------------+---------+------+------+-------+ 1 row in set (0.00 sec) -- Note that ASC is the default, so this is really namespace ASC, title DESC mysql> EXPLAIN SELECT * FROM page ORDER BY page_namespace, page_title DESC LIMIT 10; +----+-------------+-------+------+---------------+------+---------+------+----------+----------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------+------+---------+------+----------+----------------+ | 1 | SIMPLE | page | ALL | NULL | NULL | NULL | NULL | 26647905 | Using filesort | +----+-------------+-------+------+---------------+------+---------+------+----------+----------------+ 1 row in set (0.00 sec) mysql> EXPLAIN SELECT * FROM page WHERE page_namespace=0 AND page_title='Wikimania' LIMIT 10; +----+-------------+-------+-------+---------------+------------+---------+-------------+------+-------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+-------+---------------+------------+---------+-------------+------+-------+ | 1 | SIMPLE | page | const | name_title | name_title | 261 | const,const | 1 | | +----+-------------+-------+-------+---------------+------------+---------+-------------+------+-------+ 1 row in set (0.00 sec) mysql> EXPLAIN SELECT * FROM page WHERE page_namespace=0 OR page_title='Wikimania' LIMIT 10; +----+-------------+-------+------+---------------+------+---------+------+----------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------+------+---------+------+----------+-------------+ | 1 | SIMPLE | page | ALL | name_title | NULL | NULL | NULL | 26647913 | Using where | +----+-------------+-------+------+---------------+------+---------+------+----------+-------------+ 1 row in set (0.00 sec) -- rows is wrong, this one is quite efficient mysql> EXPLAIN SELECT * FROM page WHERE page_namespace=2 ORDER BY page_title LIMIT 10; +----+-------------+-------+------+---------------+------------+---------+-------+---------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------+------------+---------+-------+---------+-------------+ | 1 | SIMPLE | page | ref | name_title | name_title | 4 | const | 4506840 | Using where | +----+-------------+-------+------+---------------+------------+---------+-------+---------+-------------+ 1 row in set (0.00 sec) -- rows is wrong, this one is inefficient. The index is still used to sort by page_namespace though mysql> EXPLAIN SELECT * FROM page WHERE page_title='Catrope' ORDER BY page_namespace LIMIT 10; +----+-------------+-------+-------+---------------+------------+---------+------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+-------+---------------+------------+---------+------+------+-------------+ | 1 | SIMPLE | page | index | NULL | name_title | 261 | NULL | 10 | Using where | +----+-------------+-------+-------+---------------+------------+---------+------+------+-------------+ 1 row in set (0.00 sec) -- rows is wrong, this one is actually quite efficient mysql> EXPLAIN SELECT * FROM page WHERE page_namespace = 0 AND page_title >= 'Wikimania' ORDER BY page_title LIMIT 10; +----+-------------+-------+-------+---------------+------------+---------+------+--------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+-------+---------------+------------+---------+------+--------+-------------+ | 1 | SIMPLE | page | range | name_title | name_title | 261 | NULL | 681676 | Using where | +----+-------------+-------+-------+---------------+------------+---------+------+--------+-------------+ 1 row in set (0.00 sec) mysql> EXPLAIN SELECT * FROM page WHERE page_title = 'Wikimania' AND page_namespace >= 1 ORDER BY page_namespace LIMIT 10; +----+-------------+-------+-------+---------------+------------+---------+------+----------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+-------+---------------+------------+---------+------+----------+-------------+ | 1 | SIMPLE | page | range | name_title | name_title | 261 | NULL | 13323968 | Using where | +----+-------------+-------+-------+---------------+------------+---------+------+----------+-------------+ 1 row in set (0.00 sec) -- rows is wrong, this one is actually quite efficient mysql> EXPLAIN SELECT * FROM page WHERE page_namespace = 0 AND page_title LIKE 'W%' ORDER BY page_title LIMIT 10; +----+-------------+-------+-------+---------------+------------+---------+------+--------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+-------+---------------+------------+---------+------+--------+-------------+ | 1 | SIMPLE | page | range | name_title | name_title | 261 | NULL | 610080 | Using where | +----+-------------+-------+-------+---------------+------------+---------+------+--------+-------------+ 1 row in set (0.00 sec) mysql> EXPLAIN SELECT * FROM image WHERE img_user_text LIKE 'J%' ORDER BY img_timestamp; +----+-------------+-------+-------+------------------------+------------------------+---------+------+--------+-----------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+-------+------------------------+------------------------+---------+------+--------+-----------------------------+ | 1 | SIMPLE | image | range | img_usertext_timestamp | img_usertext_timestamp | 257 | NULL | 112420 | Using where; Using filesort | +----+-------------+-------+-------+------------------------+------------------------+---------+------+--------+-----------------------------+ 1 row in set (0.01 sec)
Things you shouldn't do
edit- Run queries that return a potentially unlimited number of rows
- Retrieving millions of rows from the DB is slow
- Sending them from the DB to PHP is slow
- Processing them in PHP is slow and can lead to memory exhaustion
- Sending the huge HTML result to the client is slow
- So always use LIMIT
- Run queries that scan a potentially unlimited number of rows
- Like full table scans, and some of the stuff below
- Run unindexed queries :)
- Unindexed WHERE on a condition that is rarely false (e.g. WHERE rev_deleted=0) is sometimes fine
- Filesort is never OK. ORDER BY must always be indexed
- ORDER BY an expression is unindexed and therefore bad
- Use OFFSET (or LIMIT n, m which is equivalent to LIMIT m OFFSET n)
- Doesn't scale: LIMIT 10 OFFSET 5000 scans 5010 rows
- Instead, use something like WHERE foo_id>=12345 for paging (be sure to use a unique index for this!)
- Use COUNT(), SUM(), etc., they scan a potentially unlimited number of rows
- MAX(indexed_field) with no WHERE is fast though
mysql> EXPLAIN SELECT * FROM revision WHERE rev_user_text='Catrope' AND rev_deleted=0 ORDER BY rev_timestamp LIMIT 10; +----+-------------+----------+------+--------------------+--------------------+---------+-------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+----------+------+--------------------+--------------------+---------+-------+------+-------------+ | 1 | SIMPLE | revision | ref | usertext_timestamp | usertext_timestamp | 257 | const | 1044 | Using where | +----+-------------+----------+------+--------------------+--------------------+---------+-------+------+-------------+ 1 row in set (0.01 sec) mysql> EXPLAIN SELECT * FROM image ORDER BY img_width/img_height LIMIT 10; +----+-------------+-------+------+---------------+------+---------+------+--------+----------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------+------+---------+------+--------+----------------+ | 1 | SIMPLE | image | ALL | NULL | NULL | NULL | NULL | 814945 | Using filesort | +----+-------------+-------+------+---------------+------+---------+------+--------+----------------+ 1 row in set (0.00 sec) mysql> EXPLAIN SELECT * FROM image ORDER BY img_name LIMIT 10 OFFSET 5000; +----+-------------+-------+-------+---------------+---------+---------+------+------+-------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+-------+---------------+---------+---------+------+------+-------+ | 1 | SIMPLE | image | index | NULL | PRIMARY | 257 | NULL | 5010 | | +----+-------------+-------+-------+---------------+---------+---------+------+------+-------+ 1 row in set (0.00 sec) mysql> EXPLAIN SELECT COUNT(*) FROM image WHERE img_name LIKE 'F%'; +----+-------------+-------+-------+---------------+---------+---------+------+-------+--------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+-------+---------------+---------+---------+------+-------+--------------------------+ | 1 | SIMPLE | image | range | PRIMARY | PRIMARY | 257 | NULL | 71364 | Using where; Using index | +----+-------------+-------+-------+---------------+---------+---------+------+-------+--------------------------+ 1 row in set (0.00 sec) mysql> EXPLAIN SELECT AVG(img_size) FROM image; +----+-------------+-------+-------+---------------+----------+---------+------+--------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+-------+---------------+----------+---------+------+--------+-------------+ | 1 | SIMPLE | image | index | NULL | img_size | 4 | NULL | 814946 | Using index | +----+-------------+-------+-------+---------------+----------+---------+------+--------+-------------+ 1 row in set (0.04 sec) -- Special case: img_size is indexed, so the row with the highest img_size in the entire table is easy to find mysql> EXPLAIN SELECT MAX(img_size) FROM image; +----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+ | 1 | SIMPLE | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Select tables optimized away | +----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+ 1 row in set (0.00 sec) mysql> EXPLAIN SELECT MAX(img_size) FROM image WHERE img_name LIKE 'F%'; +----+-------------+-------+-------+---------------+---------+---------+------+-------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+-------+---------------+---------+---------+------+-------+-------------+ | 1 | SIMPLE | image | range | PRIMARY | PRIMARY | 257 | NULL | 71364 | Using where | +----+-------------+-------+-------+---------------+---------+---------+------+-------+-------------+ 1 row in set (0.00 sec)