Inspecting Index Usage In MySQL, PostgreSQL, and MongoDB

Posted by Baron Schwartz on Feb 8, 2019 2:45:23 PM

In my recent post about the Left-Prefix Index Rule, I explained how queries can use all or part of a compound (multi-column) index. Knowing what makes an index fully usable by a query is important, but it's also important to know how to figure out how much of an index a query is able to use. In this article I'll show you how to do this by reading the query's explain plan. This article covers MySQL, Postgres, and MongoDB.

 

denny-muller-1260091-unsplash

 

As a quick review, a query can use an index if it has filtering values that constrain a contiguous leading portion of the index, up to and including the first inequality condition in index-column order. Now let's see where the database server exposes how much of the index is used.

MySQL

In MySQL's EXPLAIN output, the key_len column shows how much of the index was considered usable. It's a number of bytes, not a number (or list) of columns. You have to look at the columns' data types to figure out how many columns this maps to. It can get tricky, but at a high level, imagine that MySQL stores indexes as a concatenation of columns' maximum possible lengths, including an extra byte if the column is nullable.

For example:

CREATE TABLE `actor` (
`actor_id` smallint(5) unsigned NOT NULL AUTO_INCREMENT,
`first_name` varchar(45) NOT NULL,
`last_name` varchar(45) NOT NULL,
`last_update` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`actor_id`),
KEY `idx_actor_last_name` (`actor_id`, `last_name`)
) ENGINE=InnoDB AUTO_INCREMENT=201 DEFAULT CHARSET=utf8

The idx_actor_last_name index is going to be up to 2 + 45 bytes long (a smallint is 2 bytes).

Now if you see key_len of 2 in EXPLAIN, that means the query is only able to constrain the actor_id column, and there's no value to match against the last_name column. The query is only using a 1-column prefix of the index.

If you're using the JSON format for EXPLAIN, look for the used_key_parts property, which will have a list of columns, like "used_key_parts": ["actor_id"].

PostgreSQL

PostgreSQL shows which values were used to search or filter an index with the "Index Cond" in the EXPLAIN output. Values that are used to filter rows from the table, are shown as filters. This example query from the EXPLAIN documentation shows both at once:

EXPLAIN ANALYZE SELECT * FROM tenk1 WHERE unique1 < 100 AND unique2 > 9000 LIMIT 2;

QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.29..14.71 rows=2 width=244) (actual time=0.177..0.249 rows=2 loops=1)
-> Index Scan using tenk1_unique2 on tenk1 (cost=0.29..72.42 rows=10 width=244) (actual time=0.174..0.244 rows=2 loops=1)
Index Cond: (unique2 > 9000)
Filter: (unique1 < 100)
Rows Removed by Filter: 287

The "Index Cond" on unique2 near the bottom shows a value being used to search for rows in the index, whereas the filter on unique1 is applied to the rows in the table.

Postgres doesn't make it explicitly visible in the index whether a column is "useful" in searching for values in the index. You might see complex Index Cond's like "Index Cond: ((i2 = 898732) AND (i5 = 123451))" but you'll need to know whether these two columns comprise a contiguous leading prefix of the index. The left-prefix rule still applies, but it's not as explicitly visible. However, as the multi-column index docs say (emphasis mine),

A multicolumn B-tree index can be used with query conditions that involve any subset of the index's columns, but the index is most efficient when there are constraints on the leading (leftmost) columns... Constraints on columns to the right of these columns are checked in the index, so they save visits to the table proper, but they do not reduce the portion of the index that has to be scanned.

MongoDB

MongoDB has the same rules about using a prefix of a compound index. It exposes which columns were matched in the keyPattern property of the explain() output. Each column that was used has a "colname":1 indicator within this. Here's an example, which the docs dissect in more detail:

{
"queryPlanner" : {
...
"winningPlan" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"quantity" : 1,
"type" : 1
},
...

Conclusions

When you combine this blog post together with the Left-Prefix Index Rule you have a complete way of examining whether queries use the full width of an index effectively. Are they using values to search to as small a range of the index as they could be? Or are they scanning a lot of an index and filtering a lot of it out? The former is a lot more efficient, so the database can make your queries run a lot faster that way.

Photo by Denny Müller

Recent Posts

Posts by Topic

see all