One of the variables that can have a major impact on your MongoDB performance is the way in which you implement indexes. While it's virtually always a good idea to use indexes in some form, you need to apply some analysis and tuning in order to make sure you've set them up to function as effectively as possible.
When you run a query against MongoDB, the server reads from the collection you specify and returns the documents that match your criteria. Before the query executes, MongoDB's query planner checks whether the collection has an index that supports the query. If it does, MongoDB will use that index to find matches quickly; if it doesn't, MongoDB has no choice but to scan the entire collection for matching documents.
You can think about this as analogous to using an index when looking something up in an encyclopedia: if the book has an index and the entries are alphabetical, it's easy for you to find the appropriate page number and the entry you're looking for; if that kind of organization were missing, you'd have to read the entire encyclopedia cover-to-cover in order to find each appearance of your specified term. Hardly optimal or time-efficient.
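To make the contrast concrete, here is a minimal sketch in Python rather than MongoDB itself. A sorted list searched with binary search stands in for an index (MongoDB's real indexes are B-tree structures), and the function names and the user_id field are hypothetical, chosen only for illustration. Each function reports how many entries it had to examine.

```python
import bisect


def full_scan(docs, field, value):
    """Check every document; return (matches, documents_examined)."""
    examined = 0
    matches = []
    for doc in docs:
        examined += 1
        if doc.get(field) == value:
            matches.append(doc)
    return matches, examined


def build_index(docs, field):
    """Build a toy index: sorted keys plus the position of each document.

    Assumes every document contains `field`.
    """
    pairs = sorted((doc[field], pos) for pos, doc in enumerate(docs))
    keys = [key for key, _ in pairs]
    positions = [pos for _, pos in pairs]
    return keys, positions


def index_lookup(docs, index, value):
    """Binary-search the index, then fetch only the matching documents."""
    keys, positions = index
    lo = bisect.bisect_left(keys, value)
    hi = bisect.bisect_right(keys, value)
    matches = [docs[positions[i]] for i in range(lo, hi)]
    return matches, hi - lo  # number of index entries examined


if __name__ == "__main__":
    docs = [{"_id": i, "user_id": i % 1000} for i in range(10000)]
    index = build_index(docs, "user_id")
    print(full_scan(docs, "user_id", 42)[1])
    print(index_lookup(docs, index, 42)[1])
```

With this sample data, the full scan examines all 10,000 documents to find 10 matches, while the index lookup examines only the 10 matching entries.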
One of the central tenets of good MongoDB performance is that full collection scans are bad—really bad. They're slow and unfocused, forcing your system to sift through a much larger collection of data than necessary, wasting system time and resources. If your MongoDB instance isn't using an index, then querying a large data set will be prohibitively slow. There are several key advantages to using indexes:
- Fast access.
- Ordered results: if a query's sort order matches an index, MongoDB can return documents in index order without a separate sort step.
- A smaller working set—you won't need to keep your whole collection in memory if you're only accessing part of it at a given time (read more about correctly sizing your working sets in our previous blog post).
Once you recognize how indexes can significantly improve your MongoDB performance, the next step is to make sure that your indexes match your query workload. Just setting up indexes isn't enough: you should also be able to monitor exactly how they're impacting your system. While correctly implemented indexes are far superior to full collection scans, they first need to be tuned, and, as we explain below, there are even situations where too many indexes can hurt performance. Once you understand how to evaluate your indexes, you may find that they need some work.
Tools for evaluating your index efficiency
The basic goal of using indexes is to decrease the volume of data your system must scan in order to find the specific data you're looking for. The exact speed at which this should happen is fluid, of course, and depends somewhat on the details of your system, but there are some benchmarks and ratios that you can look at to gauge whether your scans are acceptably efficient. Several tools are available for these evaluations; to start, db.serverStatus() exposes counters that can point to poor MongoDB indexing.
metrics.operation.scanAndOrder will tell you "the total number of queries that return sorted numbers that cannot perform the sort operation using an index." A high value indicates that the server is sorting many result sets in memory rather than relying on pre-sorted data from an index. This extra work puts unnecessary strain on the system, and if you have a lot of queries that read sorted output, a high count is a red flag.
metrics.queryExecutor.scanned and metrics.queryExecutor.scannedObjects show, respectively, the number of index entries and the number of documents scanned by the database over time. With this information, you can take the ratio of scannedObjects:scanned and determine how efficiently your queries are using indexes. A ratio well above 1 suggests that many documents are being read without the help of an index.
Finally, scannedObjects/metrics.document.returned provides the average "number of documents read from collections" per "document returned." As with the other counters, if your system is reading far more documents than it returns, your indexes aren't optimal and aren't directing your system's work effectively.
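As a rough sketch of how these counters might be turned into ratios, the Python function below works on a dictionary shaped like the output of db.serverStatus(). The function name, the keys of the returned dictionary, and the sample numbers are all hypothetical; only the metrics.* field names come from the text above. Fetching the real document (for example with pymongo's client.admin.command("serverStatus")) is left out so the sketch runs without a server.

```python
def index_efficiency(status):
    """Summarize index-related counters from a serverStatus-shaped dict."""
    metrics = status["metrics"]
    scanned = metrics["queryExecutor"]["scanned"]                 # index entries scanned
    scanned_objects = metrics["queryExecutor"]["scannedObjects"]  # documents scanned
    returned = metrics["document"]["returned"]                    # documents returned
    scan_and_order = metrics["operation"]["scanAndOrder"]         # sorts done without an index

    def ratio(numerator, denominator):
        # Avoid dividing by zero on an idle or freshly started server.
        return numerator / denominator if denominator else None

    return {
        "scanAndOrder": scan_and_order,
        "objects_per_index_entry": ratio(scanned_objects, scanned),
        "objects_per_returned": ratio(scanned_objects, returned),
    }


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    sample_status = {
        "metrics": {
            "operation": {"scanAndOrder": 15},
            "queryExecutor": {"scanned": 2000, "scannedObjects": 50000},
            "document": {"returned": 1000},
        }
    }
    print(index_efficiency(sample_status))
```

For the made-up sample, the server read 25 documents per index entry scanned and 50 documents per document returned, which would suggest that indexes are not narrowing the work very much. These counters accumulate from server startup, so in practice it is usually more informative to take two snapshots and compare the differences over that interval.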
So, should I put indexes on all of my document fields?
No—it is possible to have too many indexes. (Everything in moderation!)
Ultimately, although indexes are an indispensable way to avoid unacceptably slow full collection scans, they need to be set up thoughtfully and maintained so that they benefit your system rather than complicate it. If you add too many indexes to a collection, the overhead of maintaining them can easily outweigh the benefits, because every insert, and every update that touches an indexed field, must also update the affected indexes.
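The write overhead can be illustrated with a toy model in Python. This is not how MongoDB stores data; the IndexedCollection class and its index_writes counter are hypothetical and only count how many index entries an insert has to write when each index is kept as a sorted list.

```python
import bisect


class IndexedCollection:
    """Toy collection that keeps one sorted entry list per indexed field."""

    def __init__(self, indexed_fields):
        self.docs = []
        self.indexes = {field: [] for field in indexed_fields}
        self.index_writes = 0  # total index entries written so far

    def insert(self, doc):
        """Store the document, then add one entry to every index."""
        position = len(self.docs)
        self.docs.append(doc)
        for field, entries in self.indexes.items():
            # Keep each index sorted by (key, document position).
            bisect.insort(entries, (doc[field], position))
            self.index_writes += 1


if __name__ == "__main__":
    fields = ["a", "b", "c", "d", "e"]
    for indexed in (fields[:1], fields):
        coll = IndexedCollection(indexed)
        for i in range(100):
            coll.insert({name: i * 7 % 13 for name in fields})
        print(len(indexed), "indexes:", coll.index_writes, "index writes")
```

In this model, 100 inserts cost 100 index writes with one index and 500 with five: the extra work grows in step with the number of indexes, on top of storing the document itself.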
Like anything else in your system, indexes take up space, even as they're designed to help the system function more efficiently. If you have too many indexes on a collection, they'll inflate your working set unnecessarily.
If you find yourself in this situation, you can free up space with partial indexes (available since MongoDB 3.2), which index only the documents that match a filter expression, such as a frequently read subset of a collection.
As with most efforts to maximize system performance, one of the first steps toward making indexes work well in MongoDB is understanding how you should define good vs. bad performance. Knowing what to expect and which metrics to value is crucial to tuning the system to behave as efficiently as possible.
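As a sketch of what defining a partial index could look like, the Python helpers below build the document for MongoDB's createIndexes command, including the partialFilterExpression option that limits the index to matching documents. The helper names, the orders collection, and the customer_id and status fields are hypothetical examples, and the command is only constructed here, not sent to a server.

```python
def partial_index_spec(field, filter_expression, name=None):
    """Build one entry for the `indexes` array of a createIndexes command."""
    return {
        "key": {field: 1},  # ascending index on a single field
        "name": name or f"{field}_1_partial",
        # Only documents matching this filter get an index entry.
        "partialFilterExpression": filter_expression,
    }


def create_indexes_command(collection, specs):
    """Wrap index specs in a createIndexes command document."""
    return {"createIndexes": collection, "indexes": list(specs)}


if __name__ == "__main__":
    # Hypothetical example: index customer_id only for open orders.
    spec = partial_index_spec("customer_id", {"status": "open"})
    command = create_indexes_command("orders", [spec])
    print(command)
    # With pymongo and a running server, this could be sent with
    # db.command(command), or created directly with
    # db.orders.create_index([("customer_id", 1)],
    #                        partialFilterExpression={"status": "open"})
```

One design point to keep in mind: MongoDB will generally use a partial index only for queries whose conditions guarantee a match within the filter, so in this example a query would also need to include status: "open".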
If you'd like to learn more about VividCortex and our solutions for MongoDB, be sure to stop by our booth and meet up with members of our team at MongoDB World later this month. You can also learn more about all the 2017 events VividCortex will be attending here.