March 2017 | Blog of Ken W. Alger

I have been using MongoDB for a few years now for various personal projects, and I really enjoy the features and flexibility it offers me as a developer. I have taken several great classes on, and from, MongoDB in that time frame as well. As with any technology, not only is there a lot to learn in general, there is also a lot to keep learning to stay up to date and relevant. One of the subjects taught is MongoDB’s availability and its ability to scale both vertically (more RAM, IO capacity, and CPU availability) and horizontally (sharding). Another subject that is heavily taught is the idea of clusters and replica sets.

Learning Options

Now I am just a single developer trying to learn technologies, increase my skills, and develop best practices when using them. As such, I personally do not have the resources to purchase multiple servers, constantly upgrade their hardware, or have them distributed across the country to understand latency issues. I can set up a replica set on the same laptop on which I write my blogs to get a basic understanding of what is happening, but I think a few things get lost in the learning process when boiling things down to a single Intel processor and no network connections between servers.

What options does one have to learn about and use some of these “big boy” features of MongoDB, or to prototype applications using real world settings? MongoDB launched a database-as-a-service (DBaaS) platform called Atlas in 2016, and it is an opportunity to utilize many of those features without having to personally fund a server farm in my garage. I can’t tell you how excited I was about this.

Now I could start clusters in different configurations, then set up and use multiple replica sets and perform on-the-fly changes to my test servers with ease. I still had to pay a usage fee, though, so I would set up servers, run a few tests, then tear down the servers to avoid any long-lasting fees. I was able to get practice with the technology, but it was still a challenge to showcase some of my projects connecting to a live database.

Atlas M0 Tier

Enter March 2017… MongoDB announced an awesome offering within their Atlas product: a free cluster tier called M0. This allows anyone and everyone who has been interested in using Atlas to utilize MongoDB in the cloud, in a clustered environment, without even having to give them a credit card. Talk about a great offering for people wanting to use this exciting technology.

Now that we can have access to Atlas and a free cluster, what can we do with Atlas? What’s all the excitement behind it? Well, first off it provides an incredibly intuitive GUI to set up a MongoDB cluster in a variety of regions and with a variety of features. As one might expect the free M0 cluster doesn’t allow for all of the server size features that are available. However, for prototyping it is great. And when your application is ready to move up to a cluster size beyond M0, the GUI is there to make the transition rather pain-free.

Atlas also provides many built-in features for those of us who care about our data and not just an easy interface, such as security, replication, backups, automated upgrades, and fine-grained monitoring, to name a few. It allows many of the painful tasks of database maintenance to be handled by MongoDB, and it can provide alerts when things need more personal attention.

Wrap Up

Of course, there are lots of reasons why people don’t like the idea of storing their data in the cloud, outside their control, on someone else’s hardware. I’m not going to step into that argument. People seem to have strong opinions on the matter in both directions. However, if you are thinking of trying a cloud implementation of MongoDB, I think Atlas is definitely worth a look.

There are several MongoDB specific terms in this post. I created a MongoDB Dictionary skill for the Amazon Echo line of products. Check it out and you can say “Alexa, ask MongoDB for the definition of sharding?” and get a helpful response.

Follow me on Twitter @kenwalger to get the latest updates on my postings.

I get asked about, and see a lot of posts and comments on the internet about, MongoDB not being as quick on query reads as people think it should be. These questions and comments are often followed by a panning of MongoDB itself, often based on the user’s experience in this one situation. My first question in these situations typically is “What indexes are set up on your collection that relate to your queries?” More often than not I get a deer-in-headlights look back at me. After some stammering, the answer typically is “I don’t know.”, “Is indexing important?”, “Whatever is standard.”, or the most popular, “What’s an index?”.

Indexing Overview

In this blog post, I’d like to touch briefly on what indexes are in MongoDB and how they greatly impact performance. What is an index? If we start with the definition provided by MongoDB:

Indexes are special data structures that store a small portion of the collection’s data set in an easy to traverse form.

we get an idea from the “easy to traverse” statement that they make something that is complicated, easier. In this case, indexes make traversing a collection easier (faster).

Let’s consider a data set that includes all of the postal codes in the United States. (zips.json can be downloaded here). Without an appropriate index if our application wants to find, for example, the zip code for a particular city, let’s say Keizer, Oregon (97303), MongoDB would have to scan our entire collection for that city to return the appropriate zip code. In fact, based on our data set, it would have to look through all 29,467 records to be able to find and return the one record.

That’s a lot of unnecessary looking through the collection to try to find the match for our search term. Imagine if our data set was much larger and included a million or more records. That would be a lot of overhead and searching. If we look at what is going on in a basic query for our city of “KEIZER” by having MongoDB explain the execution stats for our query, db.zips.find({"city": "KEIZER"}).explain("executionStats"), we can see a few things that are performance bottlenecks.

First, we see that even in our relatively small collection the query execution time was 34ms. Then, as expected, we see that MongoDB examined all 29,467 documents and that a collection scan (COLLSCAN) was performed to satisfy this query. Again, imagine scanning a much larger data set and how slow that could be.
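To make those numbers easier to pick out, here is a minimal sketch of a helper that summarizes an explain("executionStats") document. It assumes the classic explain layout (a queryPlanner.winningPlan with nested inputStage entries, plus an executionStats section); newer server versions may nest the plan differently. The helper names summarizeExplain and leafStage are my own, and the sample document is hand-built from the figures quoted in this post, not captured output.

```javascript
// Sketch only: assumes the classic explain("executionStats") document shape.

// Follow inputStage links down to the stage that actually reads the data.
function leafStage(plan) {
  let stage = plan;
  while (stage && stage.inputStage) {
    stage = stage.inputStage;
  }
  return stage ? stage.stage : "UNKNOWN";
}

// Pull out the handful of fields discussed in this post.
function summarizeExplain(explainDoc) {
  const stats = explainDoc.executionStats;
  return {
    accessPath: leafStage(explainDoc.queryPlanner.winningPlan),
    millis: stats.executionTimeMillis,
    docsExamined: stats.totalDocsExamined,
    returned: stats.nReturned,
  };
}

// Hand-built sample (hypothetical), using the numbers quoted above.
const sampleExplain = {
  queryPlanner: { winningPlan: { stage: "COLLSCAN", direction: "forward" } },
  executionStats: {
    nReturned: 1,
    executionTimeMillis: 34,
    totalKeysExamined: 0,
    totalDocsExamined: 29467,
  },
};

const summary = summarizeExplain(sampleExplain);
console.log(summary);
```

A large gap between docsExamined and returned, together with a COLLSCAN access path, is usually the first hint that a query has no index to lean on.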

Now, what happens if we add an index? Since we are, in this case, searching by city name, it would make sense to create an index on that field. That can be accomplished in the Mongo Shell with the command:

db.zips.createIndex({"city": 1})

This will create an ascending index on the city field in our collection. Now if we run the same query as before, we should expect a couple of things: both our query execution time and the number of documents examined should be significantly lower.

Index Used — Find with a defined and used Index

Wow, with an index in place on the city field, a search on a city shows some amazing improvements. Our actual query execution time went from 34ms to zero, we are now doing an index scan (IXSCAN) instead of a collection scan (COLLSCAN), and the number of documents examined dropped to a single document. That’s pretty powerful and highlights the need to have indexes on your collection.
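If it helps to see why the documents examined number drops, here is a toy model in plain JavaScript. It is not how MongoDB works internally (real indexes are B-tree structures, and the Map here is only a stand-in), and the four documents and their _id values are made up for illustration. The point is just to count how many documents each approach has to look at.

```javascript
// Toy model: an array stands in for the collection, a Map for an index on "city".
const docs = [
  { _id: 1, city: "SALEM", state: "OR" },
  { _id: 2, city: "KEIZER", state: "OR" },
  { _id: 3, city: "PORTLAND", state: "OR" },
  { _id: 4, city: "BEND", state: "OR" },
];

// Collection scan: every document is examined, whether it matches or not.
function findByScan(collection, city) {
  let examined = 0;
  const matches = [];
  for (const doc of collection) {
    examined += 1;
    if (doc.city === city) matches.push(doc);
  }
  return { matches, examined };
}

// Build the index once: city -> positions of the documents with that city.
function buildCityIndex(collection) {
  const index = new Map();
  collection.forEach((doc, position) => {
    if (!index.has(doc.city)) index.set(doc.city, []);
    index.get(doc.city).push(position);
  });
  return index;
}

// Index lookup: only the documents the index points at are examined.
function findByIndex(collection, index, city) {
  const positions = index.get(city) || [];
  const matches = positions.map((position) => collection[position]);
  return { matches, examined: matches.length };
}

const cityIndex = buildCityIndex(docs);
const scanResult = findByScan(docs, "KEIZER");
const indexResult = findByIndex(docs, cityIndex, "KEIZER");
console.log(scanResult.examined, indexResult.examined); // 4 1
```

The scan cost grows with the size of the collection, while the indexed lookup only touches the documents that match.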

After explaining this to MongoDB users I often get a “Why don’t I just index every field then?” response. Well, there’s no such thing as a free lunch, right? Indexes come with overhead. Examples include memory usage and slower writes, because every index on a collection has to be updated as new data is stored.
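A rough way to picture the write overhead is the toy model below. TinyCollection is a hypothetical class of my own, not anything from MongoDB, and the documents and pop values are made up. It simply counts how many index entries have to be written as documents are inserted.

```javascript
// Toy model: each index is a Map from a field value to the matching document ids.
class TinyCollection {
  constructor(indexedFields) {
    this.docs = [];
    this.indexes = new Map(indexedFields.map((field) => [field, new Map()]));
    this.indexWrites = 0; // index entries added so far
  }

  insert(doc) {
    this.docs.push(doc);
    // Every index on the collection has to be updated for every insert.
    for (const [field, index] of this.indexes) {
      const key = doc[field];
      if (!index.has(key)) index.set(key, []);
      index.get(key).push(doc._id);
      this.indexWrites += 1;
    }
  }
}

// Hypothetical sample documents.
const sampleDocs = [
  { _id: 1, city: "KEIZER", state: "OR", pop: 100 },
  { _id: 2, city: "SALEM", state: "OR", pop: 200 },
  { _id: 3, city: "BEND", state: "OR", pop: 300 },
];

const oneIndex = new TinyCollection(["city"]);
const threeIndexes = new TinyCollection(["city", "state", "pop"]);
for (const doc of sampleDocs) {
  oneIndex.insert(doc);
  threeIndexes.insert(doc);
}
console.log(oneIndex.indexWrites, threeIndexes.indexWrites); // 3 9
```

Three inserts cost three index writes with one index and nine with three, which is the trade-off to weigh before indexing every field.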

We can also create indexes on multiple fields. We might, for example, be querying our collection not only on a single city but on a city and state combination. In that case, we might want to look at generating a compound index that references multiple fields in the same index. In this example something like db.zips.createIndex({"city": 1, "state": 1}) might be useful.
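One way to think about a compound index on city and state is as a list of entries sorted by city first and then by state. The sketch below is a toy model in plain JavaScript with made-up documents, not MongoDB’s actual implementation. It shows that such an ordering can serve a query on city alone as well as a query on city plus state.

```javascript
// Hypothetical sample documents.
const places = [
  { _id: 1, city: "SPRINGFIELD", state: "OR" },
  { _id: 2, city: "KEIZER", state: "OR" },
  { _id: 3, city: "SPRINGFIELD", state: "IL" },
  { _id: 4, city: "SPRINGFIELD", state: "MA" },
];

function compare(a, b) {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Sorted entries stand in for the ordered keys of { city: 1, state: 1 }.
const compoundIndex = places
  .map((doc, position) => ({ city: doc.city, state: doc.state, position }))
  .sort((a, b) => compare(a.city, b.city) || compare(a.state, b.state));

// Binary search for the first entry whose city is >= the target city.
function firstEntryForCity(index, city) {
  let low = 0;
  let high = index.length;
  while (low < high) {
    const mid = (low + high) >> 1;
    if (index[mid].city < city) low = mid + 1;
    else high = mid;
  }
  return low;
}

// Works for city alone (the leading field) or for city plus state.
function findByCity(index, collection, city, state) {
  const results = [];
  for (let i = firstEntryForCity(index, city); i < index.length; i += 1) {
    const entry = index[i];
    if (entry.city !== city) break; // walked past the run of matching cities
    if (state === undefined || entry.state === state) {
      results.push(collection[entry.position]);
    }
  }
  return results;
}

const allSpringfields = findByCity(compoundIndex, places, "SPRINGFIELD");
const oneSpringfield = findByCity(compoundIndex, places, "SPRINGFIELD", "IL");
console.log(allSpringfields.length, oneSpringfield.length); // 3 1
```

Notice that the same ordering does nothing for a query on state alone, since the states are scattered across the different cities. That matches how MongoDB treats compound indexes: the leading field or fields of the index need to appear in the query for the index to be useful.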

Wrap Up

When deciding on an index to create there are a few common things to think about in general. First, create indexes which support your queries. If you are not going to query the zip code collection based on population (“pop”), there isn’t a need to generate an index for that field. Second, if your queries are doing sort operations, make sure that your indexes are supporting those in an efficient way. Third, make sure that your queries are allowing MongoDB to be selective in the results it provides from the query. This allows MongoDB to use the index for the majority of the work.

Indexes are an important part of proper application design with MongoDB. Having a properly designed index can have a large positive impact on the performance of your application. I would highly recommend reading more about them prior to your application deployment to ensure a great end user experience.

