Indexing in MongoDB

 

I often get asked about, and see plenty of posts and comments online about, MongoDB not being as quick on query reads as people think it should be. These questions and comments are often followed by a panning of MongoDB itself, frequently based on the user’s experience in this one situation. My first question in these situations is typically “What indexes are set up on your collection that relate to your queries?” More often than not, I get a deer-in-the-headlights look back. After some stammering, the answer is typically “I don’t know.”, “Is indexing important?”, “Whatever is standard.”, or, the most popular, “What’s an index?”

Indexing Overview

In this blog post, I’d like to touch briefly on what indexes are in MongoDB and how they greatly impact performance. What is an index? If we start with the definition provided by MongoDB:

Indexes are special data structures that store a small portion of the collection’s data set in an easy to traverse form.

we get an idea from the “easy to traverse” statement that indexes make something complicated easier. In this case, they make traversing a collection easier (and faster).

Let’s consider a data set that includes all of the postal codes in the United States (zips.json can be downloaded here). Without an appropriate index, if our application wants to find the zip code for a particular city, say Keizer, Oregon (97303), MongoDB has to scan the entire collection to return the matching zip code. In fact, based on our data set, it would have to look through all 29,467 records to find and return that one record.

That’s a lot of unnecessary looking through the database to find the match for our search term. Imagine if our data set were much larger and included a million or more records. That would be a lot of overhead and searching. If we have MongoDB explain the execution stats for a basic query on our city of “KEIZER”, db.zips.find({"city": "KEIZER"}).explain("executionStats"), we can see a few performance bottlenecks.

[Figure: explain output with no index used, showing a full collection scan]

First, we see that even in our relatively small database the query execution time was 34ms. Then, as expected, all 29,467 documents were examined and a collection scan was performed to satisfy the query. Again, imagine scanning a much larger data set and how slow that could be.

Now, what happens if we add an index? Since we are, in this case, searching by city name, it would make sense to create an index on that field. That can be accomplished in the Mongo Shell with the command:

db.zips.createIndex({"city": 1})

This creates an ascending index on the city field in our collection. Now if we run the same query as before, we should expect a couple of things: the query execution time should be significantly lower, and so should the number of documents examined.

[Figure: explain output for the same find with the index defined and used]

Wow, with an index in place on the city field, a search on a city shows some amazing improvements. Our actual query execution time went from 34ms to a reported 0ms, we are now doing an index scan (IXSCAN) instead of a collection scan (COLLSCAN), and the number of documents examined dropped to a single document. That’s pretty powerful and highlights the need to have indexes on your collection.
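To make the difference between the two plans more concrete, here is a toy model in plain Python. It is only a sketch: the five sample documents are made up for illustration, and a sorted list searched with bisect merely stands in for the index structure MongoDB actually maintains. The point is simply to count how many documents each approach has to look at.

```python
import bisect

# A handful of made-up documents shaped like the zips collection.
zips = [
    {"_id": "97301", "city": "SALEM", "state": "OR"},
    {"_id": "97303", "city": "KEIZER", "state": "OR"},
    {"_id": "97201", "city": "PORTLAND", "state": "OR"},
    {"_id": "98101", "city": "SEATTLE", "state": "WA"},
    {"_id": "83702", "city": "BOISE", "state": "ID"},
]


def collection_scan(docs, city):
    """Check every document, like a COLLSCAN."""
    examined = 0
    matches = []
    for doc in docs:
        examined += 1
        if doc["city"] == city:
            matches.append(doc)
    return matches, examined


def build_index(docs, field):
    """Sorted (value, position) pairs: a stand-in for an ascending index."""
    return sorted((doc[field], pos) for pos, doc in enumerate(docs))


def index_scan(docs, index, city):
    """Binary-search the index, then fetch only the matching documents."""
    examined = 0
    matches = []
    i = bisect.bisect_left(index, (city,))
    while i < len(index) and index[i][0] == city:
        examined += 1
        matches.append(docs[index[i][1]])
        i += 1
    return matches, examined


city_index = build_index(zips, "city")
scan_matches, scan_examined = collection_scan(zips, "KEIZER")
index_matches, index_examined = index_scan(zips, city_index, "KEIZER")
print(scan_examined, index_examined)  # 5 1
```

Both approaches return the same document, but the scan touches all five while the index lookup touches only one. With 29,467 documents instead of five, that gap is what shows up in the explain output.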

After explaining this to MongoDB users, I often get a “Why don’t I just index every field then?” response. Well, there’s no such thing as a free lunch, right? Indexes come with overhead: they consume memory and disk space, and they slow down writes, because every insert, update, or delete has to update each affected index as well.
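The write-side cost can be illustrated with another toy model. This is a sketch under simplifying assumptions, not a description of MongoDB internals: ToyCollection is a hypothetical class, the population values are made up, and each "index" is just a sorted Python list. It counts how many index updates a few inserts trigger.

```python
import bisect


class ToyCollection:
    """A list of documents plus one sorted key list per indexed field."""

    def __init__(self, indexed_fields):
        self.docs = []
        self.indexes = {field: [] for field in indexed_fields}
        self.index_updates = 0

    def insert(self, doc):
        pos = len(self.docs)
        self.docs.append(doc)
        # Every index has to be kept in order on every write.
        for field, keys in self.indexes.items():
            bisect.insort(keys, (doc[field], pos))
            self.index_updates += 1


# Made-up sample values, for illustration only.
sample = [
    {"city": "KEIZER", "state": "OR", "pop": 100},
    {"city": "SALEM", "state": "OR", "pop": 200},
    {"city": "BOISE", "state": "ID", "pop": 300},
]

one_index = ToyCollection(["city"])
every_field = ToyCollection(["city", "state", "pop"])
for doc in sample:
    one_index.insert(dict(doc))
    every_field.insert(dict(doc))

print(one_index.index_updates, every_field.index_updates)  # 3 9
```

Three inserts cost three index updates with one index and nine with three. Indexing every field multiplies the work done on each write, whether or not any query ever uses those indexes.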

We can also create indexes on multiple fields. We might, for example, query our database not only on a single city but on a city and state combination. In that case, we might want a compound index that references multiple fields in the same index. In this example, something like db.zips.createIndex({"city": 1, "state": 1}) might be useful.
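Field order matters in a compound index. A compound index on city and then state can serve queries on city alone or on city plus state, but it is not much help for a query on state alone. The toy sketch below shows why, using a sorted list of tuples as a stand-in for the index. The documents and the prefix_lookup helper are hypothetical, for illustration only.

```python
import bisect

# Made-up documents; several cities share a name across states.
docs = [
    {"city": "SALEM", "state": "OR"},
    {"city": "SALEM", "state": "MA"},
    {"city": "KEIZER", "state": "OR"},
    {"city": "PORTLAND", "state": "ME"},
    {"city": "PORTLAND", "state": "OR"},
]

# Stand-in for createIndex({"city": 1, "state": 1}):
# keys sorted by city first, then by state, each with its document position.
compound_index = sorted(
    (doc["city"], doc["state"], pos) for pos, doc in enumerate(docs)
)


def prefix_lookup(index, *prefix):
    """Return document positions for index keys that start with `prefix`."""
    i = bisect.bisect_left(index, prefix)
    positions = []
    while i < len(index) and index[i][:len(prefix)] == prefix:
        positions.append(index[i][-1])
        i += 1
    return positions


print(prefix_lookup(compound_index, "PORTLAND"))        # [3, 4]
print(prefix_lookup(compound_index, "PORTLAND", "OR"))  # [4]

# The state values are scattered through the index, so a query on
# state alone cannot jump to one contiguous range of keys.
print([key[1] for key in compound_index])  # ['OR', 'ME', 'OR', 'MA', 'OR']
```

Matching keys for a city, or a city and state, sit next to each other in the sorted order, so one binary search finds them. The "OR" entries are spread out, which is why a query on state alone would want its own index.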

Wrap Up

When deciding on an index to create, there are a few common things to think about. First, create indexes that support your queries. If you are not going to query the zip code collection based on population (“pop”), there isn’t a need to generate an index for that field. Second, if your queries are doing sort operations, make sure that your indexes support those in an efficient way. Third, make sure that your queries are selective, so that the index narrows the candidates down to a small fraction of the collection. This allows MongoDB to use the index for the majority of the work.

Indexes are an important part of proper application design with MongoDB. Having a properly designed index can have a large positive impact on the performance of your application. I would highly recommend reading more about them prior to your application deployment to ensure a great end user experience.


Python Bottle Framework Basics

Python offers several popular web frameworks such as Django, TurboGears, Flask, and Pyramid, to name a few. In this post I want to take a look at the Bottle framework. Actually, I’ll be spending a couple of posts using Bottle before jumping into Django.

Bottle

From the Bottle website, Bottle is described as:

Bottle is a fast, simple and lightweight WSGI micro web-framework for Python.

Fast… great. Simple… awesome. Lightweight… sounds good. WSGI??? What is that? WSGI is an acronym for Web Server Gateway Interface and comes from the PEP 333 standard (updated for Python 3 as PEP 3333). In a nutshell, this means that Bottle is compliant with the standard interface between a web server and a Python web application or framework. Cool, that sounds reasonable.
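To get a feel for what that interface looks like underneath a framework like Bottle, here is a minimal sketch of a bare WSGI application using only the standard library. The names application and call_app are my own choices for illustration. call_app plays the role of the web server by building a request environment and invoking the application directly.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """A bare WSGI application: a callable taking environ and start_response."""
    body = "Hello World!".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    # The return value is an iterable of byte strings.
    return [body]


def call_app(app, path="/"):
    """Invoke a WSGI app the way a server would and capture the response."""
    environ = {}
    setup_testing_defaults(environ)  # fills in a plausible request environment
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body


status, body = call_app(application)
print(status, body)  # 200 OK b'Hello World!'
```

A framework like Bottle handles this plumbing for us, so we can write route functions that simply return a string.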

Now what about that last part, the micro web-framework? Well, that typically means that the web framework isn’t as full-featured as an enterprise-level framework, such as Django, and by default doesn’t include some features expected in a full-fledged web framework. While Bottle can indeed be deployed to a server and run fully functioning applications, I find Bottle, and other micro web-frameworks, to be great for prototyping ideas.

What I’d like to build out as a prototype in Bottle, and then switch over to Flask, is a very basic blood sugar tracking and logging application for those dealing with Type 1 diabetes. We’ll get more into the data that we need to keep track of and the database schema in another post. In this post, we will concern ourselves simply with getting familiar with Bottle, routes, templates, and static files.

Routing

As stated in Bottle’s project description, it is fast and simple to get things running. We, of course, need to install it first using

pip install bottle

At the time of this writing, the current version is 0.12.13.

I’m going to assume for this discussion that HTTP methods such as GET and POST are something with which you are familiar. If not, you can see an explanation of them here. For starters, we need to see how to establish a basic route in Bottle, so let’s do one for the ever popular “Hello World!” example.

app.py

import bottle

# Site Index
@bottle.route('/')
def index():
    return "Hello World!"

if __name__ == '__main__':
    bottle.run(host='localhost', port=8082, debug=True, reloader=True)

Now, if we run our application and go to http://localhost:8082 on our local machine, we should see an HTML page with the text “Hello World!”. As promised, it is pretty quick and easy to get a page up and running with Bottle. It would be great if we could return some generated information as well. What about adding a name to our route and having that name displayed on the screen? Bottle passes the <name> part of the URL into our function as an argument, and we’ll use html.escape from Python’s standard html module to escape it before sending it back to the browser.

app.py

import bottle
import html


# Site Index
@bottle.route('/')
def index():
    return "Hello World!"

# Index with name
@bottle.route('/<name>')
def get_name(name="Nobody"):
    # Escape the user-supplied value before putting it into the HTML response
    name = html.escape(name)
    return "Hello {}".format(name)


if __name__ == '__main__':
    bottle.run(host='localhost', port=8082, debug=True, reloader=True)
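To see what the escaping step in get_name protects against, here is a minimal sketch that uses html.escape from Python’s standard library on its own, outside of Bottle. The greet helper is hypothetical and only mirrors the string formatting done in the route.

```python
import html


def greet(name):
    """Build the same greeting as the route, escaping the name first."""
    return "Hello {}".format(html.escape(name))


print(greet("Ken"))
# Hello Ken

# Markup characters in the name are turned into harmless entities.
print(greet("<img src=x onerror=alert(1)>"))
# Hello &lt;img src=x onerror=alert(1)&gt;
```

Without the escaping, the second value would be sent to the browser as real markup instead of being displayed as text.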

This is pretty cool, and it is super simple to get pages displayed in a browser, but they don’t look great, and just returning strings is still pretty limiting.

Template Engine

Fortunately, Bottle comes with a basic template engine called SimpleTemplate. This will allow us to write HTML files, use CSS, and include Python code inside our templates. Our template files will use the .tpl extension. Let’s see about changing our name route to use a template and pass the template our URL information. Let’s call our template name.tpl.

app.py

import bottle


# Site Index
@bottle.route('/')
def index():
    return "Hello World!"

# Index with name
@bottle.route('/<name>')
def get_name(name="Nobody"):
    # No manual escaping here: SimpleTemplate HTML-escapes {{name}} by default
    return bottle.template('name.tpl', name=name)


if __name__ == '__main__':
    bottle.run(host='localhost', port=8082, debug=True, reloader=True)

name.tpl

<!DOCTYPE html>
<html>
    <head>
        <title>Name Page</title>
    </head>

    <body>
        Hello {{name}}!
    </body>
</html>

Great, we are passing our name variable from our route in app.py to our template and, using the {{...}} syntax, we are able to capture and display it. SimpleTemplate also HTML-escapes anything inserted with {{...}} by default. We can pass many different keyword arguments into the bottle.template() call in our route, such as data coming back from a database, data regarding errors, user names, page titles, etc.

One thing that is always nice to be able to do in a web framework is to not have to repeat ourselves in every template file for things such as the HTML head information and page footer information. Once again, Bottle and SimpleTemplate make this easy with the % include() feature. Let’s see this in action by creating a header.tpl file, including it inside our name.tpl file, and passing in title data from our route.

header.tpl

    
<!DOCTYPE html>
<html>
    <head>
        <title>{{title or 'No title'}}</title>
    </head>

    <body>   

name.tpl

% include('header.tpl', title=title)

        Hello {{name}}!
    </body>
</html>

And let’s not forget to update our route information as well…

app.py Snippet

# Index with name
@bottle.route('/<name>')
def get_name(name="Nobody"):
    # No manual escaping here: SimpleTemplate HTML-escapes {{name}} by default
    return bottle.template('name.tpl', name=name, title="Name Page")

Fantastic! Now we can see how we can utilize information from our Python code in app.py and display it in our templates.

Static Files

One other thing I would like to mention in this post is how to set up Bottle for including static files like CSS. Again, Bottle makes this easy: we define the path to our static files, import static_file from bottle, and return the result of static_file from a route. This is the recommended way to serve static files, and it provides some extra security protection for the files. We can define a new route for static CSS files like so:

from bottle import static_file

# Static CSS Files
# Raw string so the backslash in the regular expression reaches Bottle unchanged
@bottle.route(r'/static/css/<filename:re:.*\.css>')
def send_css(filename):
    return static_file(filename, root='static/css')

With the exception of the regular expression filter in the route, that should look pretty familiar. The regular expression simply tells Bottle that this route only matches file names ending in .css, which are then looked up, in this case, in the static/css directory.

Now, inside our header.tpl file, we can include a <link type="text/css" href="/static/css/styles.css" rel="stylesheet"> line and have our CSS files safely included. If a file is missing, Bottle will respond with a convenient error page (a 404) instead.
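The “extra security protection” is worth a closer look, because a pattern like .*\.css on its own would also accept a name such as ../secret.css. Below is a simplified sketch of the kind of containment check a static file helper can perform. It is not Bottle’s actual code (the real static_file also handles MIME types, headers, and error responses), and resolve_static and CSS_FILTER are hypothetical names used only for illustration.

```python
import os
import re

# Same pattern as the re: filter in the route above.
CSS_FILTER = re.compile(r".*\.css")


def resolve_static(filename, root):
    """Return the absolute path for filename if it stays inside root, else None."""
    root = os.path.abspath(root) + os.sep
    candidate = os.path.abspath(os.path.join(root, filename.strip("/\\")))
    if not candidate.startswith(root):
        return None  # the path would escape the static directory
    return candidate


# The regular expression alone accepts a path that climbs out of the directory...
print(bool(CSS_FILTER.fullmatch("../secret.css")))      # True
# ...but the containment check rejects it.
print(resolve_static("../secret.css", "static/css"))    # None
# A normal file name resolves to a path under static/css.
print(resolve_static("styles.css", "static/css"))
```

This is the main reason to hand file serving to a helper like static_file rather than opening files by name ourselves.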

Wrap Up

I think that is a pretty decent start for our knowledge of Bottle: how to create some routes, build some basic templates, and serve some static files. In another post, I’ll show a basic site that will have the necessary routes and templates in place for our Blood Sugar Tracker application. From there I’ll show how we can utilize MongoDB as a data store for the application and include session and user signup information so we can provide some basic authentication with sessions and cookies. Bottle provides some great and amazingly simple tools to enable that functionality.

See you next time and Happy Coding!


Follow me on Twitter @kenwalger to get the latest updates. If you enjoyed this article, or have questions, leave comments below.
