Storing Documents in a MongoDB Database?

MongoDB is considered a document database. As happens from time to time in the world of technology, the same word is used for different concepts. In this case, the word “document” can sometimes cause confusion. Is storing documents the job of the database?

For many of us, when we hear the word “document” we think of a Microsoft Word document. Some may think of Adobe Acrobat’s Portable Document Format files, or any other common “document” concept. In MongoDB and, in fact, many other database models, the term document has a different meaning. I wrote a post about the document model as it pertains specifically to MongoDB and would recommend having a look for further information. In a nutshell, however, in MongoDB, a document is an individual record of information.

What if we actually want to store “documents” in a database? Is there a best practice for storing documents with MongoDB? What are some options to consider?

MongoDB GridFS

GridFS allows for the storage of files larger than the 16MB limit that MongoDB places on a single BSON document. Instead of keeping a file in one document, GridFS splits it into chunks and stores each chunk as a separate document, with the file’s metadata kept in a second collection. This can allow for some pretty specific benefits in addition to those laid out in the GridFS documentation. For example, with replication, one could keep files deployed across multiple facilities and utilize MongoDB’s synchronization features. However, many file systems account for this on their own as well.
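To make the chunking idea concrete, here is a minimal pure-Python sketch. It only illustrates how a file's bytes could be divided into chunk documents; it is not how any driver actually implements GridFS. The 255 KiB chunk size matches the GridFS default, while the function name and the files_id value are made up for illustration.

```python
# Default GridFS chunk size: 255 KiB.
DEFAULT_CHUNK_SIZE = 255 * 1024


def split_into_chunks(data, files_id, chunk_size=DEFAULT_CHUNK_SIZE):
    """Return chunk documents shaped like those in a GridFS chunks collection.

    Each chunk records which file it belongs to (files_id), its position
    in the file (n), and its slice of the file's bytes (data).
    """
    return [
        {"files_id": files_id, "n": n, "data": data[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]


# A 600 KiB stand-in for a real file; "example-file-id" is a hypothetical id.
payload = bytes(600 * 1024)
chunks = split_into_chunks(payload, files_id="example-file-id")

for chunk in chunks:
    print(chunk["n"], len(chunk["data"]))
```

Reassembling the file is a matter of reading the chunks back in order of n and concatenating their data.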

Storing documents with GridFS is a topic in and of itself and beyond the scope of this blog post. It’s definitely on the list of things to cover, though.

Another Option for Storing Documents

Let me start here by saying that I am a bit biased in this regard. I have never been a huge fan of storing “documents”, images, videos, etc. inside a database. It has always been my preference to store files on a file server, network attached storage system, or in the cloud somewhere like Amazon S3.

How then does one combine these two concepts? Store the files on disk outside the database, but utilize the power of the database as well? Well, you use the MongoDB document model and flexible schema to handle the metadata of the file, along with a link to the location of the file. Something along the lines of:

{
    _id : <ObjectId>,
    file_title : "<title>",
    file_URL : "<S3_URL>",
    uploaded_by : "<user_name>",
    created_at : <Timestamp>,
    updated_at : <Timestamp>,
    updated_by : "<user>",
    department : "<department_name>",
    tags : [
        { name : "<tag1>" },
        { name : "<tag2>" }
    ],
    version : <Timestamp>
}

This sort of design allows for some nice performance and query possibilities. If, for example, we want to search for “all documents tagged with the Python topic uploaded by Ken between 01 Jan 2016 and 01 Oct 2017” we can query for that. We also get to utilize the file system itself which is often better optimized for file storage reads and writes.
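As a rough sketch, the search described above could be expressed as a query filter like the one built below. It assumes the date range applies to created_at, that created_at is stored as a date, and that the uploader is recorded as "Ken". The helper names build_filter and matches are hypothetical, and matches is only a pure-Python stand-in that mirrors what the filter asks for so the example runs without a database.

```python
from datetime import datetime


def build_filter(tag, user, start, end):
    """Build a MongoDB query filter for the file-metadata documents."""
    return {
        "tags.name": tag,                           # any tag sub-document with this name
        "uploaded_by": user,
        "created_at": {"$gte": start, "$lte": end}, # inclusive date range
    }


def matches(doc, query):
    """Pure-Python check mirroring the filter above, for illustration only."""
    window = query["created_at"]
    return (
        any(t.get("name") == query["tags.name"] for t in doc.get("tags", []))
        and doc.get("uploaded_by") == query["uploaded_by"]
        and window["$gte"] <= doc["created_at"] <= window["$lte"]
    )


query = build_filter("Python", "Ken", datetime(2016, 1, 1), datetime(2017, 10, 1))

# Hypothetical sample document following the layout above.
sample = {
    "file_title": "Intro to Python",
    "file_URL": "<S3_URL>",
    "uploaded_by": "Ken",
    "created_at": datetime(2016, 6, 15),
    "tags": [{"name": "Python"}, {"name": "Tutorial"}],
}

print(matches(sample, query))
```

With a driver such as PyMongo, the same dictionary would be passed to a collection's find() method, for example db.files.find(query), where files is an assumed collection name.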

Wrap Up

In this short post, I hope to have clarified the difference between a document that is stored in a MongoDB database, like a Word or PDF file, and a document that is a MongoDB database record. While MongoDB can be utilized for file storage, in my opinion, file storage should be left to, well, file storage systems.

Design your database schema to maintain the metadata and links for your documents and take advantage of MongoDB’s rich query language for finding your information.


Follow me on Twitter @kenwalger to get the latest updates on my postings.

There are a few MongoDB specific terms in this post. I created a MongoDB Dictionary skill for the Amazon Echo line of products. Check it out and you can say “Alexa, ask MongoDB for the definition of a document?” and get a helpful response.


Performing joins in MongoDB with $lookup

Eventually, it seems, looking up data in multiple MongoDB collections at the same time becomes necessary. Okay, “necessary” might be too strong a word. “Very helpful” may be better. Until version 3.2 of MongoDB, doing joins inside the database wasn’t possible, but with that version came the $lookup aggregation stage. The $lookup stage performs a left outer join to another collection in the same database, provided the collection being joined is not sharded.

For our data for this post let’s utilize the concept of recipes. I’ll lay out some basic traditional SQL database tables and walk through the SQL syntax for a join.

Recipe table layout

SQL Syntax

Let’s have a quick look at what a left outer join looks like in a couple of different SQL syntaxes.

Standard SQL

SELECT recipe_type.description, recipes.title
FROM recipe_type
LEFT OUTER JOIN recipes
ON recipe_type.id = recipes.recipe_type_id

Oracle

SELECT recipe_type.description, recipes.title
FROM recipe_type, recipes
WHERE recipe_type.id = recipes.recipe_type_id(+)

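The results of a left outer join for these two tables will contain all of the rows from our “left” table (recipe_type), each paired with its matching rows from recipes. A recipe type with no matching recipes still appears in the results, with NULL in place of the recipe title.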
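To see that behavior in action, here is a small sketch that runs the standard SQL join against an in-memory SQLite database. The table definitions are assumptions inferred from the columns used in the queries, and the sample rows are made up for illustration.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Assumed table layout, inferred from the columns used in the join.
conn.executescript("""
    CREATE TABLE recipe_type (id INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE recipes (
        id INTEGER PRIMARY KEY,
        title TEXT,
        recipe_type_id INTEGER REFERENCES recipe_type(id)
    );
    INSERT INTO recipe_type VALUES (1, 'Dessert'), (2, 'Soup');
    INSERT INTO recipes VALUES (1, 'Apple Pie', 1);
""")

rows = conn.execute("""
    SELECT recipe_type.description, recipes.title
    FROM recipe_type
    LEFT OUTER JOIN recipes
    ON recipe_type.id = recipes.recipe_type_id
    ORDER BY recipe_type.id
""").fetchall()

# 'Soup' has no recipes, so its title comes back as None (SQL NULL).
for description, title in rows:
    print(description, title)

conn.close()
```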

$lookup for Joins

Concept of a Left Outer Join

Our join from $lookup, then, should be conceptually the same as its SQL cousin. Assuming we have a collection of recipe types and one of recipes, our $lookup stage would look like the following:

{
  $lookup:
    {
      from: "recipes",
      localField: "id",
      foreignField: "recipe_type_id",
      as: "recipe_categories"
    }
}

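Our $lookup stage, then, performs an equality match between the id field of each recipe type document entering the stage and the recipe_type_id field of the documents in the recipes collection. It adds a new array field, recipe_categories, holding the matching recipes to each input document and passes that document to the next stage of the pipeline. When no recipes match, the array is simply empty, which is what makes this a left outer join.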
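The sketch below mimics that matching behavior in plain Python so the shape of the output is easy to see. The function simulate_lookup is a hypothetical stand-in rather than a MongoDB API, the sample documents are made up, and the sketch ignores finer points of the real stage such as array-valued fields.

```python
# The same stage written as a Python dictionary, as a driver would accept it.
pipeline = [
    {
        "$lookup": {
            "from": "recipes",
            "localField": "id",
            "foreignField": "recipe_type_id",
            "as": "recipe_categories",
        }
    }
]


def simulate_lookup(left_docs, right_docs, local_field, foreign_field, as_field):
    """Attach to each left document the list of right documents whose
    foreign_field equals the left document's local_field."""
    joined = []
    for left in left_docs:
        key = left.get(local_field)
        matched = [r for r in right_docs if r.get(foreign_field) == key]
        # Every left document is kept, even when nothing matched.
        joined.append({**left, as_field: matched})
    return joined


# Hypothetical sample data.
recipe_types = [
    {"id": 1, "description": "Dessert"},
    {"id": 2, "description": "Soup"},
]
recipes = [{"title": "Apple Pie", "recipe_type_id": 1}]

stage = pipeline[0]["$lookup"]
result = simulate_lookup(
    recipe_types, recipes,
    stage["localField"], stage["foreignField"], stage["as"],
)

for doc in result:
    print(doc)
```

Against a real database, the pipeline list would be passed to the aggregate() method of the recipe types collection using a driver such as PyMongo.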

Now, given our knowledge of schema design and document models in MongoDB, we may not need this exact join, since the data from one of these collections might simply be embedded in the documents of the other. However, depending on your data access patterns and application needs, the collections may be utilizing references instead.

Wrap Up

I hope this provided a useful, and quick, introduction to the $lookup aggregation stage and how it is possible to do joins in MongoDB. While you may not find yourself using it all of the time, it is a great tool to have at the ready when it is needed.
