Book Review: The Little Mongo DB Schema Design Book

In a previous post on Schema Design I mentioned a book on the subject that I hadn’t, at the time, read. After hearing The Little Mongo DB Schema Design Book by Christian Kvalheim mentioned elsewhere, I thought I would see what it was all about. The book was published in May of 2015. Even though it is a bit old, the coverage of schema design is still relevant.


MongoDB Overview

Kvalheim starts off the book with a quick introduction to MongoDB and some basic principles of schema design before moving into some examples of data modeling patterns. I thought his discussion of One-To-One, One-To-Many, and Many-To-Many data models was well done. He used some good examples of blogs and users to explain the concepts in an easy-to-follow fashion.

Kvalheim moves on from there to an overview of the storage engines available in MongoDB, specifically MMAP and WiredTiger. This provides nice coverage for those using older, pre-3.2 instances of MongoDB, as well as those who have opted to upgrade to more recent versions. At the time of this writing, version 3.6 is the most current.

After the discussion on storage engines, we are provided with information on indexes and sharding concepts before diving into specifics about schema design itself.

Schema Design Patterns

Once we move into the design pattern section of the book, Kvalheim does a nice job of breaking each design option down. He follows a consistent format for each pattern, discussing the unique aspects of typical data modeling patterns. He showcases their operations and provides recommendations for indexing and scaling, along with their performance implications.

The examples are done very well and provide some great coverage of a wide variety of use cases for data storage. Some example schema designs covered are:

  • Time Series
  • Account Transactions
  • Internationalization
  • Shopping Carts
  • Reservations

In total, eleven distinct design concepts are explored.

Improvements

One of the downsides to print books about technology topics is the speed at which the information changes. There are indeed a lot of installations of MongoDB using versions before version 3.2. Being a user of MongoDB after 3.2, I found the discussions of the MMAP storage engine to be less relevant than they were in 2015.

There were a few typesetting issues in this schema design book, but I didn’t find those to be too troubling. They generally just required rereading the sentence a time or two to grasp its meaning.

Wrap Up on the Schema Design Book

Overall, I found this book to be a great resource for schema design. It is definitely an excellent addition to one’s library for application development when using MongoDB as a database. There are some features in post-2015 releases of MongoDB that assist developers and database administrators in schema management as well. Document validation was introduced in version 3.2, and version 3.6 extended it with JSON Schema validation through the $jsonSchema operator.
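To give a feel for what the 3.6-style validation looks like, here is a minimal sketch of a validator written as a plain JavaScript object. The collection name (customers) and the field names are assumptions made up for illustration; they are not taken from the book.

```javascript
// Hypothetical validator for a "customers" collection.
// Collection and field names are assumptions for illustration only.
const customerValidator = {
  $jsonSchema: {
    bsonType: "object",
    required: ["name"],
    properties: {
      name: { bsonType: "string", description: "must be a string and is required" },
      phone: {
        bsonType: "array",
        items: {
          bsonType: "object",
          required: ["type", "number"],
          properties: {
            type: { enum: ["mobile", "home", "work"] },
            number: { bsonType: "string" },
          },
        },
      },
    },
  },
};

// In the mongo shell (version 3.6 or later) the validator would be applied with:
// db.createCollection("customers", { validator: customerValidator });
console.log(JSON.stringify(customerValidator.$jsonSchema.required));
```

Writes that do not match the schema are rejected by default, so a validator like this is best introduced once the document shape has settled.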


Follow me on Twitter @kenwalger to get the latest updates on my postings. Or better yet, sign up for the email list to get updates in your mailbox!

There are a few MongoDB specific terms in this post. I created a MongoDB Dictionary skill for the Amazon Echo line of products. Check it out and you can say “Alexa, ask MongoDB for the definition of a document?” and get a helpful response.


Schema Design Considerations in MongoDB

I’ve previously touched on some of the benefits and a few examples of how to do schema design in MongoDB. A frequently raised question when it comes to modeling data in MongoDB is how best to handle schema in a non-relational database. I’d like to explore in more depth some of the considerations required for effective schema design for MongoDB implementations.

One of the key things to remember when modeling your data in MongoDB is how the application is going to use the data. Your data access patterns should be foremost in your mind when designing your data model. Whereas relational design is driven by normalization, in MongoDB embedding related data in a single document often provides better performance.

When, however, does one decide to embed documents inside another document? What are some of the considerations for doing so when thinking about schema design?

Types of Relationships

In the relational database world, modeling different relationships comes down to examining how to model “One-to-N” relationships and the normalization of data. In MongoDB, there are a variety of ways to model these relationships. When doing schema design in MongoDB, there is more to consider than a single blanket model for “One-to-N” relationships.

We need to consider the size of “N” for our modeling because in this instance, size matters. One-to-one relationships can easily be handled by embedding a document inside another document. But what happens if “N” grows? Let’s have a look at the following cases: “One-to-Few”, “One-to-Many”, and “One-to-Tons”.

One-to-Few

This is a pretty common occurrence, even in the relational database world: a single record that needs to be associated with a relatively small number of other data points. Think of keeping customer information and the associated phone numbers or addresses. We can embed an array of information inside the document for the customer.

{ 
  "_id" : ObjectId("56cb1cfb72d245023179fda4"),
  "name" :  "Harvey Waldrip",
  "phone" : [
     { "type" : "mobile", "number" : "503-555-5555" }, 
     { "type" : "home", "number" : "503-555-1111"}
  ]
}

This showcases the benefits, and drawbacks, of embedding. We can easily get the embedded information with a single query. The downside, however, is that the embedded data can’t be accessed as standalone documents.
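As a rough sketch of that trade-off, here is the same customer as a plain JavaScript object, with the ObjectId swapped for a string so it runs without a database. The findPhone helper is a hypothetical name used only for illustration.

```javascript
// The customer document as a plain object.
// Assumption: _id is a string here so the sketch runs without a MongoDB driver.
const customer = {
  _id: "56cb1cfb72d245023179fda4",
  name: "Harvey Waldrip",
  phone: [
    { type: "mobile", number: "503-555-5555" },
    { type: "home", number: "503-555-1111" },
  ],
};

// Hypothetical helper: the phone numbers arrive together with the customer,
// so reading one needs no second lookup.
function findPhone(doc, type) {
  const entry = doc.phone.find((p) => p.type === type);
  return entry ? entry.number : null;
}

console.log(findPhone(customer, "home"));
```

The flip side is visible too: there is no way to reach a phone entry except through the customer that owns it.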

One-to-Many

“Many” here covers up to a few thousand or so in number. Say that we are modeling a product made up of smaller parts. For example, if we had an electronic parts kit, each part in the kit could be stored as its own document and referenced from the kit.

{
  "_id" : ObjectId("AAAA"),
  "part_no" : "150ohm-0.5W",
  "name" : "150ohm 1/2 Watt Resistor",
  "qty" : 1,
  "cost" : { "value" : NumberDecimal("0.13"), "currency" : "USD" }
}

Each piece in the kit would have its own document. Notice the format of the “cost” value; I discussed that in a post on Modeling Monetary Data in MongoDB. Each final product, or kit in our example, will contain an array of references to the necessary parts.

{
  "_id" : ObjectId("57d7a121fa937f710a7d486e"),
  "manufacturer" : "Elegoo",
  "catalog_number" : 123789,
  "parts" : [
     ObjectId("AAAA"),
     ObjectId("AAAB"),
     ObjectId("G9D6"),
     ...
  ]
}

Now we can utilize an application-level join or, depending on the use case, the $lookup aggregation pipeline stage to get information about specific parts in a kit. For best performance, we need to make sure we have proper indexes in place on our collections as well.

This style of reference allows for quick and easy search and updating of the parts in the kit. It has basically become an “N-to-N” schema design without needing a separate join table. Pretty slick.
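Both approaches can be sketched in a few lines. The example below uses in-memory arrays as stand-ins for the two collections, string ids in place of ObjectIds, and made-up sample parts; the collection names (kits, parts), the partsForKit helper, and the part_docs output field are all assumptions for illustration.

```javascript
// In-memory stand-ins for two collections.
// Assumptions: string ids instead of ObjectIds, made-up sample parts.
const parts = [
  { _id: "AAAA", part_no: "150ohm-0.5W", name: "150ohm 1/2 Watt Resistor" },
  { _id: "AAAB", part_no: "220ohm-0.5W", name: "220ohm 1/2 Watt Resistor" },
  { _id: "AAAC", part_no: "led-red-5mm", name: "Red 5mm LED" },
];
const kits = [
  { _id: "kit-1", manufacturer: "Elegoo", catalog_number: 123789, parts: ["AAAA", "AAAB"] },
];

// Application-level join: read the kit, then fetch every referenced part.
// Against a real database the second step would be a query such as
// db.parts.find({ _id: { $in: kit.parts } }), served by the default _id index.
function partsForKit(catalogNumber) {
  const kit = kits.find((k) => k.catalog_number === catalogNumber);
  if (!kit) return [];
  const wanted = new Set(kit.parts);
  return parts.filter((p) => wanted.has(p._id));
}

// The server-side alternative, expressed as an aggregation pipeline
// that could be passed to db.kits.aggregate(...).
const lookupPipeline = [
  { $match: { catalog_number: 123789 } },
  { $lookup: { from: "parts", localField: "parts", foreignField: "_id", as: "part_docs" } },
];

console.log(partsForKit(123789).map((p) => p.part_no));
```

The application-level version costs two round trips but keeps each query simple; the $lookup version does the work in one request on the server.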

One-to-Tons

As I mentioned, “One-to-Many” is great for up to a few thousand references. But what about cases when that isn’t enough? Further, what if the array of references pushes the document toward the 16 MB document size limit? This is where parent referencing becomes very useful.

Let’s imagine an event log situation. We would have a document for the host machine and store a reference to that host in each log message document.

Host

{ "_id" : "Bunyan", 
  "name" : "logger.lumberjack.com", 
  "ip_address" : "127.55.55.55"
}

Message

{ "_id" : "MongoDB",
  "time" : ISODate("2017-08-29T17:25:00.000Z"),
  "message" : "Timber!!!",
  "host" : "Bunyan"
}

Again, for optimum searching, we would want to make sure indexes are properly in place.
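Here is a small sketch of how parent referencing gets queried, assuming each message stores its host's _id value as a plain string. The messages array stands in for a collection, the extra log entries are made-up sample data, and the recentMessages helper and messageIndex name are hypothetical.

```javascript
// In-memory stand-in for the messages collection.
// Each message points back at its parent host; sample rows are made up.
const messages = [
  { _id: 1, time: new Date("2017-08-29T17:25:00.000Z"), message: "Timber!!!", host: "Bunyan" },
  { _id: 2, time: new Date("2017-08-29T17:26:00.000Z"), message: "Axe sharpened", host: "Bunyan" },
  { _id: 3, time: new Date("2017-08-29T17:27:00.000Z"), message: "Unrelated", host: "Babe" },
];

// Hypothetical helper: newest messages for one host.
// Shell equivalent: db.messages.find({ host: hostId }).sort({ time: -1 }).limit(limit)
function recentMessages(hostId, limit) {
  return messages
    .filter((m) => m.host === hostId) // filter() copies, so the source array is not reordered
    .sort((a, b) => b.time - a.time)  // newest first
    .slice(0, limit);
}

// Index specification that would support that query shape,
// e.g. db.messages.createIndex(messageIndex).
const messageIndex = { host: 1, time: -1 };

console.log(recentMessages("Bunyan", 1).map((m) => m.message));
```

Because the host document never grows, it stays small no matter how many messages accumulate.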

Schema Design – Key Considerations

Now that we have seen some of the schema design options, how do we determine which is the best one to utilize? There are a few things to think about before choosing, and they have more or less become the standard questions to ask when doing schema design in MongoDB.

Golden Rules for MongoDB Schema Design
  1. Favor embedding unless there is a compelling reason not to.
  2. Needing to access an object on its own is a compelling reason to not embed the object in another document.
  3. Unbounded array growth is a bad design.
  4. Don’t be afraid of joins on the application side. A properly indexed collection and query can have highly performant results.
  5. When denormalizing your data, consider the read to write ratio of the application.
  6. Finally, how you model your data depends on your application’s data access patterns. Match your schema design to how your application reads and writes the data.
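The first three rules can be read as a small decision procedure. The sketch below is only that, a sketch: the suggestModel function, its option names, and the FEW and MANY thresholds are hypothetical values chosen for illustration, not limits imposed by MongoDB.

```javascript
// Hypothetical cut-offs; "few" and "many" are judgment calls, not MongoDB limits.
const FEW = 100;
const MANY = 5000;

// Suggest a relationship model from the expected size of "N" and the access needs.
function suggestModel({ expectedCount, needsStandaloneAccess, unbounded }) {
  // Unbounded or very large "N": keep a reference to the parent in each child.
  if (unbounded || expectedCount > MANY) return "parent-reference";
  // Standalone access, or too many items to embed comfortably:
  // keep an array of child references in the parent.
  if (needsStandaloneAccess || expectedCount > FEW) return "child-reference-array";
  // Otherwise favor embedding.
  return "embed";
}

console.log(suggestModel({ expectedCount: 3, needsStandaloneAccess: false, unbounded: false }));
```

Rules 4 through 6 are deliberately left out; read-to-write ratios and access patterns depend on the application and do not reduce to a threshold.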

Wrap Up

There are some great references available for designing your schemas in MongoDB. Some of my favorites are MongoDB Applied Design Patterns and MongoDB in Action. While I have not seen or read it, The Little Mongo DB Schema Design Book looks like a promising resource as well.

Juan Roy has a nice slide deck available on this topic as well. Definitely worth having a look.

There are several MongoDB specific terms in this post. I created a MongoDB Dictionary skill for the Amazon Echo line of products. Check it out and you can say “Alexa, ask MongoDB what is a document?” and get a helpful response.


