schema design Archives | Blog of Ken W. Alger

As we wrap up the Building with Patterns series, it’s a good opportunity to recap the problems the patterns that have been covered solve and highlight some of the benefits and trade-offs each pattern has. The most frequent question that is asked about schema design patterns, is “I’m designing an application to do X, how do I model the data?” As we hope you have discovered over the course of this blog series, there are a lot of things to take into consideration to answer that. However, we’ve included a Sample Use Case chart that we’ve found helpful to at least provide some initial guidance on data modeling patterns for generic use cases.

Sample Use Cases

The chart below is a guideline for what we’ve found after years of experience working with our customers of what schema design patterns are used in a variety of applications. This is not a “set in stone” set of rules about which design pattern can be used for a particular type of application. Ensure you look at the ones that are frequently used in your use case. However, don’t discard the other ones, they may still apply. How you design your application’s data schema is very dependent on your data access patterns.

Design Pattern Summaries

Approximation

The Approximation Pattern is useful when expensive calculations are frequently done and when the precision of those calculations is not the highest priority.

Pros

Fewer writes to the database.
Maintain statistically valid numbers.

Cons

Exact numbers aren’t being represented.
Implementation must be done in the application.

Attribute

The Attribute Pattern is useful for problems that are based around having big documents with many similar fields but there is a subset of fields that share common characteristics and we want to sort or query on that subset of fields. When the fields we need to sort on are only found in a small subset of documents. Or when both of those conditions are met within the documents.

Pros

Fewer indexes are needed.
Queries become simpler to write and are generally faster.

Bucket

The Bucket Pattern is a great solution for when needing to manage streaming data, such as time-series, real-time analytics, or Internet of Things (IoT) applications.

Pros

Reduces the overall number of documents in a collection.
Improves index performance.
Can simplify data access by leveraging pre-aggregation.

Computed

When there are very read intensive data access patterns and that data needs to be repeatedly computed by the application, the Computed Pattern is a great option to explore.

Pros

Reduction in CPU workload for frequent computations.
Queries become simpler to write and are generally faster.

Cons

It may be difficult to identify the need for this pattern.
Applying or overusing the pattern should be avoided unless needed.

Document Versioning

When you are faced with the need to maintain previous versions of documents in MongoDB, the Document Versioning pattern is a possible solution.

Pros

Easy to implement, even on existing systems.
No performance impact on queries on the latest revision.

Cons

Doubles the number of writes.
Queries need to target the correct collection.

Extended Reference

You will find the Extended Reference pattern most useful when your application is experiencing lots of JOIN operations to bring together frequently accessed data.

Pros

Improves performance when there are a lot of JOIN operations.
Faster reads and a reduction in the overall number of JOINs.

Cons

Data duplication.

Outlier

Do you find that there are a few queries or documents that don’t fit into the rest of your typical data patterns? Are these exceptions driving your application solution? If so, the Outlier Pattern is a wonderful solution to this situation.

Pros

Prevents a few documents or queries from determining an application’s solution.
Queries are tailored for “typical” use cases, but outliers are still addressed.

Cons

Often tailored for specific queries, therefore ad hoc queries may not perform well.
Much of this pattern is done with application code.

Pre-allocation

When you know your document structure and your application simply needs to fill it with data, the Pre-Allocation Pattern is the right choice.

Pros

Design simplification when the document structure is known in advance.

Cons

Simplicity versus performance.

Polymorphic

The Polymorphic Pattern is the solution when there are a variety of documents that have more similarities than differences and the documents need to be kept in a single collection.

Pros

Easy to implement.
Queries can run across a single collection.

Schema Versioning

Just about every application can benefit from the Schema Versioning Pattern as changes to the data schema frequently occur in an application’s lifetime. This pattern allows for previous and current versions of documents to exist side by side in a collection.

Pros

No downtime needed.
Control of schema migration.
Reduced future technical debt.

Cons

Might need two indexes for the same field during migration.

Subset

The Subset Pattern solves the problem of having the working set exceed the capacity of RAM due to large documents that have much of the data in the document not being used by the application.

Pros

Reduction in the overall size of the working set.
Shorter disk access time for the most frequently used data.

Cons

We must manage the subset.
Pulling in additional data requires additional trips to the database.

Tree

When data is of a hierarchical structure and is frequently queried, the Tree Pattern is the design pattern to implement.

Pros

Increased performance by avoiding multiple JOIN operations.

Cons

Updates to the graph need to be managed in the application.

Conclusion

As we hope you have seen in this series, the MongoDB document model provides a lot of flexibility in how you model data. That flexibility is incredibly powerful but that power needs to be harnessed in terms of your application’s data access patterns. Remember that schema design in MongoDB has a tremendous impact on the performance of your application. We’ve found that performance issues can frequently be traced to poor schema design.

Keep in mind that to further enhance the power of the document model, these schema design patterns can be used together, when and if it makes sense. For example, Schema Versioning can be used in conjunction with any of the other patterns as your application evolves. With the twelve schema design patterns that have been covered, you have the tools and knowledge needed to harness the power of the document model’s flexibility.

It has been said that the only thing constant in life is change. This holds true to database schemas as well. Information we once thought wouldn’t be needed, we now want to capture. Or new services become available and need to be included in a database record. Regardless of the reason behind the change, after a while, we inevitably need to make changes to the underlying schema design in our application. While this often poses challenges, and perhaps at least a few headaches in a legacy tabular database system, in MongoDB we can use the Schema Versioning pattern to make the changes easier.

Updating a database schema in a tabular database can as mentioned, be challenging. Typically the application needs to be stopped, the database migrated to support the new schema and then restarted. This downtime can lead to poor customer experience. Additionally, what happens if the migration wasn’t a complete success? Reverting back to the prior state is often an even larger challenge.

The Schema Versioning pattern takes advantage of MongoDB’s support for differently shaped documents to exist in the same database collection. This polymorphic aspect of MongoDB is very powerful. It allows documents that have different fields or even different field types for the same field, to peaceful exist side by side.

The Schema Versioning Pattern

The implementation of this pattern is relatively easy. Our applications start with an original schema which eventually needs to be altered. When that occurs we can create and save the new schema to the database with a schema_version field. This field will allow our application to know how to handle this particular document. Alternatively, we can have our application deduce the version based on the presence or absence of some given fields, but the former method is preferred. We can assume that documents that don’t have this field, are version 1. Each new schema version would then increment the schema_version field value and could be handled accordingly in the application.

As new information is saved, we use the most current schema version. We could make a determination, depending on the application and use case, as to the need of updating all documents to the new design, updating when a record is accessed, or not at all. Inside the application, we would create handling functions for each schema version.

Sample Use Case

As stated, just about every database needs to be changed at some point during its lifecycle, so this pattern is useful in many situations. Let’s take a look at a customer profile use case. We start keeping customer information before there is a wide range of contact methods. They can only be reached at home or at work:

{
    "_id": "<ObjectId>",
    "name": "Anakin Skywalker",
    "home": "503-555-0000",
    "work": "503-555-0010"
}

As the years go by and more and more customer records are being saved, we notice that mobile numbers are needing to be saved as well. Adding that field in is straight forward.

{
    "_id": "<ObjectId>",
    "name": "Darth Vader",
    "home": "503-555-0100",
    "work": "503-555-0110",
    "mobile": "503-555-0120"
}

More time goes by and now we’re discovering that fewer and fewer people have a home phone, and other contact methods are becoming more important to record. Items like Twitter, Skype, and Google Hangouts are becoming more popular and maybe weren’t even available when we first started keeping contact information. We also want to attempt to future proof our application as much as possible and after reading the Building with Patterns series we know about the Attribute Pattern and implement that into a contact_method array of values. In doing so, we create a new schema version.

{
    "_id": "<ObjectId>",
    "schema_version": "2",
    "name": "Anakin Skywalker (Retired)",
    "contact_method": [
        { "work": "503-555-0210" },
        { "mobile": "503-555-0220" },
        { "twitter": "@anakinskywalker" },
        { "skype": "AlwaysWithYou" }
    ]
}

The flexibility of the MongoDB document model allows for all of this to occur without downtime of the database. From an application standpoint, it can be designed to read both versions of the schema. This application change in how to handle the schema difference shouldn’t require downtime either, assuming there is more than a single app server involved.

Conclusion

The Schema Versioning pattern is great for when application downtime isn’t an option, updating the documents may take hours, days, or weeks of time to complete, updating the documents to the new version isn’t a requirement, or a combination of any of these. It allows for a new schema_version field to easily be added and for the application to adjust to these changes. Additionally, it provides us as developers the opportunity to better decide when and how data migrations will take place. All of these things result in less future technical debt, another big advantage for this pattern.

As with the other patterns mentioned in this series, there are some things to consider with the Schema Versioning pattern too. If you have an index on a field that is not located at the same level in the document, you may need 2 indexes while you are migrating the documents.

One of the main benefits of this pattern is the simplicity involved in the data model itself. All that is required is to add the schema_version field. Then allow the application to handle and process the different document versions.

Additionally, as was seen in the use case example, we are able to combine schema design patterns together for extra performance. In this case, using the Schema Versioning and Attribute patterns together. Allowing to make schema upgrades without downtime makes the Schema Versioning pattern particularly powerful in MongoDB and could very well be enough of a reason to use MongoDB’s document model versus a legacy tabular database for your next application.

The next post in this series will be a wrap up of all of the patterns we’ve looked at thus far and provide some additional information about use cases we’ve found to be particularly well suited for each pattern.

If you have questions, please leave comments below.

Previous Parts of Building with Patterns:

The Polymorphic pattern
The Attribute pattern
The Bucket pattern
The Outlier pattern
The Computed pattern
The Subset pattern
The Extended Reference pattern
The Approximation pattern
The Tree pattern
The Pre-allocation pattern
The Document Versioning pattern

M	T	W	T	F	S	S
1	2	3	4	5	6	7
8	9	10	11	12	13	14
15	16	17	18	19	20	21
22	23	24	25	26	27	28
29	30	31