Building with Patterns: The Document Versioning Pattern

Databases, such as MongoDB, are very good at querying lots of data and updating that data frequently. In most cases, however, we are only performing queries on the latest state of the data. What about situations in which we need to query previous states of the data? What if we need to have some functionality of version control of our documents? This is where we can use the Document Versioning Pattern.

This pattern is all about keeping the version history of documents available and usable. We could construct a system that uses a dedicated version control system in conjunction with MongoDB. One system for the few documents that change and MongoDB for the others. This would be potentially cumbersome. By using the Document Versioning Pattern, however, we are able to avoid having to use multiple systems for managing the current documents and their history by keeping them in one database.

The Document Versioning Pattern

This pattern addresses the problem of wanting to keep around older revisions of some documents in MongoDB instead of bringing in a second management system. To accomplish this, we add a field to each document allowing us to keep track of the document version. The database will then have two collections: one that has the latest (and most queried data) and another that has all of the revisions of the data.

The Document Versioning Pattern makes a few assumptions about the data in the database and the data access patterns that the application makes.

  1. Each document doesn’t have too many revisions.
  2. There aren’t too many documents to version.
  3. Most of the queries performed are done on the most current version of the document.

If you find that these assumptions don’t fit your use case, this pattern may not be a great fit. You may have to alter how you implement your version of the Document Versioning Pattern or your use case may simply require a different solution.

Sample Use Case

The Document Versioning Pattern is very useful in highly regulated industries that require a specific point in time version of a set of data. Financial and healthcare industries are good examples. Insurance and legal industries are some others. There are many use cases that track histories of some portion of the data.

Think of how an insurance company might make use of this pattern. Each customer has a “standard” policy and a second portion that is specific to that customer, a policy rider if you will. This second portion would contain a list of policy add-ons and a list of specific items that are being insured. As the customer changes what specific items are insured, this information needs to be updated while the historical information needs to be available as well. This is fairly common in homeowner or renters insurance policies. For example, if someone has specific items they want to be insured beyond the typical coverage provided, they are listed separately, as a rider. Another use case for the insurance company may be to keep all the versions of the “standard policy” they have mailed to their customers over time.

If we take a look at the requirements for the Document Versioning Pattern, this seems like a great use case. The insurance company likely has a few million customers, the revisions to the “add-on” list likely don’t occur too frequently, and the majority of searches on a policy will be about the most current version.

Inside our database, each customer might have a current_policy document — containing customer specific information — in a current_policies collection and policy_revision documents in a policy_revisions collection. Additionally, there would be a standard_policy collection that would be the same for most customers. When a customer purchases a new item and wants it added to their policy, a new policy_revision document is created using the current_policy document. A version field in the document is then incremented to identify it as the latest revision and the customer’s changes added.

Policy Revisions and Current Policy

The newest revision will be stored in the current_policies collection and the old version will be written to the policy_revisions collection. By keeping the latest versions in the current_policy collection queries can remain simple. The policy_revisions collection might only keep a few versions back as well, depending on data needs and requirements.

Policy Revisions and Current Policy Collections

In this example then, Middle-earth Insurance would have a standard_policy for its customers. All residents of The Shire would share this particular policy document. Bilbo has specific things he wants insuring on top of his normal coverage. His Elven Sword and, eventually, the One Ring are added to his policy. These would reside in the current_policies collection and as changes are made the policy_revisions collection would keep a historical record of changes.

The Document Versioning Pattern is relatively easy to accomplish. It can be implemented on existing systems without too many changes to the application or to existing documents. Further, queries accessing the latest version of the document remain performant.

One drawback to this pattern is the need to access a different collection for historical information. Another is the fact that writes will be higher overall to the database. This is why one of the requirements to use this pattern is that it occurs on data that isn’t changed too frequently.

Conclusion

When you need to keep track of changes to documents, the Document Versioning Pattern is a great option. It is relatively easy to implement and can be applied to an existing set of documents. Another benefit is that queries to the latest version of data still perform well. It does not, however, replace a dedicated version control system.

The next post in this series will look at the final design pattern, Schema Versioning.

If you have questions, please leave comments below.

Previous Parts of Building with Patterns:

Facebooktwittergoogle_plusredditlinkedinmail

Building with Patterns: The Preallocation Pattern

One of the great things about MongoDB is the document data model. It provides for a lot of flexibility not only in schema design but in the development cycle as well. Not knowing what fields will be required down the road is easily handled with MongoDB documents. However, there are times when the structure is known and being able to fill or grow the structure makes the design much simpler. This is where we can use the Preallocation Pattern.

Memory allocation is often done in blocks to avoid performance issues. In the earlier days of MongoDB (prior to MongoDB version 3.2), when it used the MMAPv1 storage engine, a common optimization was to allocate in advance the memory needed for the future size of a constantly growing document. Growing documents in MMAPv1 needed to be relocated at a fairly expensive cost by the server. With its lock-free and rewrite on update algorithms, WiredTiger does not require this same treatment.

With the deprecation of MMAPv1 in MongoDB 4.0, the Preallocation Pattern appeared to lose some of its luster and necessity. However, there are still use cases for the Preallocation Pattern with WiredTiger. As with the other patterns we’ve discussed in the Building with Patterns series, there are a few application considerations to think about.

The Preallocation Pattern

This pattern simply dictates to create an initial empty structure to be filled later. It may sound trivial, however, you will need to balance the desired outcome in simplification versus the additional resources that the solution may consume. Bigger documents will make for a larger working set resulting in more RAM to contain this working set.

If the code of the application is much easier to write and maintain if it uses a structure that is not completed filled, it may easily outweigh the cost of the RAM. Let’s say there is a need to represent a theater room as a 2-dimensional array where each seat has a “row” and “number”, for example, the seat “C7”. Some rows may have fewer seats, however finding the seat “B3” is faster and cleaner in a 2-dimensional array, than having a complicated formula to find a seat in a one-dimensional array that has only cells for the existing seats. Being able to identify accessible seating is also easier as a separate array can be created for those seats.

Seating Map

Two dimensional representation of venue, valid seats available in green. Accessible seating notated with a blue outline.

One-dimensional seating map

One dimensional representation of venue, accessible seats shown in blue.

Sample Use Case

As seen earlier, representing a 2 dimension structure, like a venue, is a good use case. Another example could be a reservation system where a resource is blocked or reserved, on a per day basis. Using one cell per available day would likely make computations and checking faster than keeping a list of ranges.

Image of the month of April 2019 with an array

The month of April 2019 with an array of U.S. work days.

Image of the month of April 2019 with a list of ranges

The month of April 2019 with an array of U.S. work days as a list of ranges.

Conclusion

This pattern may be one of the most used when using the MMAPv1 storage engine with MongoDB. However due to the depreciation of this storage engine, it has lost its generic use case, but it is still useful in some situations. And like other patterns, you have a trade-off between “simplicity” and “performance”.

The next post in this series will look at the Document Versioning Pattern.

If you have questions, please leave comments below.

Previous Parts of Building with Patterns:

Facebooktwittergoogle_plusredditlinkedinmail