Modeling your data with Documents

I’m writing this post today while waiting for my oldest daughter to start her high school track and field event. I’m watching these talented athletes push their bodies well beyond anything I can personally relate to and in between rain showers and wind gusts there are meet records and hearts, broken. Since these are High School student athletes, my mind wanders to what their future holds. Will their athletic abilities allow them to compete at a collegiate level, or will high school be the glory days of their running, jumping, and throwing experience? Is a college athletic scholarship in their future? Hmmm…. a recruiting website for my daughter’s Javelin skills would be interesting.

Data Modeling Options

While building a site just for her would likely include static performance data, the data geek side of me wonders how one would model track and field participation data inside a database. As is often the case for those that have been around data for a while, we start to think of the tables we would need. Clearly, for our sample here we would need an athlete table, probably an event table with a list of possible events, another one for the meet name, just to name a few. We haven’t even gotten into data normalization yet, right? We should likely have a high_school table to maintain that information. A weather table to keep track of the rain and wind values, and much more. Here’s a quick visualization of what our database might look like in a relational database management system (RDBMS) world.

Relational Model

We would need all of these tables to be able to pull up data in our application to see which event Jane Doe competed in at a specific meet. Then JOIN it with all the other tables to be able to generate some useful information for a college athletic recruiter to see. That seems like a lot of joins (computational time). All to grab a relatively small percentage of the data from each row in our tables. For example, in the high_school table, we likely don’t need to display the school’s address and phone number on a student’s site itself. However, it would be nice to be able to provide a link to that information should a coach want to contact the school. We still, however, need to do the JOINs to get the information. Granted for those of us with some experience working with RDBMS technology, this example isn’t overly complicated and the SQL necessary to come up with the data and the specific JOIN statements aren’t horrendous, but we are still asking for some computational power to be expended to do the joins and retrieve the data.

Another Way to Model Data

What if there was another way to model our data? What if we could model our data in a way that was more application specific and suited our needs for showing off an athlete’s skills to potential recruiters? Further, what if we could do that and get data back from our database? And do so without JOIN operations and get all the data we need for our application at once? Sounds pretty amazing, yes? Well, that is precisely where we can use NoSQL and a Document Model for our data. Let’s have a look at how we might model our athlete’s data in MongoDB using their data model.

Document Model

To start with, in the document model, we can think about the information our application needs. Things like athlete name, event, time/distance accomplished for that event, when and where the event took place, what high school the athlete is from, etc. Instead of making unnecessary trips to the database, we can design our data for our application.

Our document, therefore, could be designed to hold all of the data our application would need. We could have our document look like:

Looks a lot like JSON

Something might seem familiar about this format too. It is modeled in JSON which, at least for me, is a much friendly format that table upon table. It also allows us to develop quickly. What happens when our student decides to do another sport in the fall and winter? In our RDBMS model that will involve more tables and joins. In our document model we can simply add another sport name, event, and statistic we want to track.

Another great feature of our document model is that we don’t need to be concerned with NULL values. If an athlete doesn’t participate in the long jump, there isn’t a reason to maintain a value for that. If they suddenly do participate in the long jump at a given meet, we can record that data as well. Similarly, if Jane and Kendra are involved in different sports, it is absolutely okay for their schemas to look different. They can even have different information. This is the concept of flexible schema and can be very powerful.

Document Model Key Features

There are several reasons why using a document model is becoming more and more popular. Some of them are:

  • The large amounts of data many applications generate combined with changing data types.
  • Rapid development times and agile development practices often require quick iterations of a product. A flexible schema in a document data model easily allows for this.
  • Gone are the days in which accessing data is from a single device to a single audience. Data needs to be always on and globally accessible in today’s world. This requires data stores to be able to scale accordingly and provide application specific data.

Conclusion

For all these reasons the document data model shouldn’t be discounted for your next project. In fact, I may just use such a model for my daughter’s recruiting site. After a long day at the track meet she threw her second best throw of the javelin ever at 92′ 6″. Perhaps not NCAA Division I bound. However, maybe with a web application backed by a document model a college somewhere will notice her.


Follow me on Twitter @kenwalger to get the latest updates on my postings.

There are a few MongoDB specific terms in this post. I created a MongoDB Dictionary skill for the Amazon Echo line of products. Check it out and you can say “Alexa, ask MongoDB for the definition of a document?” and get a helpful response.

 

Facebooktwitterredditlinkedinmail

MongoDB Compass: An Overview

When I first started using computers, everything was done through the command line. Everything. Apple had yet to come out with the Macintosh, Windows was not even a glimmer in the eye of Microsoft. For that matter, Microsoft wasn’t a glimmer yet. Everyone got used to living in the shell and knowing a multitude of commands, arguments, and syntax to achieve even the smallest of tasks. Along came the graphical user interface (GUI) both in terms of operating systems and a variety of tools. Many of these tools provided much of the functionality that could be achieved in the command line, certainly, the majority of the most common tasks were included in an easier to use and visualize format.

This is the case for MongoDB‘s Compass product. Since it’s original release in December of 2015 several versions and upgrades have been added. As of this writing, we are now on version 1.6. Like many GUI tools, it aims to offer an easier interface for accomplishing many tasks one might otherwise do in the mongo shell. In this post, I’d like to take a look at some of the more useful features that Compass provides for managing your MongoDB databases and collections.

MongoDB Compass Features

One of the great features of MongoDB is that it has a flexible data model. This is a powerful feature for schema design. However, from the shell it can pose limitations on knowing the exact makeup of a collection’s schema. Compass provides a visual interface which samples a subset of the data in a collection and analyzes the documents from within this subset.

Data Vizualization

This provides an easy to see snapshot of how the collection is designed in an easy to use GUI fashion. One gets immediate information on types, ranges, and frequency of fields in a collection. This allows for the ability to easily choose, sort, and query the collection. All based on the data presented in the schema visualization. The same results can certainly be obtained with shell commands. However, the time and effort involved to get there is much greater.

Database & Collection Interaction

This feature alone in Compass makes it worth the download, but there is much more! Creating and dropping databases and collections is possible in Compass and, as I pointed out in a previous post on indexing MongoDB collections, indexes can have a huge impact on the performance of queries. Compass provides an intuitive interface for index creation. Once created it provides detailed feedback about the performance of a query. And what and how many steps and time MongoDB had to go through to return the documents.

Index Creation
Create Index with Compass
Index Used
Find with a defined and used Index

Once again, all of this can be done in the shell right? We can create an index in the shell with db.collection.createIndex() and we can get explain results in the shell as well. I might argue though that the creation of indexes in Compass is easier with many of the various index creation nuances during the creation step. Further, being able to see the results of the explain plan visually is of benefit, of course, you can also see the raw JSON as well for a more in-depth look at the results.

Dev-Ops features

The last of the features that I want to discuss in this post is the ability to get insights into some of the operational aspects of the database itself. Compass provides a GUI interface for server performance for the metrics of Operations, Read & Write, Network, and Memory usage. It also provides information on the Hottest Collections. Meaning the collections with the most activities as well as information on slow running queries (potential candidates for indexes here). All the information can be obtained from the command line using various tools that come with MongoDB, such as mongostat and mongotop, and the profiler for slow queries. Once again, the GUI interface provided by Compass provides, in my opinion, an easier way to grasp what is going on with the MongoDB Server.

Wrap Up

I realize that Compass is not the end-all answer to MongoDB management. Not all functionality of managing and maintaining a MongoDB database server and associated collections is possible. There are still tasks that currently still require using the command line. Heck, I still wind up using the command line to perform tasks in Windows and OS X. However, for a product that has only been out for less than a year and a half, this is a great start.

Compass has a lot more features that I didn’t cover here, I would encourage you to download it for your Windows, OS X, or Linux system and discover a new way to visualize your MongoDB data and graphically manage your system.

There are a lot of MongoDB specific terms in this post. I created a MongoDB Dictionary skill for the Amazon Echo line of products. Check it out and you can say “Alexa, ask MongoDB what is the definition of a collection?” and get a helpful response.


Follow me on Twitter @kenwalger to get the latest updates on my postings.

Facebooktwitterredditlinkedinmail