New version… new features.

If you get involved in the software development industry, or heck, even just use a computer, you know that vendors pump out new versions of their software very frequently. It often seems like just when you get a handle on version 10, version 11 is introduced. Making the learning curve start all over. I’m so stubborn that I’m personally still on Microsoft Office 2003… I know, don’t laugh.

Sometimes new versions offer nothing but bug fixes and additional headaches. Other times new software versions come packed with a wide assortment of immediately useful new features. Such is the case of the latest release of MongoDB, 3.4, released in December of 2016.

I’d like to take a look at some of the included new features in this latest version for developers. There are several new features more suited for a discussion from an administrator standpoint, such as zones, elastic scalability enhancements, consistency control, and improvements to security for database access. For this post though, let’s stick to a discussion about some of the development-oriented features.

Version 3.4 provides some excellent improvements for data processing and aggregation, international language considerations, and third-party application connection tools.

$graphLookup

They have brought the ability to do graph processing directly into MongoDB with the new $graphLookup aggregation stage. This new stage allows MongoDB to recursively lookup documents based on a specific relationship to a starting document. There are, of course, many uses for this feature alone. Such as graphing social networks, business entities, genealogical family trees, etc. One can now utilize the power of a graph database natively within MongoDB. For those that have used a separate graph database, like Neo4j, they can now process their data using a single MongoDB datastore and get graphs along with other business critical data. With the rise of applications that are connected globally, being able to graph relationships is becoming a larger and larger need.

Additionally, it is becoming more and more critical to be able to offer localized language considerations to your application. My family just got done hosting an exchange student from Germany. She let us know that German phone books and German dictionaries sort words differently. Well, it would be great to be able to have the ability to account for these language rules and nuances. MongoDB 3.4 has support for over 100 different languages and locales. One can specify a collation based on collection, index, or view. Further, several operations support collation as well, such as find, aggregate, and update.

Big Data Tools

A couple of additional things which personally got me excited were some updates to the MongoDB Connector for BI and the MongoDB Connector for Apache Spark. As someone who enjoys being able to visualize data, the Connector for BI is very exciting. The performance improvements have been made by moving more query execution into MongoDB processes to reduce latency and bandwidth. Other improvements around the Connector for BI include a simplification of the installation, configuration process, authorization, and support for Windows.

The Spark Connector has received some nice attention as well with updates to support the latest Spark 2.0 release. For those familiar with Spark, the connector allows us access to Spark’s libraries for Scala, Java, Python, and R and brings MongoDB data in as DataFrames and Datasets. This allows us to utilize data already in MongoDB to be analyzed through Spark’s tools for machine learning, graph, streaming, and SQL APIs and provide shorter turnaround times for data scientists and engineers.

Wrap Up

There are several other features new in 3.4 as well, such as faceted navigation, enhancements to the aggregation pipeline, and a support for the decimal data type. In previous posts, I have mentioned some of the advancements to MongoDB Compass and their DBaaS offering Atlas. There are a lot of exciting things in all of these new and/or enhanced features that make version 3.4 worth examining further.

I am excited to continue to examine the possibilities especially around $graphLookup, multi-language collations, and the various connectors. I know that not every organization is in a place and able to upgrade to 3.4 today, but it is definitely worth a look. You can download it in their download center here.

Follow me on Twitter @kenwalger to get the latest updates on my postings.

Facebooktwitterredditlinkedinmail

Modeling your data with Documents

I’m writing this post today while waiting for my oldest daughter to start her high school track and field event. I’m watching these talented athletes push their bodies well beyond anything I can personally relate to and in between rain showers and wind gusts there are meet records and hearts, broken. Since these are High School student athletes, my mind wanders to what their future holds. Will their athletic abilities allow them to compete at a collegiate level, or will high school be the glory days of their running, jumping, and throwing experience? Is a college athletic scholarship in their future? Hmmm…. a recruiting website for my daughter’s Javelin skills would be interesting.

Data Modeling Options

While building a site just for her would likely include static performance data, the data geek side of me wonders how one would model track and field participation data inside a database. As is often the case for those that have been around data for a while, we start to think of the tables we would need. Clearly, for our sample here we would need an athlete table, probably an event table with a list of possible events, another one for the meet name, just to name a few. We haven’t even gotten into data normalization yet, right? We should likely have a high_school table to maintain that information. A weather table to keep track of the rain and wind values, and much more. Here’s a quick visualization of what our database might look like in a relational database management system (RDBMS) world.

Relational Model

We would need all of these tables to be able to pull up data in our application to see which event Jane Doe competed in at a specific meet. Then JOIN it with all the other tables to be able to generate some useful information for a college athletic recruiter to see. That seems like a lot of joins (computational time). All to grab a relatively small percentage of the data from each row in our tables. For example, in the high_school table, we likely don’t need to display the school’s address and phone number on a student’s site itself. However, it would be nice to be able to provide a link to that information should a coach want to contact the school. We still, however, need to do the JOINs to get the information. Granted for those of us with some experience working with RDBMS technology, this example isn’t overly complicated and the SQL necessary to come up with the data and the specific JOIN statements aren’t horrendous, but we are still asking for some computational power to be expended to do the joins and retrieve the data.

Another Way to Model Data

What if there was another way to model our data? What if we could model our data in a way that was more application specific and suited our needs for showing off an athlete’s skills to potential recruiters? Further, what if we could do that and get data back from our database? And do so without JOIN operations and get all the data we need for our application at once? Sounds pretty amazing, yes? Well, that is precisely where we can use NoSQL and a Document Model for our data. Let’s have a look at how we might model our athlete’s data in MongoDB using their data model.

Document Model

To start with, in the document model, we can think about the information our application needs. Things like athlete name, event, time/distance accomplished for that event, when and where the event took place, what high school the athlete is from, etc. Instead of making unnecessary trips to the database, we can design our data for our application.

Our document, therefore, could be designed to hold all of the data our application would need. We could have our document look like:

Looks a lot like JSON

Something might seem familiar about this format too. It is modeled in JSON which, at least for me, is a much friendly format that table upon table. It also allows us to develop quickly. What happens when our student decides to do another sport in the fall and winter? In our RDBMS model that will involve more tables and joins. In our document model we can simply add another sport name, event, and statistic we want to track.

Another great feature of our document model is that we don’t need to be concerned with NULL values. If an athlete doesn’t participate in the long jump, there isn’t a reason to maintain a value for that. If they suddenly do participate in the long jump at a given meet, we can record that data as well. Similarly, if Jane and Kendra are involved in different sports, it is absolutely okay for their schemas to look different. They can even have different information. This is the concept of flexible schema and can be very powerful.

Document Model Key Features

There are several reasons why using a document model is becoming more and more popular. Some of them are:

  • The large amounts of data many applications generate combined with changing data types.
  • Rapid development times and agile development practices often require quick iterations of a product. A flexible schema in a document data model easily allows for this.
  • Gone are the days in which accessing data is from a single device to a single audience. Data needs to be always on and globally accessible in today’s world. This requires data stores to be able to scale accordingly and provide application specific data.

Conclusion

For all these reasons the document data model shouldn’t be discounted for your next project. In fact, I may just use such a model for my daughter’s recruiting site. After a long day at the track meet she threw her second best throw of the javelin ever at 92′ 6″. Perhaps not NCAA Division I bound. However, maybe with a web application backed by a document model a college somewhere will notice her.


Follow me on Twitter @kenwalger to get the latest updates on my postings.

There are a few MongoDB specific terms in this post. I created a MongoDB Dictionary skill for the Amazon Echo line of products. Check it out and you can say “Alexa, ask MongoDB for the definition of a document?” and get a helpful response.

 

Facebooktwitterredditlinkedinmail