New R Driver Option in MongoDB 3.6

In a previous post, I discussed some options for using MongoDB with the R Language. While the information in that post is still accurate, MongoDB 3.6 introduces a new R driver option. Unlike the drivers previously discussed, this new R driver is under active development.

New R Driver

The new R driver, mongolite, can be found on GitHub and is easily installed from the Comprehensive R Archive Network (CRAN) using the install.packages("mongolite") command on Windows or macOS. The driver, or client, is authored by Jeroen Ooms, who has also provided some nice documentation in PDF format. Let’s head into RStudio and take a look at using the new R driver.

Accessing MongoDB

Let’s use the same dataset as in the previous post and, with mongoimport, get the data into a MongoDB collection. The database will again be called kenblog and the collection scores. Here again is a sample document in the collection:

{
   "_id" : ObjectId("5627207b33ff2cf40effc25e"),
   "student" : 2,
   "type" : "quiz",
   "score" : 74
}
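
For reference, a typical mongoimport invocation for loading that dataset might look something like this (the file name scores.json is a placeholder for wherever the exported data lives):

mongoimport --db kenblog --collection scores --file scores.json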

With mongolite installed, we can put the power of R to work. We load the package and establish a connection to our database:

> require("mongolite")
Loading required package: mongolite
> connection <- mongo(collection = "scores", db = "kenblog", url = "mongodb://localhost")

The mongo connection method accepts the following arguments:

  • collection
  • db
  • url
  • verbose
  • options

The collection and db arguments specify the collection and database names, and url is the MongoDB connection string in URI format. If you need additional output from the connection, set the verbose boolean to TRUE. Additional connection options, such as SSL settings, can be passed in via options.
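
As a quick sketch (the host, credentials, and SSL settings below are placeholders, and the options argument assumes mongolite’s ssl_options() helper), a more fully specified connection might look like this:

# Verbose connection with credentials in the URI and relaxed certificate checking.
connection <- mongo(
  collection = "scores",
  db         = "kenblog",
  url        = "mongodb://user:password@db.example.com:27017/kenblog",
  verbose    = TRUE,
  options    = ssl_options(weak_cert_validation = TRUE)
)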

With a connection established, let’s query our scores collection for exam data.

examQuery <- connection$find('{"type": "exam"}')

This brings in our 585 exam documents as a data frame. We can then extract the score column and have a look at its summary.

> exam_scores <- examQuery[c('score')] 
> summary(exam_scores)
     score       
 Min.   : 60.00  
 1st Qu.: 72.00  
 Median : 79.00  
 Mean   : 79.45  
 3rd Qu.: 86.00  
 Max.   :100.00 

Personally, I’m already liking the syntax of this new R driver for doing queries and working with MongoDB in R.

Other methods that will be familiar to MongoDB users are drop() to drop a collection, aggregate() for aggregation pipeline operations, and insert() for creating documents in the database. There are additional methods for map-reduce operations, importing and exporting JSON or BSON data, and more.
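
As a quick sketch of that syntax (the document and pipeline below are illustrative, not from the original dataset), inserting a new score and averaging scores by type might look like this:

# Insert a new score document (JSON strings, named lists, and data frames are accepted).
connection$insert('{"student": 100, "type": "exam", "score": 88}')

# Average score per assessment type using the aggregation pipeline.
avg_by_type <- connection$aggregate(
  '[{"$group": {"_id": "$type", "average": {"$avg": "$score"}}}]'
)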

Further, with support for features such as indexing, encryption, and authentication, this new R driver is much more robust than previous options.
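
As a brief sketch of two of those features (the credentials below are purely illustrative), an index can be added on the student field and authentication handled through the connection URI:

# Add an index on the student field of the scores collection.
connection$index(add = '{"student": 1}')

# Connect with credentials embedded in the URI (illustrative user and password).
secure <- mongo(collection = "scores", db = "kenblog",
                url = "mongodb://kenUser:secretPassword@localhost:27017/kenblog")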

Wrap Up

Connecting to MongoDB from R is straightforward using the new R driver. It is the new “official” and supported method to leverage the power of R with the flexibility of MongoDB.

R is a great statistical language and can definitely be used to query and analyze MongoDB collections. If you are using R in your work today, this new way of connecting with MongoDB is worth a look. If R is still new to you, Learning R: A Step-by-Step Function Guide to Data Analysis is a great way to get started, or the R For Dummies book may be of interest as well.

This post was updated on 5 Jan 2020.


Follow me on Twitter @kenwalger to get the latest updates on my postings.

There are a few MongoDB specific terms in this post. I created a MongoDB Dictionary skill for the Amazon Echo line of products. Check it out and you can say “Alexa, ask MongoDB for the definition of a document?” and get a helpful response.


Schema Validation in MongoDB 3.6

MongoDB 3.6 brings lots of great new features. I’ve already covered Change Streams and Retryable Writes in previous posts. This post will cover a feature which expands upon the document validation feature introduced in MongoDB 3.2: schema validation.

There are times when enforcing strict data structures and content is required, even with the flexible schema and document data model that MongoDB provides. Schema validation allows teams to define a prescribed document structure for each collection. If someone tries to insert or update a document that does not conform to the applied structure, the operation can be rejected. The rules for document structure are based on the JSON Schema draft specification.

Schema Validation

Let’s take a quick look at how this works in action before diving into a discussion of feature benefits.

Imagine a collection of food recipes. Each recipe will have a recipe name, number of servings, cooking method, ingredients, and a list of instructions. For the ingredients, we want to enforce that there is a numerical quantity, a unit of measure, the ingredient name, and an optional value for any prep work on the ingredient, such as “peeled” or “brunoise”. The options in this example are not all-inclusive; they simply serve as examples. I hope my former culinary colleagues forgive me.

We’ll begin by creating a recipes collection and assigning the schema validation rules with the validator option and the new $jsonSchema operator.

db.createCollection( "recipes",
{
  validator: 
    {
      $jsonSchema:
        {
          bsonType: "object",
          required: ["name", "servings", "ingredients"],
          additionalProperties: false,
          properties:
            {
              _id: {},
              name: {
                bsonType: "string",
                description: "'name' is required and is a string"
                    },
              servings: {
                bsonType: ["int", "double"],
                minimum: 0,
                description: "'servings' is required and must be an integer greater than zero."
                    },
              cooking_method: {
                 enum: ["broil", "grill", "roast", "bake", "saute", "pan-fry", "deep-fry", "poach", "simmer", "boil", "steam", "braise", "stew"],
              description: "'cooking_method' is optional but, if used, must be one of the listed options."
                    },
              ingredients: 
              {
                bsonType: ["array"],
                minItems: 1,
                maxItems: 50,
                items: {
                  bsonType: ["object"],     
                  required: ["quantity", "measure", "ingredient"],
                  additionalProperties: false,
                  description: "'ingredients' must contain the stated fields.",
                  properties: 
                  {
                    quantity: {
                      bsonType: ["double", "decimal"],
                      description: "'quantity' is required and is of double or decimal type"
                            },
                    measure: {
                      enum: ["tsp", "Tbsp", "cup", "ounce", "pound",  "each"],
                      description: "'measure' is required and can only be one of the given enum values"
                            },
                    ingredient: {
                      bsonType: "string",
                      description: "'ingredient' is required and is a string"
                            },
                    format: {
                      bsonType: "string",
                      description: "'format' is an optional field of type string"
                            }
                  }
              }
           }
        }
     }
   }
})

Our validator can include many more rules that are beyond the scope of this introduction to schema validation. However, even this small example shows how powerful the feature is. Let’s look at a couple of sample documents to see what happens when we try to insert them into the collection.

Sample Inserts

db.recipes.insertOne({
  name: "Chocolate Sponge Cake Filling",
  servings: 4,
  ingredients: [
    { quantity: 7, measure: "ounce", ingredient: "bittersweet chocolate", format: "chopped" },
    { quantity: 2, measure: "cup", ingredient: "heavy cream" }
  ]
})

This insert works since it includes all of the required fields in their proper format. It also doesn’t include any extra fields, which would be prohibited since we have additionalProperties: false set. If we try to insert the following document, however, we get an error.

db.recipes.insertOne({
  name: "Chocolate Sponge Cake Filling",
  servings: 4,
  ingredients: [
    { quantity: 7, measure: "ounce", ingredient: "bittersweet chocolate", format: "chopped" },
    { quantity: 2, measure: "cup", ingredient: "heavy cream" }
  ],
  directions: "Boil cream and pour over chocolate. Stir until chocolate is melted."
})

Since we added a directions field to our recipe document, the insert fails with additionalProperties: false set. I should add an optional directions field to the schema validation to allow for that, as directions are indeed important.
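
As a sketch of one way to do that (note that the validator passed to collMod replaces the existing one, so the full schema would need to be restated; it is abbreviated here with a comment), an optional directions field could be added using the collMod command:

db.runCommand({
  collMod: "recipes",
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["name", "servings", "ingredients"],
      additionalProperties: false,
      properties: {
        // ...the existing name, servings, cooking_method, and ingredients
        // definitions from the validator above go here...
        directions: {
          bsonType: "string",
          description: "'directions' is an optional field of type string"
        }
      }
    }
  }
})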

Schema Validation Benefits

Even with these basic examples, I think it is clear that schema validation is a great new enhancement. The flexibility of a dynamic schema in MongoDB can now easily be paired with data governance controls over an entire collection. While there are lots of practical benefits, here are a few specific ones.

  1. Application logic simplification. With a strict definition of what documents in a collection look like, the application no longer has to enforce those guarantees or handle the associated errors itself.
  2. Control of data. Client applications must adhere to the rules set forth by the collection, so there is no need to worry about data being inserted or updated with incorrect field names or data types.
  3. Regulatory compliance. Applications and data models in a variety of industries and locales require data to be stored in specific formats; the General Data Protection Regulation (GDPR) in the European Union is one example.

Wrap Up

With MongoDB 3.6 schema validation, administrators have tunable controls over the database. Documents that don’t conform to a prescribed set of conditions can be rejected outright, or written anyway with a message logged about the action. Schema validation also brings the ability to query against the schema definition, allowing, for example, a search for all documents that don’t conform to the schema.
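
As a sketch of both of those capabilities (the schema in the query below is abbreviated; the full $jsonSchema document from earlier would normally be used), the validation action can be relaxed to a logged warning with collMod, and non-conforming documents can be found by using the $jsonSchema operator inside $nor:

// Log a warning instead of rejecting non-conforming writes.
db.runCommand({ collMod: "recipes", validationAction: "warn" })

// Find documents that do not conform to the (abbreviated) schema.
db.recipes.find({
  $nor: [
    { $jsonSchema: { bsonType: "object", required: ["name", "servings", "ingredients"] } }
  ]
})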

Schema validation is another exciting feature in the 3.6 release of MongoDB. I would urge you to download it and try it out. I’d bet you’ll be as excited about it as I am.


Follow me on Twitter @kenwalger to get the latest updates on my postings. Or better yet, sign up for the email list to get updates in your mailbox!

There are a few MongoDB specific terms in this post. I created a MongoDB Dictionary skill for the Amazon Echo line of products. Check it out and you can say “Alexa, ask MongoDB for the definition of a document?” and get a helpful response. I also created a culinary skill for the Echo if you’d like to update your culinary knowledge as well.
