New R Driver Option in MongoDB 3.6

In a previous post, I discussed some options for using MongoDB with the R Language. While the information in that post is still accurate, MongoDB 3.6 introduces a new R driver option. Unlike the drivers previously discussed, this new R driver is under active development.

New R Driver

The new R driver, mongolite, can be found on GitHub and is easily installed from the Comprehensive R Archive Network (CRAN) using the install.packages("mongolite") command on Windows or macOS. The driver, or client, is authored by Jeroen Ooms, who has also provided some nice documentation in PDF format. Let’s head into RStudio and take a look at using the new R driver.

Accessing MongoDB

Let’s use the same dataset as in the previous post and, with mongoimport, get the data into a MongoDB collection. The database again will be called kenblog and the collection is scores. Here again is a sample document in the collection:

{
   "_id" : ObjectId("5627207b33ff2cf40effc25e"),
   "student" : 2,
   "type" : "quiz",
   "score" : 74
}

After using the install.packages("mongolite") command, we can put the power of R to work. We establish a connection to our database:

> require("mongolite")
Loading required package: mongolite
> connection <- mongo(collection = "scores", db = "kenblog", url = "mongodb://localhost")

The mongo() function accepts the following arguments:

  • collection
  • db
  • url
  • verbose
  • options

The collection and db arguments name the collection and the database to connect to. url is the MongoDB connection string in URI format. If you need additional output from the connection, set the verbose boolean value to TRUE. Additional connection options, such as SSL information, can be passed in through options.
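To make the arguments concrete, here is a minimal sketch that sets all five explicitly. It assumes mongolite is installed and a MongoDB server is reachable at the URL; connect_scores() is a hypothetical helper name of my own, and ssl_options() is called with its defaults only as a placeholder for real SSL settings.

```r
# Sketch only: connect_scores() is a hypothetical helper, not part of mongolite.
# Nothing is contacted until the function is actually called.
connect_scores <- function(url = "mongodb://localhost", verbose = TRUE) {
  mongolite::mongo(
    collection = "scores",               # collection name
    db         = "kenblog",              # database name
    url        = url,                    # connection string in URI format
    verbose    = verbose,                # TRUE prints additional output
    options    = mongolite::ssl_options()  # default SSL settings; adjust as needed
  )
}

# Usage (requires a running server):
# connection <- connect_scores()
```

Wrapping the call in a function keeps the connection details in one place, which is handy if the URL differs between development and production.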

With a connection established, let’s query our scores collection for exam data.

examQuery <- connection$find('{"type": "exam"}')

This brings in our 585 exam documents. We can then pull the exam scores out as a one-column data frame and have a look at their summary.

> exam_scores <- examQuery[c('score')] 
> summary(exam_scores)
     score       
 Min.   : 60.00  
 1st Qu.: 72.00  
 Median : 79.00  
 Mean   : 79.45  
 3rd Qu.: 86.00  
 Max.   :100.00 
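Single-bracket indexing keeps the data frame structure, which is why the summary is labeled with the column name. If a plain numeric vector is what you need, the $ operator returns one. The sketch below uses a tiny stand-in data frame with made-up values rather than the real query result, so it runs without a database.

```r
# Stand-in for the data frame returned by find(); the values are made up.
examQuery <- data.frame(
  student = c(1, 2, 3),
  type    = "exam",
  score   = c(60, 79, 100)
)

scores_df  <- examQuery["score"]  # one-column data frame
scores_vec <- examQuery$score     # plain numeric vector

is.data.frame(scores_df)  # TRUE
is.numeric(scores_vec)    # TRUE

# summary() works on either form; only the layout of the output differs.
summary(scores_vec)
```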

Personally, I’m already liking the syntax of this new R driver for doing queries and working with MongoDB in R.
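The find() method also takes fields, sort, and limit arguments, so more of the work can be pushed to the database. Here is a minimal sketch, assuming con is a connection created with mongo() as above; top_exam_scores() is a hypothetical helper name.

```r
# Hypothetical helper: the n highest exam scores, best first.
# `con` is assumed to be a connection created with mongolite::mongo().
top_exam_scores <- function(con, n = 5) {
  con$find(
    query  = '{"type": "exam"}',
    fields = '{"student": 1, "score": 1, "_id": 0}',  # projection
    sort   = '{"score": -1}',                         # descending by score
    limit  = n
  )
}

# Usage (requires a live connection):
# top_exam_scores(connection, n = 10)
```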

Other methods will be familiar to MongoDB users: drop() to drop a collection, aggregate() for aggregation pipeline operations, and insert() for adding documents to the database. Additional methods cover map-reduce operations and importing or exporting JSON or BSON data, among others.
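As a rough sketch of two of these methods, the helpers below compute the average score per type with aggregate() and add a single document with insert(). The helper names are hypothetical, and con is assumed to be a mongolite connection to the scores collection.

```r
# Hypothetical helpers; `con` is assumed to be a mongolite connection.

# Average score for each assessment type, via the aggregation pipeline.
average_by_type <- function(con) {
  con$aggregate('[
    {"$group": {"_id": "$type", "average": {"$avg": "$score"}}},
    {"$sort": {"_id": 1}}
  ]')
}

# Insert one new score document. insert() accepts a data frame,
# with each row becoming one document.
add_score <- function(con, student, type, score) {
  con$insert(data.frame(student = student, type = type, score = score))
}

# Usage (requires a live connection):
# average_by_type(connection)
# add_score(connection, student = 2, type = "quiz", score = 88)
```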

Further, with support for features such as indexing, encryption, and authentication, this new R driver is much more robust than previous options.
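As a small illustration of the indexing support, the sketch below adds an ascending index on the type field used in the exam query. index_by_type() is a hypothetical helper, and con is again assumed to be a mongolite connection.

```r
# Hypothetical helper: index the field used by the exam query.
# `con` is assumed to be a connection created with mongolite::mongo().
index_by_type <- function(con) {
  con$index(add = '{"type": 1}')  # 1 means ascending order
}

# Usage (requires a live connection):
# index_by_type(connection)
```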

Wrap Up

Connecting to MongoDB from R is straightforward using the new R driver. It is the new “official” and supported method to leverage the power of R with the flexibility and power of MongoDB.

R is a great statistical language and can readily be used to query and analyze MongoDB collections. If you are using R in your work today, this new way of connecting with MongoDB is definitely worth a look.


Follow me on Twitter @kenwalger to get the latest updates on my postings.

There are a few MongoDB specific terms in this post. I created a MongoDB Dictionary skill for the Amazon Echo line of products. Check it out and you can say “Alexa, ask MongoDB for the definition of a document?” and get a helpful response.


Book Review: The Little Mongo DB Schema Design Book

In a previous post on Schema Design, I mentioned a book on the subject that I hadn’t, at the time, read. After hearing The Little Mongo DB Schema Design Book by Christian Kvalheim mentioned elsewhere, I thought I would see what it was all about. The book was published in May of 2015. Even though it is a bit old, the coverage of schema design is still relevant.

MongoDB Overview

Kvalheim starts off the book with a quick introduction to MongoDB and some basic principles of schema design before moving into some examples of data modeling patterns. I thought his discussion of One-To-One, One-To-Many, and Many-To-Many data models was well done. He used some good examples of blogs and users to explain the concepts in an easy-to-follow fashion.

Kvalheim moves on from there to give an overview of the storage engines available in MongoDB, specifically the MMAP and WiredTiger storage engines. This provides nice coverage for those using older, pre-3.2 instances of MongoDB, as well as those who have opted to upgrade to more recent versions. At the time of this writing, version 3.6 is the most current.

After the discussion of storage engines, we are provided with information on indexes and sharding concepts before diving into specifics about schema design itself.

Schema Design Patterns

Once we move into the design pattern section of the book, Kvalheim does a nice job of breaking each design option down. He follows a consistent format for each pattern, discussing the unique aspects of typical data modeling patterns. He showcases their operations and provides recommendations on indexing, scaling, and performance.

The examples are done very well and provide some great coverage of a wide variety of use cases for data storage. Some example schema designs covered are:

  • Time Series
  • Account Transactions
  • Internationalization
  • Shopping Carts
  • Reservations

In total, eleven distinct design concepts are explored.

Improvements

One of the downsides of print books about technology topics is the speed at which the information changes. There are indeed a lot of installations of MongoDB using versions before version 3.2. As a user of MongoDB versions after 3.2, however, I found the discussions of the MMAP storage engine to be less relevant than they were in 2015.

There were a few typesetting issues in this schema design book, but I didn’t find those to be too troubling. They generally just required rereading a sentence a time or two to grasp its meaning.

Wrap Up on the Schema Design Book

Overall, I found this book to be a great resource for schema design. It is definitely an excellent addition to one’s library for application development when using MongoDB as a database. There are some features in post-2015 releases of MongoDB that assist developers and database administrators in schema management as well. Document validation was introduced in version 3.2. Version 3.6 extended the validation process with JSON Schema validation.
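To give a feel for what schema validation looks like in 3.6, here is a sketch that builds a collMod command with a $jsonSchema validator. Everything specific in it is my own assumption for illustration: the collection name scores, the field names student, type, and score, the allowed type values, and the 0 to 100 score range. score_validator_command() is a hypothetical helper that only builds the JSON text.

```r
# Hypothetical helper: builds a collMod command (as a JSON string) that
# attaches a $jsonSchema validator. Field names, allowed values, and the
# score range are illustrative assumptions, not taken from the book.
score_validator_command <- function(collection = "scores") {
  sprintf('{
    "collMod": "%s",
    "validator": {"$jsonSchema": {
      "bsonType": "object",
      "required": ["student", "type", "score"],
      "properties": {
        "student": {"bsonType": "number"},
        "type":    {"enum": ["quiz", "exam"]},
        "score":   {"bsonType": "number", "minimum": 0, "maximum": 100}
      }
    }}
  }', collection)
}

# Print the command so it can be reviewed before it is run.
cat(score_validator_command(), "\n")
```

The resulting command can be run against the database from the mongo shell or from a driver that supports running raw commands. Once the validator is in place, inserts and updates that break these rules are rejected by default.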

