Using R with MongoDB

NOTE: MongoDB 3.6 has new R language support. See my other blog post for the latest information.

The R programming language is a powerful language used for statistical computing. In statistical computing, the data being explored frequently comes from a database. R excels at working with data in tables and matrices and at joining columns and rows together. This seems like a great fit for SQL databases, but what about a NoSQL database like MongoDB? Can R analyze a MongoDB document as easily as a SQL table?

Well, this post would be pretty short if the answer was “No”, right? So let’s take a look at how to pull data into R from a MongoDB collection. Then we’ll take a brief look at examining our data.

Setting Up

While there are plugins available for a variety of IDEs, such as those by JetBrains, it is pretty common to use RStudio when working with R. Somewhere along the line, I picked up a “scores” database in MongoDB that we’ll use as our sample data. I’ve posted it here for download.

We can easily import the data into our MongoDB database using mongoimport. In my case, I put it into a database called kenblog and a collection called scores. Pretty creative, eh? Here’s what a sample document in the collection looks like:

{
   "_id" : ObjectId("5627207b33ff2cf40effc25e"),
   "student" : 2,
   "type" : "quiz",
   "score" : 74
}

There are 1,787 records in our collection with the type of assignment being either quiz, essay, or exam. Let’s see how we can access our data with R.

First, we need to get and load our package for interfacing MongoDB with R. For this example I’ll be using RMongo, but there is another package available, rmongodb. Sadly, it doesn’t look like there has been much recent activity in either package’s GitHub repository. Even so, we can still connect and do some queries.

Connecting R to MongoDB

We need to bring in our package and establish our connection:

require(RMongo)

mongo <- mongoDbConnect('kenblog', 'localhost', 27017)

In the mongoDbConnect function, the arguments are the name of the database, the server name, and the port number to which we want to connect.

Next, we will want to send a query. For this example, let’s get only the exam data from our scores collection. We can use the dbGetQuery function for this, which takes a connection object, the collection name, and the query as a JSON-style string.

examQuery <- dbGetQuery(mongo, 'scores', "{'type': 'exam'}")

This loads the records of type exam from our scores collection into a data frame. Let’s pull out just the score column. Indexing with a column name in single brackets gives us a one-column data frame rather than a bare vector.

exam_scores <- examQuery[c('score')]
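The difference between a one-column data frame and a plain vector matters for some R functions. Here is a minimal sketch of that difference. It uses a small made-up data frame as a stand-in for the real query result, so the rows below are illustrative only and are not taken from the scores collection. The name score_vector is also just for this example.

```r
# Stand-in for the data frame a query returns. These rows are made up
# for illustration and are not taken from the scores collection.
examQuery <- data.frame(
  student = c(1, 2, 3),
  type    = c("exam", "exam", "exam"),
  score   = c(74, 88, 91)
)

# Single brackets with a column name keep the data frame structure.
exam_scores <- examQuery[c('score')]
print(is.data.frame(exam_scores))

# The $ operator extracts the column as a plain numeric vector.
score_vector <- examQuery$score
print(is.numeric(score_vector))

# Functions such as mean() expect the vector form.
print(mean(score_vector))
```

Either form works with summary(), but the vector form is usually what you want for functions like mean() or sd().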

Nice! Now we can utilize some of the power of R to do some data analysis. Let’s get a simple summary of our data with summary(exam_scores):

     score       
 Min.   : 60.00  
 1st Qu.: 72.00  
 Median : 79.00  
 Mean   : 79.45  
 3rd Qu.: 86.00  
 Max.   :100.00 

Neat. I realize that this particular example could be computed using MongoDB’s powerful aggregation framework. However, there are times when using outside resources and languages, like R, for processing is called for.
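As a rough sketch of the kind of follow-on analysis that is more natural in R, here is one way to check whether quiz scores track exam scores. The scores data frame below is hypothetical stand-in data with exactly one quiz and one exam per student, which is an assumption made for this example and is not how the real collection is necessarily laid out.

```r
# Hypothetical stand-in data: one quiz and one exam score per student.
scores <- data.frame(
  student = rep(1:4, each = 2),
  type    = rep(c("quiz", "exam"), times = 4),
  score   = c(74, 80, 62, 71, 90, 94, 85, 83)
)

# Split the scores by type. Rows are ordered by student within each
# type, so position i in both vectors refers to the same student.
quiz <- scores$score[scores$type == "quiz"]
exam <- scores$score[scores$type == "exam"]

# Pearson correlation between each student's quiz and exam score.
quiz_exam_cor <- cor(quiz, exam)
print(quiz_exam_cor)
```

With real data you would first need to decide how to handle students with several quizzes or exams, for example by averaging per student before correlating.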

Wrap Up

Connecting to MongoDB from R is pretty straightforward using the RMongo package. However, many of the new features that MongoDB has implemented in the last few years have not been included in the community R drivers. Further, as of this post, there isn’t an “official” R driver supported by MongoDB.

R is a great statistical language and can definitely be used to query and analyze MongoDB collections. If you are using R in your work today, MongoDB is a definite option for storing your data to be analyzed.


Follow me on Twitter @kenwalger to get the latest updates on my postings.

There are a few MongoDB specific terms in this post. I created a MongoDB Dictionary skill for the Amazon Echo line of products. Check it out and you can say “Alexa, ask MongoDB for the definition of a document?” and get a helpful response.


MongoDB.local San Francisco Recap

On 12 October 2017 MongoDB kicked off a new series of events. I was fortunate to attend the MongoDB.local San Francisco conference along with approximately 600 other MongoDB enthusiasts. If you missed this event, fear not, they will be coming to cities around the world over the next twelve months. MongoDB has a yearly event too, MongoDB World, but having these local, smaller, conferences provides them with more opportunities to reach their customer base, provide training, and announce new product features.

MongoDB.local San Francisco Venue

MongoDB.local San Francisco was held in the fashionable Westfield Centre in the Union Square district. It was hosted in the Bespoke coworking and event space and was very impressively done. It was comfortable and spacious. As is required for any tech-related conference, they provided free Wi-Fi access. It was capable of keeping up with all of the devices throughout the conference, even during sessions which required attendees to use their laptops.

Three different rooms were used for the various sessions, and the venue handled the crowds fairly well. Some of the sessions were pretty crowded, but that didn’t get in the way of presenting a variety of topics in a professional fashion.

Sessions, Staff, & Sponsors

Keynote

The talks covered a wide range of topics and skill levels. MongoDB also brought in experts from a wide range of their company departments to deliver talks and meet customers. CTO Eliot Horowitz was there to give the morning keynote address and covered many of the features coming in the soon-to-be-released 3.6 version of the product. These new features include charts, updates to their BI Connector, enhancements to the $lookup operator, change streams, retryable writes, security improvements, and Atlas updates, just to name a few. There was also a great demo of their Backend-as-a-Service offering, Stitch.

Jumpstart Sessions

Prior to the keynote, there were a couple of jumpstart sessions to give attendees new to MongoDB the opportunity to level set and get on the same page in terms of vocabulary and concepts. Jay Gordon of the Developer Advocate team gave a great Your Introduction to MongoDB talk to introduce folks to the MongoDB world. Sig Narvaez provided an Introduction to Schema Design as a jumpstart session to, yes, talk about some schema design basics.

Breakout Sessions

After the keynote address, multiple break-out sessions took place. The slides should be available online soon, so I won’t go into great detail. However, the talks on security by Steffan Mejia, ETL techniques by André Spiegel, Spark and Machine Learning by Bryan Reinero, and the last session of the day, on the Aggregation Framework by Asya Kamsky, were all exceptional.

Rubbin’ Elbows

I was also fortunate to be able to meet in person some other great people from MongoDB. The Vice President of Engineering, Education at MongoDB, Shannon Bradshaw, was at the MongoDB University booth answering questions about their content and certification program. It was great hearing about some of their planned upcoming course offerings. It was fun to share stories with Chief Marketing Officer (CMO) Meagen Eisenberg about being owners of new puppies. It made me appreciate only having one to her two.

Marshall the Siberian Husky

Finally, I got to meet Francesca Krihely of the Developer Advocacy team. Ms. Krihely has been a huge help to me personally over the years with MongoDB User Group establishment. She has provided lots of resources for speaking engagements, offered support, and is a wealth of MongoDB knowledge. Definitely a great contact to have in the MongoDB world.

Sponsors

In addition to Bespoke, several companies attended the event as sponsors. I can’t give them all enough thanks for coming out and helping to support the MongoDB community. The least I can do is to name them and provide their web links.

MongoDB.local San Francisco Swag

Similar to PyCon 2017, MongoDB didn’t skimp on their offerings of swag. T-shirts, stickers, “Rubik’s Cubes” were all part of the “official” offerings. If you were willing to answer a few questions from a survey you could get a logoed Moscow Mule Mug.

MongoDB.local San Francisco Swag T-Shirts
MongoDB Rubik Cube
MongoDB Moscow Mule Mug

MongoDB University was offering MongoDB Certified Developers and DBAs their choice of a portable battery bank or foldable headphones.

“Born to Build with MongoDB” battery pack
MongoDB University Headphones from MongoDB.local San Francisco

Wrap Up

To wrap it up, it was a great day of MongoDB knowledge and learning. Aside from some wildfire smoke issues at San Francisco International Airport causing flight delays, it was a day well spent. I’m looking forward to attending another MongoDB-sponsored event soon.

