Big Data and NoSQL: A Great Coupling

Big data has been a buzzword and topic of conversation for many years. Similarly, the idea of NoSQL as a data store versus “traditional” relational databases is a frequently debated topic. With all of the conversation about these topics, it is important to ask a few questions. What is this thing called big data? Are NoSQL databases like OrientDB a good fit for big data? Let’s explore these questions and learn why NoSQL can indeed be a great fit for big data.

Big Data Defined

A popular definition of big data states:

Any quantity of data which is so large or complex that traditional computing power is inadequate to deal with it.

While this makes sense, it is important to note that the concept of big data is somewhat relative. It likely means something quite different to Farmer Bob, the local apple farmer, than it does to the Tree Top juice company. In the technology realm, we often think of companies like Facebook or Netflix in relation to big data processing. But every industry is capturing more and more data.

Think about our apple farmers. Some of the data they might want to collect includes soil moisture levels, air and soil temperatures, rainfall, the type of apple being grown, the geo-location of the trees, soil pH, fertilizer applications, and pollination events, just to name a few.

In the data industry, there are four common dimensions of big data, the so-called 4 V’s: volume, variety, velocity, and veracity.

4 Dimensions of Big Data
  • Volume – the size and scale of the data. A terabyte or more of data is often treated as a breakpoint, but even that is relatively small. By one widely quoted estimate, a Boeing 737 generates approximately 240 terabytes of data flying across the United States. How ’bout them apples?
  • Variety – the diversity of data in different forms. In our farming example, there could be different sensors around the orchard sending a variety of data into the database.
  • Velocity – the speed at which data is collected and needs to be processed. Perhaps not entirely applicable for Farmer Bob, but for many big data applications velocity is very important. Stock trading algorithms, for example, work with data that changes very quickly and must respond in a timely way.
  • Veracity – the quality of the data itself. Is it accurate? Is it stored accurately, and can it be retrieved accurately?

All of these pieces together help to determine whether you are dealing with big data.

Assuming, then, that our apple orchard is large enough to qualify for big data membership, why would we choose a NoSQL database? How does a NoSQL database, like OrientDB, help to address the concerns of the 4 V’s?

Stop the Presses: NoSQL for the Win

Okay, a bit of a pun there with presses while talking about apple orchards, please forgive me. But seriously, NoSQL does offer some great wins for storing big data. In fact, in many of the areas of the 4 V’s, NoSQL is a better choice than its relational (SQL) counterpart.

The variety of data becomes problematic in a relational data model as it requires multiple tables and many Joins to process the data. A multi-model database like OrientDB is capable of storing records in a schemaless fashion that is easy to generate programmatically. This allows for data with a variety of “looks” to be stored quickly into a single record. Being a multi-model database, a variety of types of records can be stored in the same database.
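To make the "variety" point a little more concrete, here is a minimal sketch in plain Python. It is not OrientDB's API; the sensor names and fields are made up for illustration. It simply shows records with different "looks" living side by side in one collection, with no shared schema declared up front.

```python
# Hypothetical orchard readings; every field name here is illustrative only.
readings = [
    {"sensor": "soil-01", "type": "soil", "moisture_pct": 31.5, "ph": 6.4},
    {"sensor": "air-07", "type": "weather", "temp_c": 18.2, "rainfall_mm": 0.0},
    {"sensor": "tree-112", "type": "tree", "variety": "Honeycrisp",
     "location": {"lat": 46.60, "lon": -120.51}},
]


def fields_by_type(records):
    """Map each record type to the set of field names seen for that type."""
    seen = {}
    for record in records:
        # Updating a set with a dict adds the dict's keys.
        seen.setdefault(record["type"], set()).update(record)
    return seen


shapes = fields_by_type(readings)
for record_type, fields in sorted(shapes.items()):
    print(record_type, sorted(fields))
```

In a relational model, each of these shapes would typically get its own table, or one wide table with many empty columns. Here a new sensor type just means a new kind of record.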

Further, without having to do Joins, traversing the data for read operations is very fast. The ability to read and write records rapidly is vital for keeping up with the velocity of data. Performance of the database is a huge portion of being able to reliably process your data and get results.
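A rough way to picture the difference between joining and traversing is sketched here in plain Python. The tree and sensor records are hypothetical, and this is a simplification: real relational engines use indexes rather than scanning every row, and this is not how any particular database is implemented internally.

```python
# Hypothetical sensor rows that refer to their tree by a key.
sensors = [
    {"id": "s1", "tree_id": 1, "kind": "moisture"},
    {"id": "s2", "tree_id": 2, "kind": "moisture"},
    {"id": "s3", "tree_id": 1, "kind": "temperature"},
]


def sensors_for_tree_join(tree_id):
    """Join-style lookup: compare the key against every sensor row."""
    return [s["id"] for s in sensors if s["tree_id"] == tree_id]


# Linked-record style: each tree record holds direct references to its sensors.
by_id = {s["id"]: s for s in sensors}
linked_trees = {
    1: {"variety": "Fuji", "sensors": [by_id["s1"], by_id["s3"]]},
    2: {"variety": "Gala", "sensors": [by_id["s2"]]},
}


def sensors_for_tree_links(tree_id):
    """Traversal-style lookup: follow the stored links, no matching step."""
    return [s["id"] for s in linked_trees[tree_id]["sensors"]]


print(sensors_for_tree_join(1))
print(sensors_for_tree_links(1))
```

Both functions return the same sensors. The difference is that the second one never has to match keys across collections at read time, because the relationship was stored with the record.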

One of the challenges that SQL databases face is that of scalability. They are mostly designed around a master-slave architecture, which often becomes a bottleneck. The solution for scaling is often to upgrade to bigger and faster hardware. In the NoSQL world, we can scale with a multi-master and sharded architecture. This allows for horizontal scaling and adds to veracity as well, as the new servers can be spread out across geographic locations.
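For readers new to sharding, the basic idea can be sketched in a few lines of Python. This is one simple routing strategy (hash the key, take it modulo the number of shards), chosen here only for illustration. The sensor IDs and the shard count are assumptions, and this is not a description of how OrientDB distributes data.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Pick a shard by hashing the key; the same key always maps to the same shard."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# Three hypothetical servers, each holding its own slice of the records.
SHARD_COUNT = 3
shards = {i: [] for i in range(SHARD_COUNT)}

for sensor_id in ["soil-01", "soil-02", "air-07", "tree-112"]:
    shards[shard_for(sensor_id, SHARD_COUNT)].append(sensor_id)

for shard_id, keys in shards.items():
    print(shard_id, keys)
```

Adding capacity means adding shards rather than buying a bigger machine. One caveat with plain modulo hashing is that changing the shard count remaps most keys, which is why production systems often use consistent hashing or range-based schemes instead.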

Making sure the data you think you are storing is actually being stored is probably the most important of all. Having the ability and confidence of knowing that your database transactions are secure and accurate is huge. OrientDB transactions are ACID compliant, providing an extra level of guarantee that your data is your data. This also helps to make sure that the technical department doesn’t upset the apple cart with corporate executives.
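The "A" in ACID, atomicity, is easiest to see with a small example. The sketch below uses Python's built-in sqlite3 module purely because it runs without a server. SQLite is a relational engine, so this is not OrientDB's API, and the harvest table and bin counts are invented for illustration. The behavior it shows, all or nothing, is what an ACID-compliant transaction promises in any database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE harvest ("
    "orchard TEXT PRIMARY KEY, "
    "bins INTEGER NOT NULL CHECK (bins >= 0))"
)
conn.execute("INSERT INTO harvest VALUES ('north', 10), ('south', 5)")
conn.commit()


def move_bins(src, dst, count):
    """Move bins between orchards; both updates apply together or not at all."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE harvest SET bins = bins + ? WHERE orchard = ?", (count, dst)
            )
            conn.execute(
                "UPDATE harvest SET bins = bins - ? WHERE orchard = ?", (count, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


moved = move_bins("north", "south", 4)      # valid move
rejected = move_bins("north", "south", 50)  # would leave 'north' negative
totals = dict(conn.execute("SELECT orchard, bins FROM harvest"))
print(moved, rejected, totals)
```

The second call fails on its second update, and the first update in that same transaction is undone along with it, so no bins appear out of thin air.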

Wrap Up

We have taken a look at what big data is and how to determine if your data is big data. We then examined why NoSQL is a great fit for storing our data. OrientDB is definitely worth a look for storing your NoSQL data for your big data needs.

Follow me on Twitter @kenwalger to get the latest updates on my postings or read my other posts on my blog.


MongoDB.local San Francisco Recap

On 12 October 2017 MongoDB kicked off a new series of events. I was fortunate to attend the MongoDB.local San Francisco conference along with approximately 600 other MongoDB enthusiasts. If you missed this event, fear not, they will be coming to cities around the world over the next twelve months. MongoDB has a yearly event too, MongoDB World, but having these local, smaller, conferences provides them with more opportunities to reach their customer base, provide training, and announce new product features.

MongoDB.local San Francisco Venue

MongoDB.local San Francisco was held in the fashionable Westfield Centre in the Union Square district. It was hosted in the Bespoke coworking and event space and was very impressively done. The space was comfortable and spacious. As is required for any tech-related conference, they provided free Wi-Fi access, and it was capable of keeping up with all of the devices throughout the conference, even during sessions which required attendees to use their laptops.

Three different rooms were used for the various sessions, and the venue was able to handle the crowds fairly well. Some of the sessions were pretty crowded, but that didn’t get in the way of presenting a variety of topics in a professional fashion.

Sessions, Staff, & Sponsors


A wide range of topics and skill levels was covered in the talks. MongoDB also brought in experts from a wide range of their company departments to deliver talks and meet customers. CTO Eliot Horowitz was there to give the morning keynote address and covered many of the features coming in the soon-to-be-released 3.6 version of the product. Some of these new features include charts, updates to their BI Connector, enhancements to the $lookup operator, change streams, retryable writes, security improvements, and Atlas updates, just to name a few. There was also a great demo of their Backend-as-a-Service offering, Stitch.

Jumpstart Sessions

Prior to the keynote, there were a couple of jumpstart sessions to give attendees new to MongoDB the opportunity to level set and get on the same page in terms of vocabulary and concepts. Jay Gordon of the Developer Advocate team gave a great Your Introduction to MongoDB talk to introduce folks to the MongoDB world. Sig Narvaez provided an Introduction to Schema Design as a jumpstart session to, yes, talk about some schema design basics.

Breakout Sessions

After the keynote address, multiple break-out sessions took place. The slides should be available online soon, so I won’t go into great detail. However, the talks by Steffan Mejia on security, ETL techniques by André Spiegel, Spark and Machine Learning from Bryan Reinero, and the last session of the day on the Aggregation Framework from Asya Kamsky were all exceptional talks.

Rubbin’ Elbows

I was also fortunate to be able to meet in person some other great people from MongoDB. The Vice President of Engineering, Education at MongoDB, Shannon Bradshaw, was at the MongoDB University booth answering questions about their content and certification program. It was great hearing about some of their planned upcoming course offerings. It was fun to share stories with Chief Marketing Officer (CMO) Meagen Eisenberg about being owners of new puppies. It made me appreciate only having one to her two.

Marshall the Siberian Husky

Finally, I got to meet Francesca Krihely of the Developer Advocacy team. Ms. Krihely has been a huge help to me personally over the years with MongoDB User Group establishment. She has provided lots of resources and support for speaking engagements and is a wealth of MongoDB knowledge. Definitely a great contact to have in the MongoDB world.


In addition to Bespoke, several companies attended the event as sponsors. I can’t give them all enough thanks for coming out and helping to support the MongoDB community. The least I can do is to name them and provide their web links.

MongoDB.local San Francisco Swag

Similar to PyCon 2017, MongoDB didn’t skimp on their offerings of swag. T-shirts, stickers, and “Rubik’s Cubes” were all part of the “official” offerings. If you were willing to answer a few questions from a survey, you could get a logoed Moscow Mule Mug.

MongoDB.local San Francisco Swag T-Shirts
MongoDB Rubik Cube
MongoDB Moscow Mule Mug

MongoDB University was offering MongoDB Certified Developers and DBAs their choice of a portable battery bank or foldable headphones.

"Born to Build with MongoDB" battery pack
MongoDB University Headphones

Wrap Up

To wrap it up, it was a great day of MongoDB knowledge and learning. Aside from some wildfire smoke issues at San Francisco International Airport causing flight delays, it was a day well spent. I’m looking forward to attending another MongoDB-sponsored event soon.

Follow me on Twitter @kenwalger to get the latest updates on my postings.

There are a few MongoDB specific terms in this post. I created a MongoDB Dictionary skill for the Amazon Echo line of products. Check it out and you can say “Alexa, ask MongoDB for the definition of a document?” and get a helpful response.