MongoDB Plugin for PyCharm

There are many options for looking at and examining your MongoDB collections while developing. MongoDB’s Compass is a great example of a tool for viewing and interacting with a database, collection, or document. However, it is often useful to be able to see your data inside your development environment. Let’s take a look at a useful MongoDB plugin for PyCharm for viewing collections.

MongoDB Plugin

While I will be discussing the Mongo Plugin specifically as it relates to PyCharm, the plugin itself works with the vast majority of IDEs provided by JetBrains. After downloading and installing the plugin, we need to set a few things up. I’ll walk through setting up a connection to a local installation of a MongoDB server as well as a connection to MongoDB’s Database as a Service, Atlas. To test the connections, we will want to make sure both of these servers are up and running.

MongoDB Plugin Settings

Local Server

For the local server, the settings are relatively straightforward. Assuming that we are working with a server on the default port of 27017, let’s take a look at our settings.

MongoDB Plugin - Initial Setup
File -> Settings -> Other Settings

We see here that there is a place to input the Path to Mongo Shell. Be sure to point it at the mongo executable and not the one for mongod. You can hit the test button next to the path to make sure the plugin is happy with the chosen file.

We next need to add a server to use and connect with. By clicking on the + symbol we are presented with an option to configure our server connection.

MongoDB Plugin - Localhost setup
Localhost setup configuration.

Here we see that we are able to label, or name, our connection and put in the server location in the format of host:port. For our example, we can use localhost:27017, as displayed above. For a single server without any authentication in place, these settings will connect to the server and let you see all of its databases.
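
If you would like to double-check from code that the plugin has something to connect to, a quick PyMongo sanity check against the same localhost:27017 address might look like the sketch below. This is independent of the plugin itself and assumes the PyMongo driver is installed.

```python
# Sanity check that a local MongoDB server is reachable on the default port.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017", serverSelectionTimeoutMS=3000)
print(client.admin.command("ping"))                        # {'ok': 1.0} when the server is up
print(client.admin.command("listDatabases")["databases"])  # the same databases the plugin will list
```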

What if, however, you do have some authentication in place and want to establish a connection to a specific database? Let’s examine that with a connection to an Atlas configuration.

Atlas Server

We will need our Atlas connection URL, which is available within the Atlas dashboard. Feel free to use your own server’s host name or IP address. For my server settings, I want to set a read preference of Primary and to connect to the travel collection in the database. I also selected the option to use SSL for the connection.

MongoDB Plugin - Atlas Connection
Connection to an Atlas database.

Since my Atlas server does require authentication, let’s take a look at that tab.

MongoDB Plugin Authentication
Setting up connection Authentication

We put in an appropriately established username and password along with the name of the authentication database. In this case, I am using the admin database. For Atlas, we want to choose the SCRAM-SHA-1 authentication mechanism. We can then test the connection, and if everything is configured correctly, we should see the good-news pop-up.
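
For reference, those same Atlas settings expressed with PyMongo look roughly like the sketch below. The host name, user, and password are placeholders; substitute the values from your own Atlas dashboard.

```python
# Rough PyMongo equivalent of the plugin's Atlas connection settings above.
# Host name, user, and password are placeholders, not real values.
from pymongo import MongoClient

client = MongoClient(
    "cluster0-shard-00-00.example.mongodb.net",
    27017,
    ssl=True,                        # the "Use SSL" option
    username="myUser",
    password="myPassword",
    authSource="admin",              # authentication database
    authMechanism="SCRAM-SHA-1",     # mechanism chosen in the plugin
    readPreference="primary",        # read preference set earlier
)
print(client.admin.command("ping"))  # simple round trip to verify the connection
```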

MongoDB Plugin Successful Configuration
Successful Connection Test.

Starting the Plugin

With our connections established, we can use the Mongo Explorer by navigating to View -> Tool Windows -> Mongo Explorer. It shows our configured connections, and when we open a connection we see its databases listed.

MongoDB Plugin Enabling
Enabling the Plugin in PyCharm

Selecting a database shows a list of its collections. We can then choose a collection and see a list of the documents it contains.

MongoDB Plugin Examination
Examining a collection with the Mongo Plugin.

MongoDB Plugin ToolBar

If we have a look at the toolbar that appears above our collection:

MongoDB Plugin Toolbar callout

There are some great features in there.

MongoDB Plugin Toolbar

We see that we have a find option, an option to toggle aggregation mode, and the ability to add and edit documents directly from PyCharm. We can also run queries with Filter, Projection, and Sort parameters. It is a very useful group of tools to have included with the plugin.
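
To give a sense of what those toolbar options map to, here is a rough PyMongo equivalent of a filtered, projected, and sorted find, plus a small aggregation pipeline. The database, collection, and field names are made up for illustration.

```python
# Illustrative PyMongo versions of the plugin's query and aggregation features.
from pymongo import MongoClient, DESCENDING

coll = MongoClient("mongodb://localhost:27017").travel.routes  # hypothetical collection

# Find with a filter, a projection, and a sort -- the three query boxes.
cursor = coll.find(
    {"carrier": "ACME Bus"},             # filter
    {"origin": 1, "destination": 1},     # projection
).sort("departure", DESCENDING).limit(10)
for doc in cursor:
    print(doc)

# Aggregation mode: a small pipeline grouping trips by carrier.
pipeline = [
    {"$match": {"price": {"$lt": 50}}},
    {"$group": {"_id": "$carrier", "trips": {"$sum": 1}}},
    {"$sort": {"trips": -1}},
]
for row in coll.aggregate(pipeline):
    print(row)
```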

Wrap Up

With successfully configured connections to MongoDB servers, we can now utilize the Mongo Plugin to see what our data looks like as we develop. I personally find this to be a huge benefit and time saver when developing. If you use a JetBrains IDE for your development, I would highly encourage you to have a look at this very useful plugin.

There are several MongoDB specific terms in this post. I created a MongoDB Dictionary skill for the Amazon Echo line of products. Check it out and you can say “Alexa, ask MongoDB what is a document?” and get a helpful response.


Follow me on Twitter @kenwalger to get the latest updates on my postings.


MongoDB Performance Issues – Fact or Fiction

I see posts and hear conversations quite frequently about MongoDB performance issues. They are a frequent, and hot, topic on sites like Quora, Hacker News, and Reddit. Many of these “hits” against MongoDB are based on outdated data and older versions of MongoDB.

There was a movie from the late 1980s called Crocodile Dundee II. There is a scene in the movie where the guy from the Australian Outback, Mick “Crocodile” Dundee, visits a New York City hotel for the first time. Someone shows him that the room has a television. He turns it on, sees an old episode of I Love Lucy, and shuts it off, claiming he has already experienced television.

Similarly, many complaints about older versions of MongoDB still linger. Someone who had a bad experience with an old version will answer a thread somewhere and claim, “I used it once, didn’t like it, it’s garbage.” Much like Mick Dundee, they are basing their entire opinion on outdated knowledge.

Let’s take a look at some performance issues that are often raised and where things sit now with the latest version of MongoDB, 3.4.6. I raised some of these aspects in a previous post, but let’s take a deeper dive.

The Jepsen Test & Performance Issues of old

From a “documented issue” standpoint, many of the performance complaints that plague MongoDB in social reviews are covered in a Jepsen test result post from 20 April 2015, which was based on version 2.4.3, or in an even older article from 18 May 2013. Clearly, there were some issues with data consistency and concurrency in those earlier versions.

In fact, Jepsen has done extensive testing of MongoDB for lost updates and for dirty and stale reads. Without getting too deep into the hows and whys of what was happening to the data, there were issues with writes being lost when a primary went down and with read and write consistency. These issues have been addressed as of version 3.4.1.

Product Enhancements

With the new data enhancements, MongoDB version 3.4.1 passed all of the Jepsen tests. Kyle Kingsbury, the creator of Jepsen, offered the following conclusions:

MongoDB has devoted significant resources to improved safety in the past two years, and much of that ground-work is paying off in 3.2 and 3.4.

MongoDB 3.4.1 (and the current development release, 3.5.1) currently pass all MongoDB Jepsen tests….These results hold during general network partitions, and the isolated & clock-skewed primary scenario.

You can read more about his conclusions in his published results.
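
It is worth remembering that the guarantees Jepsen verified are opt-in: the application has to ask for a majority write concern and, on 3.4 and later, a linearizable read concern. A minimal PyMongo sketch of those settings, using an illustrative collection name, might look like this:

```python
# Opting in to majority writes and linearizable reads on a replica set.
# The replica set name and collection are illustrative.
from pymongo import MongoClient
from pymongo.read_concern import ReadConcern
from pymongo.write_concern import WriteConcern

client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
accounts = client.test.get_collection(
    "accounts",
    write_concern=WriteConcern("majority", wtimeout=5000),
    read_concern=ReadConcern("linearizable"),
)

accounts.insert_one({"_id": 1, "balance": 100})   # acknowledged by a majority of nodes
print(accounts.find_one({"_id": 1}))              # linearizable read from the primary
```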

Beyond data safety, customers are finding huge performance benefits in the more current releases of MongoDB. Improvements to, or the introduction of, technologies such as compressed replication, the WiredTiger storage engine, and in-memory caching, along with performance enhancements to sharding and replica sets, have been a win for users.
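
If you want to confirm what your own deployment is running, a quick check like the sketch below shows the active storage engine and the WiredTiger block compressor configured for a collection. The database and collection names are illustrative.

```python
# Inspect the storage engine and a collection's WiredTiger compression settings.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
print(client.admin.command("serverStatus")["storageEngine"]["name"])  # e.g. 'wiredTiger'

stats = client.travel.command("collStats", "routes")                  # hypothetical collection
print(stats.get("wiredTiger", {}).get("creationString", ""))          # includes block_compressor=snappy
```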

WiredTiger Case Study

A friend who works at Wanderu.com, a MongoDB user, was very generous and forthcoming with some information about their MongoDB experience. When choosing a database option they felt that NoSQL, and MongoDB specifically, fit their business and data model better than a relational model would. They process a very diverse set of data for their bus and train travel application.

They take information from a vast assortment of bus and train vendors, arriving in XML, JSON, PDF, CSV, and other formats. The data is then ingested and transformed so that everything works with price checking and booking calls in vendor-specific formats. Implementing that data model in a relational database was determined to be incredibly complex and fragile.

In May 2017, Wanderu migrated to the WiredTiger storage engine in MongoDB 3.4. They took screenshots of some of their performance graphs covering a ten-day period, five days before and five days after their migration on 5/5. They were kind enough to share these images with me and approved their use in this article.

Wanderu Charts

Active Reads/Writes: Before WiredTiger, the write load had a very limited maximum. After the migration, writes spiked as necessary.
Queued Reads and Writes: Writes stayed fairly constant while queued reads fell dramatically.
Index Size: Index size decreased dramatically as well.
Memory Usage: Not surprisingly, memory usage dropped too.
Page Faults: Page faults improved noticeably.
Replication Lag: If there was any doubt, replication lag improved as well.

In the four years since Wanderu launched, they have relied heavily on MongoDB. They store the station and trip information for each local, regional, and national carrier. With the new $graphLookup capability in MongoDB version 3.4, they are also looking at the possibility of using that feature for their graph-traversal needs.
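
As a hypothetical illustration of that idea, a $graphLookup stage that walks connecting legs out from a starting city might look something like this. The routes collection and its fields are invented for the example and are not Wanderu’s actual schema.

```python
# Hypothetical $graphLookup traversal over a routes collection.
from pymongo import MongoClient

routes = MongoClient("mongodb://localhost:27017").travel.routes

pipeline = [
    {"$match": {"origin": "Portland"}},
    {"$graphLookup": {
        "from": "routes",                 # traverse the same collection
        "startWith": "$destination",      # begin at this trip's destination
        "connectFromField": "destination",
        "connectToField": "origin",
        "as": "reachable_legs",
        "maxDepth": 2,                    # follow at most two connections
    }},
]
for doc in routes.aggregate(pipeline):
    print(doc["origin"], "->", [leg["destination"] for leg in doc["reachable_legs"]])
```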

Further Industry Thoughts

MongoDB is a widely used NoSQL database. It is used by companies large and small, for a variety of reasons. I reached out to a few other well-known MongoDB users to get some real-world feedback and product experiences.

Carfax

CARFAX, for example, has been using MongoDB in production since version 1.8. They load over a billion documents a year and generate over 20,000 reports per second. Jai Hirsch, a Senior Systems Architect at CARFAX, wrote a nice write-up about why they decided on MongoDB. They have achieved some tremendous performance benefits from compressed replication.

GHX

GHX switched from MMAPv1 to WiredTiger with the 3.2 release of MongoDB. Jeff Sherard, their Database Engineering Manager, had another very positive experience.

Definitely the switch to WiredTiger in 3.2 was a huge boost. Especially on the compression side – we experience about 50% compression. Document level locking vs. Collection level locking also improved performance for us significantly.

He also saw benefits with sharding and replica sets after an upgrade to 3.4.4.

We recently upgraded to 3.4.4 and are particularly pleased with the improvements in balancing on shards (the parallelism makes balancing really fast). And the initial sync improvements in replica sets [sic] have been really useful too.

Tinkoff Bank

Tinkoff Bank landed on MongoDB instead of Oracle after finding that Oracle’s CLOBs were not as fast and were not searchable. They are able to process approximately 1,500 requests per second using their three-node replica set, and those queries put a load of only 5-10% on the CPU of the primary node.

Wrap Up

I’m sure the SQL vs. NoSQL debate will live on, much the same as Windows vs. Mac or cats vs. dogs. I hope, however, that based on the information and testimonials provided here we can lay to rest the notion that MongoDB isn’t “enterprise ready.” If we are going to argue the virtues of MongoDB, we should at least be talking about the most current version. Like Mick Dundee in that scene, anyone basing their entire view of a product on something they experienced years ago comes across looking foolish.

There are several MongoDB specific terms in this post. I created a MongoDB Dictionary skill for the Amazon Echo line of products. Check it out and you can say “Alexa, ask MongoDB what is a document?” and get a helpful response.


Follow me on Twitter @kenwalger to get the latest updates on my postings.
