{"id":387,"date":"2017-06-19T06:38:27","date_gmt":"2017-06-19T13:38:27","guid":{"rendered":"http:\/\/www.kenwalger.com\/blog\/?p=387"},"modified":"2017-10-16T16:58:48","modified_gmt":"2017-10-16T23:58:48","slug":"importing-data-mongoimport","status":"publish","type":"post","link":"https:\/\/www.kenwalger.com\/blog\/nosql\/mongodb\/importing-data-mongoimport\/","title":{"rendered":"Importing data with mongoimport"},"content":{"rendered":"<p>There comes a time in almost everyone&#8217;s experience with databases when it would be great to bring in data from an outside source. Often the data is in a spreadsheet format (CSV or TSV) or perhaps a JSON format. I discussed some of the command line tools <a href=\"https:\/\/www.mongodb.com\">MongoDB<\/a> provides in a <a href=\"http:\/\/www.kenwalger.com\/blog\/nosql\/mongodb-cli-tools\/\">previous post<\/a>. Importing data into a MongoDB database is made easy with the CLI tool, mongoimport.<\/p>\n<p>For many use cases, mongoimport is pretty straight forward. It is, in fact, highly used in the <a href=\"https:\/\/university.mongodb.com\/https:\/\/university.mongodb.com\/\">MongoDB University<\/a> courses as a way to quickly populate a database, for example. I&#8217;d like to look at some use cases beyond simply populating an empty <a href=\"https:\/\/docs.mongodb.com\/manual\/core\/databases-and-collections\/\">collection<\/a>, however.<\/p>\n<h3>mongoimport<\/h3>\n<h5>Connections<\/h5>\n<p>The mongoimport will connect to a running <a href=\"https:\/\/docs.mongodb.com\/manual\/reference\/program\/mongod\/\">mongod<\/a> or <a href=\"https:\/\/docs.mongodb.com\/manual\/reference\/program\/mongos\/\">mongos<\/a>, instance running, by default, on port 27017 on localhost. The syntax of the mongoimport command is fairly straightforward. If for example, we want to populate the <code>posts<\/code> collection in the <code>blog<\/code> database with a <code>posts.json<\/code> file it is simple enough to run the following command.<\/p>\n<pre>mongoimport --db blog --collection posts --file posts.json\n<\/pre>\n<p>That is pretty easy. We can make it easier too by using the shorthand version of those flags.<\/p>\n<pre>mongoimport -d blog -c posts --file posts.json\n<\/pre>\n<p>If we want to make sure that our <code>posts<\/code> collection is dropped and only the new data is there, we can use the <code>--drop<\/code> flag.<\/p>\n<pre>mongoimport -d blog -c posts --drop --file posts.json\n<\/pre>\n<p>If you need to change the host or port number, there are flags for that as well, <code>--host<\/code> and <code>--port<\/code>, respectively. <code>--host<\/code> is even more convenient because it allows you to add the port at the end, and use a shorter flag <code>-h<\/code>. So the following are the same:<\/p>\n<pre>mongoimport --host 123.123.123.1 --port 1234 -d blog -c posts --file posts.json\n<\/pre>\n<pre>mongoimport -h 123.123.123.1:1234 -d blog -c posts --file posts.json\n<\/pre>\n<p>That&#8217;s easy enough as well. What if, however, our MongoDB server requires user authentication, like any good server should?<\/p>\n<h5>mongoimport Server Authentication<\/h5>\n<p><a href=\"https:\/\/www.amazon.com\/gp\/product\/1491937017\/ref=as_li_tl?ie=UTF8&amp;camp=1789&amp;creative=9325&amp;creativeASIN=1491937017&amp;linkCode=as2&amp;tag=kenwalgersite-20&amp;linkId=b9276793e1f3917877feace05ed4e90f\" target=\"_blank\" rel=\"noopener\">Data security<\/a><img loading=\"lazy\" decoding=\"async\" style=\"border: none !important; margin: 0px !important;\" src=\"\/\/ir-na.amazon-adsystem.com\/e\/ir?t=kenwalgersite-20&amp;l=am2&amp;o=1&amp;a=1491937017\" alt=\"\" width=\"1\" height=\"1\" border=\"0\" \/> should be on everyone&#8217;s mind when it comes to server management. With that in mind, MongoDB offers a variety of ways to secure your data. Assuming that one needs to get authenticated access to the server, how can one use mongoimport to do so?\u00a0You guessed it, there are flags for that too. <code>--username<\/code>\u00a0or <code>-u<\/code>\u00a0and <code>--password<\/code>\u00a0or <code>-p<\/code> are your friends.<\/p>\n<pre>mongoimport -h 123.123.123.1:1234 -u user -p \"pass\" -d blog -c posts --file posts.json\n<\/pre>\n<p>We can add in some extra assurances by leaving off the <code>--password<\/code> flag and mongoimport will prompt for an appropriate password.<\/p>\n<p>That works great for some simpler\u00a0authentication options, but what if we have a more involved authentication system with an\u00a0<a href=\"https:\/\/docs.mongodb.com\/manual\/core\/security-users\/#user-authentication-database\">authentication database<\/a>? We can specify one with the <code>--authenticationDatabase<\/code> flag. That&#8217;s pretty handy to keep only authorized people from importing data into your collection.<\/p>\n<p>mongoimport provides a great range of flag options for connecting to secured servers. I would highly recommend looking at the <a href=\"https:\/\/docs.mongodb.com\/manual\/reference\/program\/mongoimport\/\">documentation<\/a> for specifics based on your environment.<\/p>\n<h3>File &amp; Column Types<\/h3>\n<p>As stated earlier, mongoimport works on CSV, TSV, and JSON documents. By default the import format is JSON. With the <code>--type<\/code> flag we can import CSV or TSV files. Since CSV and TSV files can contain some special features, let&#8217;s look at some of the options for working with them and mongoimport.<\/p>\n<p>Many times a CSV or TSV file will include a header line. It would be handy if we could utilize those header values as field names in our MongoDB <a href=\"https:\/\/docs.mongodb.com\/manual\/core\/document\/\">documents<\/a>, right? Well, mongoimport offers a <code>--headerline<\/code> flag that accomplishes that for us.<\/p>\n<p>For times in which our CSV or TSV file\u00a0<em>doesn&#8217;t<\/em> include header information, mongoimport has a solution for that as well. With the <code>--fields<\/code> flag, one can provide a comma-separated list of field names. Alternatively, you can generate a file of field names, with one name per line, and pass it along with the <code>--fieldFile<\/code>\u00a0flag.<\/p>\n<p>Along with some of the <a href=\"http:\/\/www.kenwalger.com\/blog\/nosql\/new-version-new-features\/\">other features<\/a> new to MongoDB version 3.4, there are some new features added to mongoimport. One of them is the<br \/>\n<code>--columnsHaveTypes<\/code> flag. When used in conjunction with the <code>--fields<\/code>, \u00a0<code>--fieldFile<\/code>, or <code>--headerline<\/code>\u00a0flag it allows you to specify the types of each field. You pass in the field name in the format of <code>columnName.type()<\/code> along with any arguments into the <code>type()<\/code> method. So, for example, if you were typing a filed called <code>isAdmin<\/code> you would use <code>isAdmin.bool()<\/code>.\u00a0Have a look at the <code>--columnsHaveTypes<\/code> <a href=\"https:\/\/docs.mongodb.com\/manual\/reference\/program\/mongoimport\/#cmdoption-columnshavetypes\">documentation<\/a> for a list of available types and supported arguments.<\/p>\n<p>Fair warning here, the flags dealing with header information are for CSV and\/or TSV files. If one attempts to use them with a JSON formatted file, mongoimport gets grumpy and returns an error.<\/p>\n<h3>Importing into an existing collection<\/h3>\n<p>One last concept and list of flags I&#8217;d like to cover is for those instances in which you want to import data into an existing collection. The<br \/>\n<code>--mode<\/code>\u00a0flag offers a way to tell mongoimport how to handle existing collection documents which match incoming ones. There are three options to the <code>--mode<\/code> flag, <strong>insert<\/strong>, <strong>upsert<\/strong>, and <strong>merge<\/strong>.<\/p>\n<ul>\n<li>Insert allows the documents to get put into the collection with the only check being on fields with a unique index. If there are duplicate values, mongoimport logs an error.<\/li>\n<li>Upsert replaces documents in the database with the\u00a0<em>new<\/em> documents from the import file. All other documents get inserted.<\/li>\n<li>Merge, well, it merges existing documents with matching incoming documents and inserts the others. This is another new feature of version 3.4.<\/li>\n<\/ul>\n<p>If you are needing to import documents in one of these ways, look at the <a href=\"https:\/\/docs.mongodb.com\/manual\/reference\/program\/mongoimport\/#merge-matching-documents-during-import\">documentation<\/a> for options on upserting and merging based on field other than <code>_id<\/code>.<\/p>\n<h3>Wrap Up<\/h3>\n<p>MongoDB also provides a similar, but inverse, function mongoexport. While both tools are powerful they do not preseve the BSON data types than MongoDB uses. As such, these tools should\u00a0<strong>not<\/strong> be used for production backups. MongoDB provides <a href=\"https:\/\/docs.mongodb.com\/manual\/core\/backups\/\">other tools<\/a> for backup methods.<\/p>\n<p>I hope that this post has given you some insights into one of the powerful MongoDB Package Component tools that are provided &#8220;out of the box&#8221;, mongoimport. Some programming languages have developed their own separate tools for importing data. Some of them are better than others. For me, since such a powerful import tool is already provided, I find myself using mongoimport more often than not.<\/p>\n<p>If you haven&#8217;t tried it out yet yourself, I would encourage you to do so.<\/p>\n<p>There are several MongoDB specific terms in this post. I created a <a href=\"https:\/\/www.echoskillstore.com\/MongoDB-Dictionary\/45103\">MongoDB Dictionary<\/a> skill for the <a href=\"https:\/\/www.amazon.com\/gp\/product\/B01DFKC2SO\/ref=as_li_tl?ie=UTF8&amp;camp=1789&amp;creative=9325&amp;creativeASIN=B01DFKC2SO&amp;linkCode=as2&amp;tag=kenwalgersite-20&amp;linkId=f9e513223de2525a72b95cf9561db55b\" rel=\"noopener noreferrer\">Amazon Echo<\/a>\u00a0line of products. Check it out and you can say &#8220;Alexa, ask MongoDB for the definition of authentication?&#8221; and get a helpful response.<\/p>\n<hr \/>\n<p><em>Follow me on Twitter <a href=\"https:\/\/www.twitter.com\/kenwalger\">@kenwalger<\/a> to get the latest updates on my postings.<\/em><\/p>\n<a class=\"synved-social-button synved-social-button-share synved-social-size-48 synved-social-resolution-single synved-social-provider-facebook nolightbox\" data-provider=\"facebook\" target=\"_blank\" rel=\"nofollow\" title=\"Share on Facebook\" href=\"https:\/\/www.facebook.com\/sharer.php?u=https%3A%2F%2Fwww.kenwalger.com%2Fblog%2Fwp-json%2Fwp%2Fv2%2Fposts%2F387&amp;t=Importing%20data%20with%20mongoimport&amp;s=100&amp;p[url]=https%3A%2F%2Fwww.kenwalger.com%2Fblog%2Fwp-json%2Fwp%2Fv2%2Fposts%2F387&amp;p[images][0]=https%3A%2F%2Fwww.kenwalger.com%2Fblog%2Fwp-content%2Fuploads%2F2017%2F06%2Fmongoimport.png&amp;p[title]=Importing%20data%20with%20mongoimport\" style=\"font-size: 0px;width:48px;height:48px;margin:0;margin-bottom:5px;margin-right:5px\"><img loading=\"lazy\" decoding=\"async\" alt=\"Facebook\" title=\"Share on Facebook\" class=\"synved-share-image synved-social-image synved-social-image-share\" width=\"48\" height=\"48\" style=\"display: inline;width:48px;height:48px;margin: 0;padding: 0;border: none;box-shadow: none\" src=\"https:\/\/www.kenwalger.com\/blog\/wp-content\/plugins\/social-media-feather\/synved-social\/image\/social\/regular\/96x96\/facebook.png\" \/><\/a><a class=\"synved-social-button synved-social-button-share synved-social-size-48 synved-social-resolution-single synved-social-provider-twitter nolightbox\" data-provider=\"twitter\" target=\"_blank\" rel=\"nofollow\" title=\"Share on Twitter\" href=\"https:\/\/twitter.com\/intent\/tweet?url=https%3A%2F%2Fwww.kenwalger.com%2Fblog%2Fwp-json%2Fwp%2Fv2%2Fposts%2F387&amp;text=Hey%20check%20this%20out\" style=\"font-size: 0px;width:48px;height:48px;margin:0;margin-bottom:5px;margin-right:5px\"><img loading=\"lazy\" decoding=\"async\" alt=\"twitter\" title=\"Share on Twitter\" class=\"synved-share-image synved-social-image synved-social-image-share\" width=\"48\" height=\"48\" style=\"display: inline;width:48px;height:48px;margin: 0;padding: 0;border: none;box-shadow: none\" src=\"https:\/\/www.kenwalger.com\/blog\/wp-content\/plugins\/social-media-feather\/synved-social\/image\/social\/regular\/96x96\/twitter.png\" \/><\/a><a class=\"synved-social-button synved-social-button-share synved-social-size-48 synved-social-resolution-single synved-social-provider-reddit nolightbox\" data-provider=\"reddit\" target=\"_blank\" rel=\"nofollow\" title=\"Share on Reddit\" href=\"https:\/\/www.reddit.com\/submit?url=https%3A%2F%2Fwww.kenwalger.com%2Fblog%2Fwp-json%2Fwp%2Fv2%2Fposts%2F387&amp;title=Importing%20data%20with%20mongoimport\" style=\"font-size: 0px;width:48px;height:48px;margin:0;margin-bottom:5px;margin-right:5px\"><img loading=\"lazy\" decoding=\"async\" alt=\"reddit\" title=\"Share on Reddit\" class=\"synved-share-image synved-social-image synved-social-image-share\" width=\"48\" height=\"48\" style=\"display: inline;width:48px;height:48px;margin: 0;padding: 0;border: none;box-shadow: none\" src=\"https:\/\/www.kenwalger.com\/blog\/wp-content\/plugins\/social-media-feather\/synved-social\/image\/social\/regular\/96x96\/reddit.png\" \/><\/a><a class=\"synved-social-button synved-social-button-share synved-social-size-48 synved-social-resolution-single synved-social-provider-linkedin nolightbox\" data-provider=\"linkedin\" target=\"_blank\" rel=\"nofollow\" title=\"Share on Linkedin\" href=\"https:\/\/www.linkedin.com\/shareArticle?mini=true&amp;url=https%3A%2F%2Fwww.kenwalger.com%2Fblog%2Fwp-json%2Fwp%2Fv2%2Fposts%2F387&amp;title=Importing%20data%20with%20mongoimport\" style=\"font-size: 0px;width:48px;height:48px;margin:0;margin-bottom:5px;margin-right:5px\"><img loading=\"lazy\" decoding=\"async\" alt=\"linkedin\" title=\"Share on Linkedin\" class=\"synved-share-image synved-social-image synved-social-image-share\" width=\"48\" height=\"48\" style=\"display: inline;width:48px;height:48px;margin: 0;padding: 0;border: none;box-shadow: none\" src=\"https:\/\/www.kenwalger.com\/blog\/wp-content\/plugins\/social-media-feather\/synved-social\/image\/social\/regular\/96x96\/linkedin.png\" \/><\/a><a class=\"synved-social-button synved-social-button-share synved-social-size-48 synved-social-resolution-single synved-social-provider-mail nolightbox\" data-provider=\"mail\" rel=\"nofollow\" title=\"Share by email\" href=\"mailto:?subject=Importing%20data%20with%20mongoimport&amp;body=Hey%20check%20this%20out:%20https%3A%2F%2Fwww.kenwalger.com%2Fblog%2Fwp-json%2Fwp%2Fv2%2Fposts%2F387\" style=\"font-size: 0px;width:48px;height:48px;margin:0;margin-bottom:5px\"><img loading=\"lazy\" decoding=\"async\" alt=\"mail\" title=\"Share by email\" class=\"synved-share-image synved-social-image synved-social-image-share\" width=\"48\" height=\"48\" style=\"display: inline;width:48px;height:48px;margin: 0;padding: 0;border: none;box-shadow: none\" src=\"https:\/\/www.kenwalger.com\/blog\/wp-content\/plugins\/social-media-feather\/synved-social\/image\/social\/regular\/96x96\/mail.png\" \/><\/a>","protected":false},"excerpt":{"rendered":"<p>There comes a time in almost everyone&#8217;s experience with databases when it would be great to bring in data from an outside source. Often the data is in a spreadsheet format (CSV or TSV) or perhaps a JSON format. I discussed some of the command line tools MongoDB provides in a previous post. Importing data &hellip; <a href=\"https:\/\/www.kenwalger.com\/blog\/nosql\/mongodb\/importing-data-mongoimport\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Importing data with mongoimport&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":388,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"pmpro_default_level":"","_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_post_was_ever_published":false},"categories":[4],"tags":[66,68,65,67],"yst_prominent_words":[453,441,440,436,450,445,439,263,449,451,442,438,437,447,448,446,444,454,452,443],"class_list":["post-387","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-mongodb","tag-csv","tag-json","tag-mongoimport","tag-tsv","pmpro-has-access"],"jetpack_featured_media_url":"https:\/\/www.kenwalger.com\/blog\/wp-content\/uploads\/2017\/06\/mongoimport.png","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p8lx70-6f","jetpack-related-posts":[],"_links":{"self":[{"href":"https:\/\/www.kenwalger.com\/blog\/wp-json\/wp\/v2\/posts\/387","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.kenwalger.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.kenwalger.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.kenwalger.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.kenwalger.com\/blog\/wp-json\/wp\/v2\/comments?post=387"}],"version-history":[{"count":8,"href":"https:\/\/www.kenwalger.com\/blog\/wp-json\/wp\/v2\/posts\/387\/revisions"}],"predecessor-version":[{"id":651,"href":"https:\/\/www.kenwalger.com\/blog\/wp-json\/wp\/v2\/posts\/387\/revisions\/651"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.kenwalger.com\/blog\/wp-json\/wp\/v2\/media\/388"}],"wp:attachment":[{"href":"https:\/\/www.kenwalger.com\/blog\/wp-json\/wp\/v2\/media?parent=387"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.kenwalger.com\/blog\/wp-json\/wp\/v2\/categories?post=387"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.kenwalger.com\/blog\/wp-json\/wp\/v2\/tags?post=387"},{"taxonomy":"yst_prominent_words","embeddable":true,"href":"https:\/\/www.kenwalger.com\/blog\/wp-json\/wp\/v2\/yst_prominent_words?post=387"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}