
Data Project 2: Import JSON to MongoDB

In this data project you will write a single file of code to import a bundle of fifty (50) separate JSON files into a new collection within a MongoDB database. Each file contains one or more records.

The script you write must import every individual record in a single execution. You may not run the script 50 separate times to import all the files and records.

The data for import can be found in the data/ directory of this repository.

Follow the steps below carefully and create a solution of your own.

1. Fork this Repository and open in Gitpod

Fork this repo so that you are working with your own copy of the code and can add, commit, and push freely. You will be submitting the URL to your fork for grading.

Open your repository in Gitpod by prefixing the GitHub URL with https://gitpod.io/#.

2. Choose Your Code

You are free to write this assignment in a bash script using a CLI tool called mongoimport, in Python using pymongo, or both.

3. Connecting to MongoDB Atlas

See this page in Canvas for how to set up connection credentials if you need them.

DO NOT commit the password to your GitHub repository!

For command-line access to MongoDB, the MONGO-ATLAS command is pre-built into your Gitpod configuration.

Python in Gitpod

If using Python in Gitpod you already have the MONGOPASS environment variable and can then connect to a db/collection in Atlas using this code:

from pymongo import MongoClient, errors
from bson.json_util import dumps
import os
import json

MONGOPASS = os.getenv('MONGOPASS')
uri = "mongodb+srv://cluster0.pnxzwgz.mongodb.net/"
client = MongoClient(uri, username='nmagee', password=MONGOPASS, connectTimeoutMS=200, retryWrites=True)
# specify a database (replace <db-name> with your database name)
db = client.<db-name>
# specify a collection (replace <collection-name> with your collection name)
collection = db.<collection-name>
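
A quick way to confirm the connection before doing any real work is to ping the server. A minimal sketch, assuming the client object defined above:

# ping the server to confirm the connection is working
try:
    client.admin.command('ping')
    print("Connected to Atlas")
except errors.PyMongoError as e:
    print(f"Could not connect: {e}")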

mongoimport in Gitpod

If using mongoimport within a bash script in Gitpod, you should create a new Gitpod User Variable with your full URI (including username, password, host address, and DB name).

For instance, you could create a new variable named MONGODB with a value of mongodb+srv://nmagee:xxxxxxxxxxx@cluster0.pnxzwgz.mongodb.net/<YOUR-DB>

Note that you should replace the xxxxxx with the actual password, and replace <YOUR-DB> with the name of your MongoDB database. Your URI must specify a database at the end.

Then you can reference this URI in a terminal command to connect automatically on each invocation, like this:

mongoimport --uri $MONGODB --collection test data.json --jsonArray

4. Listing Directory Contents

It is up to you to determine how you want to import the fifty files within data/. But here are some methods you might want to draw from:

bash

In bash you can traverse a directory's contents, item by item:

for file in data/*
do
  echo "$file"
done

This means that on each pass through the loop, the file name is held in a variable you can use directly or pass to other code or commands.

For instance, this version of the loop above passes each file name into a separate Python script:

for file in data/*
do
  /usr/bin/python3 my-import-script.py "$file"
done
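
On the Python side of that pattern, the script receives the file name through sys.argv. A minimal sketch of how my-import-script.py might start (it only loads and counts the records here; the insert logic from step 5 would follow):

import sys
import json

# the bash loop passes one file path as the first argument
path = sys.argv[1]

with open(path) as f:
    data = json.load(f)

# a file may hold a single document or a list of documents
count = len(data) if isinstance(data, list) else 1
print(f"{path}: loaded {count} record(s)")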

Python

In Python you can also traverse a directory's contents, item by item:

import os

path = "data"

# walk the data/ directory; files is the list of file names in each folder
for (root, dirs, files) in os.walk(path):
    for f in files:
        # join the folder and file name to get a path you can open later
        print(os.path.join(root, f))
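
If all fifty files sit directly inside data/ with no subdirectories, the glob module is a simpler alternative; a minimal sketch:

import glob

# match every file directly under data/, in a stable sorted order
for path in sorted(glob.glob("data/*")):
    print(path)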

5. Importing

mongoimport with bash

To import a single JSON file into MongoDB using mongoimport use this syntax:

mongoimport --uri $MONGODB --collection test data.json --jsonArray

Note these details:

  • The DB name should already be specified as part of the URI you created above.

  • The collection should be specified in the command. You can create a new collection simply by naming it here; this example imports into a collection named "test".

  • The data file itself is then passed.

  • Finally, you must include the --jsonArray flag for the command to succeed.

  • mongoimport documentation
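
If you would rather drive mongoimport from Python than from a bash loop, the subprocess module can invoke it once per file. A rough sketch, assuming the MONGODB Gitpod variable from step 3 and a collection named "test":

import os
import subprocess

uri = os.getenv("MONGODB")

# run mongoimport once per file; a failure is reported but does not stop the loop
for name in sorted(os.listdir("data")):
    path = os.path.join("data", name)
    result = subprocess.run(
        ["mongoimport", "--uri", uri, "--collection", "test", path, "--jsonArray"],
        capture_output=True, text=True
    )
    if result.returncode != 0:
        print(f"{path} failed: {result.stderr.strip()}")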

pymongo in Python

To import a single JSON file into MongoDB using pymongo in a Python script use this syntax:

# assuming you have defined a connection to your db and collection already:

# Load the JSON file
with open('data.json') as file:
    file_data = json.load(file)

# Insert the loaded data into the collection:
# use insert_many if the JSON contains a list of entries,
# insert_one if it contains a single document
if isinstance(file_data, list):
    collection.insert_many(file_data)
else:
    collection.insert_one(file_data)

6. Error Handling

You may encounter errors when importing this stack of files. Your code should handle them without breaking. You do not need to log the errors, but your code should continue processing additional files and records as far as possible.

Remember to assess where in your flow the errors can occur (opening a file, parsing its JSON, inserting into the collection) and handle each point separately as needed.

If you cannot elegantly handle the errors you may want to change your approach, or even your programming language.
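
A rough sketch of how such handling might look in Python, assuming the files sit directly in data/ and the db and collection objects from step 3 are already defined (the specific exception types shown are one reasonable choice, not the only one):

import os
import json
from pymongo import errors

imported = 0
problem_files = 0

for name in sorted(os.listdir("data")):
    path = os.path.join("data", name)
    try:
        # a file with corrupted JSON will raise here
        with open(path) as f:
            file_data = json.load(f)
    except (json.JSONDecodeError, OSError) as e:
        print(f"Could not read {path}: {e}")
        problem_files += 1
        continue
    try:
        # ordered=False lets insert_many continue past individual bad documents
        if isinstance(file_data, list):
            result = collection.insert_many(file_data, ordered=False)
            imported += len(result.inserted_ids)
        else:
            collection.insert_one(file_data)
            imported += 1
    except errors.BulkWriteError as e:
        # part of the batch failed; count the documents that did go in
        imported += e.details.get("nInserted", 0)
        problem_files += 1
    except errors.PyMongoError as e:
        print(f"Could not insert from {path}: {e}")
        problem_files += 1

print(f"imported: {imported}, files with problems: {problem_files}")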

7. Testing / Clearing Your Collection

As you test your code, MongoDB will not allow you to re-import files that have already been imported. To drop the collection to test again, open mongosh by typing MONGO-ATLAS in the Gitpod terminal.

Then:

use nem2p               // specify your database name
db.COLLECTION.drop()    // where COLLECTION is the name of your collection
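
If you prefer to reset from Python rather than mongosh, pymongo can drop the collection as well (a one-line sketch, assuming the collection object from step 3):

# remove the collection entirely so the next test run starts clean
collection.drop()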

8. Import Count

After successfully importing all complete records contained in all the files, determine how many:

  • complete documents have been imported into your collection
  • complete documents could not be imported
  • corrupted documents exist in the fileset

Note the number of records for each category in a file named count.txt and commit it to your repository. This file can be calculated and populated manually; your code does not need to output it.
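
If you want to double-check your manual count of imported documents, count_documents can report how many the collection currently holds (a sketch, assuming the collection object from step 3):

# count every document currently in the collection
print(collection.count_documents({}))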

9. Submit

Add, commit, and push your working script and the count.txt file to your fork of the repository.

Submit the GitHub URL of your fork for grading. This DP is worth 17 possible points.

Grading Rubric

Element                  Points
Setup                    1
Connection               2
Looping through files    3
Inserting into DB        4
Error Handling           4
Count accuracy           2
