 ____________________________________
/ Hello Bootcampers! Welcome to this \
| Repo, are you ready to become Data |
\ Engineer 8000s? I mean, Moo        /
 ------------------------------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||
This repository contains the main procedure shown during the Wizeline DEB. It is a detailed guide to the first steps of the Terraform 101 training.
.
├── README.org
└── data
    └── ds_salaries.csv

1 directory, 2 files
This repository only contains the main workflow in this README file; inside the /data
path there is a single CSV file with Data Science salaries.
- AWS account configured. For this example we are using the default profile, set to the us-east-2 region
- AWS CLI already set up on your computer
- Terraform >= 0.13
Use the package manager homebrew to install the AWS CLI.
brew install awscli
After you’ve installed the AWS CLI, configure it by running aws configure.
When prompted, enter your AWS Access Key ID, Secret Access Key, region and output format.
$ aws configure
AWS Access Key ID [None]: YOUR_AWS_ACCESS_KEY_ID
AWS Secret Access Key [None]: YOUR_AWS_SECRET_ACCESS_KEY
Default region name [None]: YOUR_AWS_REGION
Default output format [None]: json
If you don’t have AWS access credentials, create your AWS Access Key ID and Secret Access Key by navigating to your security credentials in the IAM service on AWS. Click “Create access key” and download the file; it contains your access credentials.
Your default region can be found in the AWS Management Console beside your username. Select the region drop-down to find the region name (e.g. us-east-2) corresponding to your location.
- Clone this repository using the git clone command.
- Run touch my_first.tf in your terminal to create the file that will hold our first configuration.
- Open the file and put the provider header inside it.
# Terraform Block
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
    }
  }
  required_version = ">=0.12"
}

# Provider Block
provider "aws" {
  profile = "default"
  region  = "us-east-2"
}
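A side note that is not part of the original exercise: the behavior of some of the resources below depends on which AWS provider version Terraform downloads. If you want reproducible results, you can optionally pin the provider version in the required_providers block; the constraint shown here is only an example.

# Optional: the same Terraform block with an example provider version pin
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 4.0"   # example constraint, adjust to the version you want
    }
  }
  required_version = ">=0.12"
}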
- Let’s start creating a bucket in S3.
# Resource Block - S3 Bucket
resource "aws_s3_bucket" "b" {
  bucket = "deb-tf-bucket"
  acl    = "private"
}
- Depending on the version of Terraform and the AWS provider you are using, you may see a message like this:
Warning: Argument is deprecated

  with aws_s3_bucket.b,
  on my_first.tf line 17, in resource "aws_s3_bucket" "b":
  17:   acl = "private"

Use the aws_s3_bucket_acl resource instead
In case you see this warning, please remove the acl line from the previous code block and add this new code block to your setup:
# Set the acl
resource "aws_s3_bucket_acl" "b_acl" {
  bucket = aws_s3_bucket.b.id
  acl    = "private"
}
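One more hedged note that is not part of the original exercise: on buckets created after AWS changed the default S3 Object Ownership setting, ACLs may be disabled entirely and the block above can fail with an AccessControlListNotSupported error. If that happens in your account, one option is to relax the ownership setting first and make the ACL wait for it, roughly like this:

# Optional: re-enable ACLs when Object Ownership blocks them
# (this assumes your account/bucket defaults disallow ACLs)
resource "aws_s3_bucket_ownership_controls" "b_ownership" {
  bucket = aws_s3_bucket.b.id

  rule {
    object_ownership = "BucketOwnerPreferred"
  }
}

# Updated version of the ACL block above, now waiting for the ownership change
resource "aws_s3_bucket_acl" "b_acl" {
  bucket     = aws_s3_bucket.b.id
  acl        = "private"
  depends_on = [aws_s3_bucket_ownership_controls.b_ownership]
}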
- Now let’s upload our first CSV file to the recently created bucket. To do so, copy the next code block into your main Terraform file.
# Upload an object to S3-Bucket
resource "aws_s3_bucket_object" "object" {
  bucket = aws_s3_bucket.b.id
  key    = "my_first_upload.csv"
  source = "data/ds_salaries.csv"
  etag   = filemd5("data/ds_salaries.csv")
}
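A version note: on AWS provider 4.x and later, aws_s3_bucket_object still works but is deprecated in favor of aws_s3_object, which takes the same arguments. If you get a deprecation warning here, the equivalent block looks like this:

# Same upload using the newer resource name (AWS provider 4.x+)
resource "aws_s3_object" "object" {
  bucket = aws_s3_bucket.b.id
  key    = "my_first_upload.csv"
  source = "data/ds_salaries.csv"
  etag   = filemd5("data/ds_salaries.csv")
}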
- After applying your configuration (terraform init, then terraform apply) and checking that your file was uploaded to your bucket, we can move on to the last part, which is creating a Redshift cluster by adding the next code block.
# Redshift resource block
resource "aws_redshift_cluster" "example" {
  cluster_identifier  = "tf-redshift-cluster"
  database_name       = "mydebdb"
  master_username     = "debuser"
  master_password     = "Deb_2022"
  node_type           = "dc2.large"
  cluster_type        = "single-node"
  skip_final_snapshot = true
}
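Two optional touches that are not part of the original exercise: an output so Terraform prints the cluster endpoint after apply, and a sensitive variable so the master password is not hardcoded in the file (the names below are illustrative).

# Optional: print the cluster endpoint after apply (illustrative output name)
output "redshift_endpoint" {
  value = aws_redshift_cluster.example.endpoint
}

# Optional: pass the master password in at apply time instead of hardcoding it
variable "redshift_master_password" {
  type      = string
  sensitive = true
}

With the variable in place, you would set master_password = var.redshift_master_password in the cluster block and provide the value via terraform apply -var or a TF_VAR_ environment variable.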
Remember that you need to destroy everything after you finish your practice using:
terraform destroy
and you will see something like this:
Destroy complete! Resources: 4 destroyed.
If you are not sure that all of your resources were destroyed, you can always go to the console or main page of your cloud ( aws-console & gcp-console ) and check that all the resources were destroyed.
Now you know the basics of creating infrastructure in the cloud. You created three main services to build your first pipeline; the next step is loading the data uploaded to our S3 bucket into our Redshift DW. There are several ways to do it; the most basic one is using the UI provided by AWS and setting all the parameters by hand, which you can do by following this guide provided by AWS.
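If you want to keep that wiring in Terraform as well, here is a minimal, hedged sketch of the IAM role Redshift needs in order to COPY from S3; the role name is made up, and you would still attach it to the cluster through its iam_roles argument and reference its ARN in the COPY command.

# Illustrative IAM role that lets Redshift read from S3 (name is hypothetical)
resource "aws_iam_role" "redshift_s3_read" {
  name = "deb-redshift-s3-read"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "redshift.amazonaws.com" }
    }]
  })
}

# Attach the AWS-managed read-only S3 policy to that role
resource "aws_iam_role_policy_attachment" "redshift_s3_read" {
  role       = aws_iam_role.redshift_s3_read.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess"
}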
If you want to go deeper with this practice, you can add resources like a lambda function that triggers an action when a file is uploaded into an S3 bucket. This task can also be handled by other orchestration tools such as airflow, which is part of the upcoming sessions.
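To give a taste of what that looks like in Terraform, below is a hedged sketch of the S3-to-Lambda trigger wiring; it assumes a Lambda function resource called aws_lambda_function.etl already exists in your configuration (the function itself and its IAM role are omitted here).

# Let the bucket invoke the (hypothetical) aws_lambda_function.etl function
resource "aws_lambda_permission" "allow_s3" {
  statement_id  = "AllowExecutionFromS3"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.etl.function_name
  principal     = "s3.amazonaws.com"
  source_arn    = aws_s3_bucket.b.arn
}

# Fire the function whenever a new object lands in the bucket
resource "aws_s3_bucket_notification" "on_upload" {
  bucket = aws_s3_bucket.b.id

  lambda_function {
    lambda_function_arn = aws_lambda_function.etl.arn
    events              = ["s3:ObjectCreated:*"]
  }

  depends_on = [aws_lambda_permission.allow_s3]
}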
Some reference links: