A simple etl pipeline using Apache Cassandra
Running this project will require
- python 3
- jupyter lab
- docker
Get the cassandra docker image
docker pull cassandra:3.11.4
Create the docker container, running dettached, providing an available memory limit, and exposing the appropriate port
docker run --memory 4g --name cass-serv -p 9042:9042 -d cassandra:3.11.4
This creates and runs the container the first time, each additional time you need to start the container, you can use the docker start
command:
docker start cass-serv
and stop the container using docker stop cass-serv
.
Run pip install -r requirements.txt
to download libraries required in the jupyter notebook.
To start the notebook. Run jupyter lab
, a browser should start in your current directory and you should be able to interact with the cassandra server running in the docker container.