(Image created using ChatGPT)
The word GaiaFlow is a combination of Gaia (the Greek goddess of Earth, symbolizing our planet) and Flow (representing seamless workflows in MLOps). It is an MLOps framework tailored for efficient Earth Observation projects. GaiaFlow provides a framework for the entire pipeline of remote sensing applications, from data ingestion to machine learning modeling to deployment.
It is a comprehensive template for machine learning projects, providing an MLOps framework with tools like Airflow, MLflow, JupyterLab, MinIO and Minikube that lets you create ML projects, experiments, model deployments and more in a standardized way. The documentation is available here.
The architecture below describes what we want to achieve with our MLOps framework. It is taken from the Google Cloud Architecture Centre.
What we currently support is shown with green ticks.
Please note: This framework has only been tested on Ubuntu Linux and Windows 11, where it works as expected. As we have not tested it on macOS yet, we cannot guarantee that it works there.
- Overview
- Project Structure from this template.
- ML Pipeline Overview
- Getting Started
- Troubleshooting
- Acknowledgments
This template provides a standardized project structure for ML initiatives at BC, integrating essential MLOps tools:
- Apache Airflow: For orchestrating ML pipelines and workflows
- MLflow: For experiment tracking and model registry
- JupyterLab: For interactive development and experimentation
- MinIO: For local object storage of ML artifacts
- Minikube: For a local lightweight Kubernetes cluster
You will get the following project structure when you use this template to get started with your ML project.
Any files or folders marked with * are off-limits: no need to change, modify, or even worry about them. Just focus on the ones without the mark!
Any files or folders marked with ^ can be extended, but carefully.
├── .github/ # GitHub Actions workflows (you are provided with a starter CI)
├── dags/ # Airflow DAG definitions
│ (You can define DAGs either with a config file (dag-factory)
│ or with Python scripts.)
├── notebooks/ # JupyterLab notebooks
├── your_package/
│ │ (For new projects, it would be good to follow this standardized folder structure.
│ │ You are of course allowed to add anything you like to it.)
│ ├── dataloader/ # Your Data loading scripts
│ ├── train/ # Your Model training scripts
│ ├── preprocess/ # Your Feature engineering/preprocessing scripts
│ ├── postprocess/ # Your Postprocessing model output scripts
│ ├── model/ # Your Model definition
│ ├── model_pipeline/ # Your Model Pipeline to be used for inference
│ └── utils/ # Utility functions
├── tests/ # Unit and integration tests
├── data/ # If you have data locally, move it here and use it so that airflow has access to it.
├── README.md # It's a readme. Feel free to change it!
├── CHANGES.md # You put your changelog for every version here.
├── pyproject.toml # Config file containing your package's build information and its metadata
├── .env * ^ # Your environment variables that docker compose and python scripts can use (already added to .gitignore)
├── .gitignore * ^ # Files to ignore when pushing to git.
├── environment.yml # Libraries required for local mlops and your project
├── mlops_manager.py * # Manager for running the MLOps services locally
├── minikube_manager.py * # Manager for the local Minikube (Kubernetes) cluster
├── docker-compose.yml * # Docker compose that spins up all services locally for MLOps
├── utils.py * # Utility function to get the minikube gateway IP required for testing.
├── docker_config.py * # Utility function to get the docker image name based on your project.
├── kube_config_inline * # This file is needed for Airflow to communicate with Minikube when testing a prod-like setup locally.
├── airflow_test.cfg * # This file is needed for testing your airflow dags.
├── Dockerfile ^ # Dockerfile for your package.
└── dockerfiles/ * # Dockerfiles required by Docker compose
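To give an idea of how the standardized folders are meant to fit together, here is a purely hypothetical sketch of an inference pipeline. The function and class names (load_data, preprocess, Model, postprocess, run_pipeline) are placeholders, not part of the generated template; they are inlined as stubs so the sketch is self-contained.

```python
# A hypothetical, self-contained sketch of how the standardized modules could
# be composed into an inference pipeline. In a real project these pieces would
# live in your_package/dataloader, preprocess, model, postprocess and
# model_pipeline; here they are inlined as stubs so the example runs on its own.

def load_data(source: str) -> list[float]:          # your_package/dataloader/
    return [0.1, 0.2, 0.3]                           # stand-in for real ingestion

def preprocess(raw: list[float]) -> list[float]:     # your_package/preprocess/
    return [x * 10 for x in raw]                     # stand-in feature engineering

class Model:                                         # your_package/model/
    def predict(self, features: list[float]) -> list[int]:
        return [round(x) for x in features]

def postprocess(preds: list[int]) -> dict:           # your_package/postprocess/
    return {"predictions": preds}

def run_pipeline(source: str) -> dict:               # your_package/model_pipeline/
    features = preprocess(load_data(source))
    return postprocess(Model().predict(features))

if __name__ == "__main__":
    print(run_pipeline("data/scene.tif"))            # hypothetical local input
```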
Before you get started, let's explore the tools that we are using for this standardized MLOps framework.
Cookiecutter
Purpose: Project scaffolding and template generation
- Provides a standardized way to create ML projects with predefined structures.
- Ensures consistency across different ML projects within BC
Apache Airflow
Purpose: Workflow orchestration
- Manages and schedules data pipelines.
- Automates end-to-end ML workflows, including data ingestion, training, deployment and re-training.
- Provides a user-friendly web interface for tracking the status of task executions.
(Demo video: airflow.mp4)
- DAGs (Directed Acyclic Graphs): A workflow representation in Airflow. You can enable, disable, and trigger DAGs from the UI.
- Graph View: Visual representation of task dependencies.
- Tree View: Displays DAG execution history over time.
- Task Instance: A single execution of a task in a DAG.
- Logs: Each task's execution details and errors.
- Code View: Shows the Python code of a DAG.
- Trigger DAG: Manually start a DAG run.
- Pause DAG: Stops automatic DAG execution.
Common Actions
- Enable a DAG: Toggle the On/Off button.
- Manually trigger a DAG: Click Trigger DAG ▶️.
- View logs: Click on a task instance and select Logs.
- Restart a failed task: Click Clear to rerun a specific task.
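As a concrete illustration of what goes into dags/, here is a minimal sketch of a DAG with two Python tasks. The dag_id, schedule, and task callables are placeholders for your own pipeline steps, and the exact keyword (schedule vs. schedule_interval) depends on your Airflow version.

```python
# A minimal sketch of a DAG file that could live in dags/; the dag_id,
# schedule, and task callables are placeholders for your own pipeline steps.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def ingest():
    # Placeholder: pull data from MinIO or an external source.
    print("ingesting data")


def train():
    # Placeholder: train a model and log it to MLflow.
    print("training model")


with DAG(
    dag_id="example_ml_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule=None,  # Airflow >= 2.4; use schedule_interval=None on older versions
    catchup=False,
) as dag:
    ingest_task = PythonOperator(task_id="ingest", python_callable=ingest)
    train_task = PythonOperator(task_id="train", python_callable=train)

    ingest_task >> train_task
```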
MLflow
Purpose: Experiment tracking and model management
- Tracks and records machine learning experiments, including hyperparameters, performance metrics, and model artifacts.
- Facilitates model versioning and reproducibility.
- Supports multiple deployment targets, including cloud platforms, Kubernetes, and on-premises environments.
(Demo video: mlflow.mp4)
- Experiments: Group of runs tracking different versions of ML models.
- Runs: A single execution of an ML experiment with logged parameters, metrics, and artifacts.
- Parameters: Hyperparameters or inputs logged during training.
- Metrics: Performance indicators like accuracy or loss.
- Artifacts: Files such as models, logs, or plots.
- Model Registry: Centralized storage for trained models with versioning.
Common Actions
- View experiment runs: Go to Experiments > Select an experiment
- Compare runs: Select multiple runs and click Compare.
- View parameters and metrics: Click on a run to see details.
- Register a model: Under Artifacts, select a model and click Register Model.
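For reference, here is a minimal sketch of how a training script might log a run to MLflow. The tracking URI, experiment name, and logged values are assumptions rather than values from this template, so check your docker-compose setup for the actual MLflow endpoint.

```python
# A minimal sketch of logging a run to MLflow; the tracking URI, experiment
# name, and logged values are assumptions, not values from this template.
import mlflow

mlflow.set_tracking_uri("http://localhost:5000")   # assumed local MLflow endpoint
mlflow.set_experiment("my-eo-experiment")          # hypothetical experiment name

with mlflow.start_run(run_name="baseline"):
    mlflow.log_param("learning_rate", 1e-3)        # hyperparameters
    mlflow.log_metric("val_accuracy", 0.87)        # performance metrics
    # Artifacts are files such as plots or model binaries; this assumes the
    # file already exists on disk.
    mlflow.log_artifact("confusion_matrix.png")
```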
JupyterLab
Purpose: Interactive development environment
- Provides an intuitive and interactive web-based interface for exploratory data analysis, visualization, and model development.
MinIO
Purpose: Object storage for ML artifacts
- Acts as a cloud-native storage solution for datasets and models.
- Provides an S3-compatible API for seamless integration with ML tools.
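Because MinIO exposes an S3-compatible API, any S3 client can talk to it. Below is a hedged sketch using boto3; the endpoint URL, credentials, and bucket name are placeholders, so take the real values from your .env / docker-compose configuration.

```python
# A hedged sketch of using MinIO through its S3-compatible API with boto3;
# the endpoint, credentials, and bucket name are placeholders, take the real
# values from your .env / docker-compose configuration.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:9000",    # assumed MinIO API port
    aws_access_key_id="minioadmin",          # placeholder credentials
    aws_secret_access_key="minioadmin",
)

# Upload a model artifact and list what is in the (pre-existing) bucket.
s3.upload_file("model.pkl", "ml-artifacts", "models/model.pkl")
print(s3.list_objects_v2(Bucket="ml-artifacts").get("Contents", []))
```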
Minikube
Purpose: Local Kubernetes cluster for development & testing
- Allows you to run a single-node Kubernetes cluster locally.
- Simulates a production-like environment to test Airflow DAGs end-to-end.
- Great for validating KubernetesExecutor and Dockerized task behavior before deploying to a real cluster.
- Mimics production deployment without the cost or risk of real cloud infrastructure.
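To give an idea of what an end-to-end test against Minikube might look like, here is a hedged sketch of a DAG task that runs your project's Docker image as a pod via the Kubernetes provider. The import path, namespace, image name, and command are assumptions that depend on your provider version and on docker_config.py.

```python
# A hedged sketch of a DAG task that runs your project's Docker image as a pod
# on the Minikube cluster; the import path, namespace, image name, and command
# are assumptions that depend on your provider version and docker_config.py.
from datetime import datetime

from airflow import DAG
# On older cncf-kubernetes provider versions the import path is
# airflow.providers.cncf.kubernetes.operators.kubernetes_pod instead.
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator

with DAG(
    dag_id="example_k8s_task",
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    train_in_pod = KubernetesPodOperator(
        task_id="train_in_pod",
        name="train-in-pod",
        namespace="default",                          # assumption
        image="your-project-image:latest",            # e.g. built from your Dockerfile
        cmds=["python", "-m", "your_package.train"],  # placeholder entrypoint
        get_logs=True,
    )
```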
Please make sure that you install the following from the links provided as they have been tried and tested.
If you face any issues, please check out the troubleshooting section.
- Docker and Docker Compose
- Mamba: please make sure you install Python 3.12, as this repository has been tested with that version.
- Minikube on Linux
- Minikube on Windows
For Linux users: please follow the steps mentioned in this link.
For Windows users: please follow the steps mentioned in this link.
This should install both Docker and the Docker Compose plugin. You can verify the installation with these commands:
docker --version
docker compose version
The output should look something like:
Docker version 27.5.1, build 9f9e405
Docker Compose version v2.32.4
This means you have successfully installed Docker.
Once the pre-requisites are done, you can go ahead with the project creation:
- Create a separate environment for cookiecutter
mamba create -n cc cookiecutter ruamel.yaml
mamba activate cc
- Generate the project from template:
cookiecutter https://github.com/bcdev/gaiaflow
When prompted for input, enter the details requested. If you don't provide any input for a given choice, the first choice in the list is taken as the default.
Once the project is created, please read the user guide.
- If you are on Windows, please use the Miniforge Prompt command line.
- If you face an issue like Docker Daemon not started, start it using:
sudo systemctl start docker
and try the docker commands again in a new terminal.
- If you face an issue like Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock, do the following:
sudo chmod 666 /var/run/docker.sock
and try the docker commands again in a new terminal.
- If you face an issue like Cannot connect to the Docker daemon at unix:///home//.docker/desktop/docker.sock. Is the docker daemon running?, it is likely because you have two Docker contexts.
To view the Docker contexts:
docker context ls
This will show the list of Docker contexts. Check whether default is active (it should have a * beside it). If not, you probably have desktop enabled as your context. To confirm which context you are in:
docker context show
To use the default context, do this:
docker context use default
Check for the following file:
cat ~/.docker/config.json
If it is empty, all good; if not, it might look something like this:
{
"auths": {},
"credsStore": "desktop"
}
Move this file out of this location or delete it, then try running Docker again.
- If you face permission issues on some files, like Permission Denied, please use the following as a workaround and let us know so that we can update this repo:
sudo chmod 666 <your-filename>
If you face any other problems not mentioned above, please reach out to us.
- Make ECR work. How to add credentials?
- S3 credentials access?
- Add sensor-based DAGs
- Update CI to use ECR credentials
- How to share secrets?
- Monitoring Airflow
- MLflow deployment