This project encompasses a complete machine learning workflow for an insurance dataset. The workflow includes data loading, cleaning, preprocessing, model training, deployment, and monitoring. The project leverages several tools and technologies to ensure best practices are followed, including MLflow for experiment tracking, Prefect for workflow orchestration, Flask and Docker for deployment, and Evidently for monitoring.
- Cloud: Can be deployed on AWS, GCP, Azure, or any cloud platform.
- Experiment Tracking: MLflow
- Workflow Orchestration: Prefect
- Monitoring: Evidently
- CI/CD: GitHub Actions
- Infrastructure as Code (IaC): Terraform (optional for cloud resource provisioning)
Clone the repository, create a virtual environment, and install the dependencies:

```bash
git clone https://github.com/manova01/insurance_pro.git
cd insurance_pro
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```
Ensure that MLflow is installed and running:

```bash
pip install mlflow
mlflow ui
```
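With the tracking UI running, training code can log runs to it. A minimal sketch, assuming the server's default address; the experiment name, parameter, and metric values below are illustrative, not taken from the repository:

```python
import mlflow

# `mlflow ui` serves the tracking server on http://127.0.0.1:5000 by default
mlflow.set_tracking_uri("http://127.0.0.1:5000")
mlflow.set_experiment("insurance-charges")  # illustrative experiment name

with mlflow.start_run():
    # Log whatever hyperparameters and metrics your training code produces
    mlflow.log_param("model_type", "linear_regression")
    mlflow.log_metric("rmse", 4500.0)  # placeholder value
```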
Install Prefect:

```bash
pip install prefect
```

Start the Prefect server:

```bash
prefect server start
```

Configure Prefect to communicate with the server:

```bash
prefect config set PREFECT_API_URL=http://127.0.0.1:4200/api
```
The Prefect flow script `model_training_flow.py` is already included in the repository; it orchestrates the entire model training pipeline.
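As a rough illustration of the Prefect pattern such a script follows (the task breakdown and task names here are assumptions, not the repository's exact code):

```python
import pandas as pd
from prefect import flow, task


@task
def load_data(url: str) -> pd.DataFrame:
    # Read the raw insurance CSV
    return pd.read_csv(url)


@task
def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    # Normalize column names and drop rows with missing values
    df = df.copy()
    df.columns = df.columns.str.lower().str.replace(' ', '_')
    return df.dropna()


@task
def train_model(df: pd.DataFrame) -> None:
    # Placeholder for the real training step: fit a model on the
    # preprocessed frame and persist it (e.g. joblib.dump(model, 'model.pkl'))
    ...


@flow(name="model_training_pipeline")
def model_training_pipeline():
    url = 'https://raw.githubusercontent.com/manova01/insurance_pro/main/insurance%20data.csv'
    df = load_data(url)
    df = preprocess(df)
    train_model(df)


if __name__ == "__main__":
    model_training_pipeline()
```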
The Flask app provides a REST API for model predictions.
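The repository's `app.py` is the source of truth; a minimal sketch of such an endpoint, assuming `model.pkl` holds a full preprocessing-plus-regression pipeline and that each request carries one record of raw features, might look like this:

```python
import joblib
import pandas as pd
from flask import Flask, request, jsonify

app = Flask(__name__)

# Assumes model.pkl is a complete preprocessing + regression pipeline
model = joblib.load('model.pkl')


@app.route('/predict', methods=['POST'])
def predict():
    # Expect a JSON object mapping feature names to values
    payload = request.get_json()
    X = pd.DataFrame([payload])
    prediction = model.predict(X)[0]
    return jsonify({'predicted_charges': float(prediction)})


if __name__ == '__main__':
    # Bind to 0.0.0.0 so the API is reachable from outside a container
    app.run(host='0.0.0.0', port=5000)
```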
Create a `Dockerfile` to containerize the Flask application:
```dockerfile
FROM python:3.8-slim
WORKDIR /app
COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt
COPY . .
CMD ["python", "app.py"]
```
Build and run the Docker container:

```bash
docker build -t insurance-model .
docker run -p 5000:5000 insurance-model
```
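If the container starts but requests to `http://localhost:5000` fail, check that the Flask app binds to `0.0.0.0` (as in the sketch above); Flask's default host of `127.0.0.1` is not reachable from outside the container.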
Create a monitoring script `monitor.py` to track model performance:
```python
import joblib
import pandas as pd
from sklearn.model_selection import train_test_split

# Legacy Evidently (< 0.2) dashboard API
from evidently.dashboard import Dashboard
from evidently.dashboard.tabs import DataDriftTab, RegressionPerformanceTab
from evidently.pipeline.column_mapping import ColumnMapping

# Load data
url = 'https://raw.githubusercontent.com/manova01/insurance_pro/main/insurance%20data.csv'
df = pd.read_csv(url)

# Data preprocessing
df.columns = df.columns.str.lower().str.replace(' ', '_')
df = df.dropna()

# Split the data (60% train / 20% validation / 20% test)
df_full_train, df_test = train_test_split(df, test_size=0.2, random_state=42)
df_train, df_val = train_test_split(df_full_train, test_size=0.25, random_state=42)

# Load the trained model (assumed to be a preprocessing + regression pipeline)
model = joblib.load('model.pkl')

# The regression performance tab needs predictions in both frames
df_train = df_train.copy()
df_test = df_test.copy()
df_train['prediction'] = model.predict(df_train.drop(columns='charges'))
df_test['prediction'] = model.predict(df_test.drop(columns='charges'))

# Tell Evidently which columns hold the target and the prediction
column_mapping = ColumnMapping(target='charges', prediction='prediction')

# Create the dashboard, using the training set as the reference data
dashboard = Dashboard(tabs=[DataDriftTab(), RegressionPerformanceTab()])
dashboard.calculate(df_train, df_test, column_mapping=column_mapping)

# Save the dashboard
dashboard.save("evidently_dashboard.html")
```
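Open the generated `evidently_dashboard.html` in a browser to review the data drift and regression performance reports.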
Set up a GitHub Actions workflow to automate testing and deployment. Create a `.github/workflows/main.yml` file:
```yaml
name: CI/CD Pipeline

on:
  push:
    branches:
      - main

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v2

      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: 3.8

      - name: Install dependencies
        run: |
          python -m venv venv
          source venv/bin/activate
          pip install -r requirements.txt

      - name: Run tests
        run: |
          source venv/bin/activate
          pytest

      # Pushing requires Docker Hub credentials stored as repository secrets
      - name: Log in to Docker Hub
        uses: docker/login-action@v1
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}

      - name: Build and push Docker image
        uses: docker/build-push-action@v2
        with:
          context: .
          push: true
          tags: username/insurance-model:latest
```
Replace `username/insurance-model` with your Docker Hub repository name, and store your Docker Hub credentials as the `DOCKERHUB_USERNAME` and `DOCKERHUB_TOKEN` repository secrets referenced above.
To run the end-to-end workflow:

- Start the Prefect server: `prefect server start`
- Register the Prefect flow: `python model_training_flow.py`
- Run the flow: `prefect deployment run model_training_pipeline`
- Build and run the Docker container for the Flask app: `docker build -t insurance-model .`, then `docker run -p 5000:5000 insurance-model`
- Make a prediction by sending a POST request to `http://localhost:5000/predict` with a JSON payload (see the example below).
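For example, using the `requests` library; the field names and values below are illustrative, based on the public insurance dataset's columns, so match them to whatever features the deployed model expects:

```python
import requests

# Example payload; feature names are assumptions based on the insurance dataset
payload = {
    "age": 35,
    "sex": "female",
    "bmi": 27.5,
    "children": 1,
    "smoker": "no",
    "region": "southeast",
}

response = requests.post("http://localhost:5000/predict", json=payload)
print(response.json())
```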
Run the monitoring script to generate a performance dashboard:

```bash
python monitor.py
```
Push changes to the `main` branch to trigger the GitHub Actions workflow.
Contributions are welcome! Please fork the repository and submit a pull request.
This project is licensed under the MIT License.
By following this README, you should be able to set up and run the entire machine learning workflow for the insurance dataset, from training through deployment and monitoring, using modern MLOps tooling and best practices.