Overview
An end-to-end machine learning system for personalized movie recommendations, powered by a content-based algorithm built with TensorFlow/Keras. The project leverages modern MLOps tools like ZenML, MLflow, Evidently, and BentoML for robust pipeline orchestration, validation, and deployment.
- TensorFlow/Keras Content-Based Algorithm: A neural network that combines user preferences and movie metadata to predict ratings.
- ZenML Orchestration: Unifies the ML workflow with automated pipelines, artifact tracking, and tool integrations (a minimal pipeline sketch follows this list).
- Evidently: Validates data quality and model performance in production-like scenarios.
- MLflow: Tracks experiments, logs metrics, and manages model registry.
- BentoML: Streamlines cloud deployment with containerized model serving.
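Below is a minimal sketch of how these tools could be wired together as a ZenML pipeline. The step names mirror the files under `src/mlProject/steps/`, but the signatures, CSV path, and return types are illustrative assumptions, not the project's actual code.

```python
# Illustrative wiring only: step names follow src/mlProject/steps/, but the
# signatures, CSV path, and artifacts are assumptions, not the real code.
from zenml import pipeline, step
import pandas as pd


@step
def data_loader() -> pd.DataFrame:
    """Load movie ratings from a CSV file (hypothetical path)."""
    return pd.read_csv("artifacts/ratings.csv")


@step
def data_validation(ratings: pd.DataFrame) -> pd.DataFrame:
    """Fail fast if ratings leave the expected 0.5-5.0 scale."""
    if not ratings["rating"].between(0.5, 5.0).all():
        raise ValueError("Found ratings outside the 0.5-5.0 range")
    return ratings


@pipeline
def training_pipeline():
    """Chain the steps; processing, training, evaluation and promotion would follow."""
    ratings = data_loader()
    data_validation(ratings)


if __name__ == "__main__":
    training_pipeline()
```

The pipeline steps are described in turn below.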
Data Ingestion:
- Collects movie ratings and metadata from CSV datasets.
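A hedged sketch of what this step could look like; the file names and columns (ratings.csv, movies.csv) are assumptions about the datasets, not confirmed names from the repo.

```python
# Hypothetical ingestion helper: file names and columns are assumptions.
import pandas as pd

def load_datasets(data_dir: str = "artifacts") -> tuple[pd.DataFrame, pd.DataFrame]:
    """Read the ratings and movie metadata CSVs into DataFrames."""
    ratings = pd.read_csv(f"{data_dir}/ratings.csv")  # e.g. userId, movieId, rating, timestamp
    movies = pd.read_csv(f"{data_dir}/movies.csv")    # e.g. movieId, title, genres
    return ratings, movies
```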
Data Validation:
- Schema Validation: Ensures CSV files match predefined schemas (column names, data types).
- Data Range Check: Validates that ratings fall within 0.5–5.0 so that invalid target values are caught before training (a simplified example follows).
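The project wires Evidently into this step; the snippet below is only a simplified stand-in that shows the two checks in plain pandas, with the schema.yaml layout assumed.

```python
# Simplified stand-in for the validation step (the project itself uses Evidently).
# The schema.yaml layout shown in the comment is an assumption.
import pandas as pd
import yaml

def validate_ratings(df: pd.DataFrame, schema_path: str = "schema.yaml") -> None:
    with open(schema_path) as f:
        schema = yaml.safe_load(f)  # assumed form: {"columns": {"userId": "int64", ...}}
    expected = schema["columns"]

    # Schema check: column names and dtypes must match the definition.
    missing = set(expected) - set(df.columns)
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    for col, dtype in expected.items():
        if str(df[col].dtype) != dtype:
            raise TypeError(f"{col}: expected {dtype}, got {df[col].dtype}")

    # Range check: ratings must stay inside the 0.5-5.0 scale.
    if not df["rating"].between(0.5, 5.0).all():
        raise ValueError("Ratings outside the 0.5-5.0 range")
```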
Data Processing:
- User feature extraction: Derives user features from stated preferences and past ratings.
- Movie feature extraction: Derives movie features from tags, genres, and past ratings (a rough sketch follows).
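A rough illustration of the two extraction steps, assuming MovieLens-style columns (pipe-separated genres, one row per rating); the project's real feature set may be richer.

```python
# Illustrative feature extraction under assumed column names; the real step may differ.
import pandas as pd

def movie_features(movies: pd.DataFrame, ratings: pd.DataFrame) -> pd.DataFrame:
    """One-hot encode genres and append each movie's average past rating."""
    genres = movies["genres"].str.get_dummies(sep="|")
    avg = ratings.groupby("movieId")["rating"].mean().rename("avg_rating").reset_index()
    return movies[["movieId"]].join(genres).merge(avg, on="movieId", how="left")

def user_features(ratings: pd.DataFrame, movie_feats: pd.DataFrame) -> pd.DataFrame:
    """Average the feature vectors of the movies each user has rated."""
    rated = ratings.merge(movie_feats, on="movieId")
    rated = rated.drop(columns=["movieId", "rating", "timestamp"], errors="ignore")
    return rated.groupby("userId").mean().reset_index()
```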
Model Training:
- Trains the TensorFlow/Keras content-based model on the extracted user and movie features (a minimal architecture sketch follows).
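A minimal Keras sketch of a content-based rating model of the kind described above; the layer sizes and feature dimensions are illustrative assumptions, not the architecture shown in assets/model.png.

```python
# Minimal Keras sketch of a content-based rating model; sizes are illustrative.
import tensorflow as tf

def build_model(user_dim: int, movie_dim: int) -> tf.keras.Model:
    user_in = tf.keras.Input(shape=(user_dim,), name="user_features")
    movie_in = tf.keras.Input(shape=(movie_dim,), name="movie_features")
    x = tf.keras.layers.Concatenate()([user_in, movie_in])
    x = tf.keras.layers.Dense(64, activation="relu")(x)
    x = tf.keras.layers.Dense(32, activation="relu")(x)
    rating = tf.keras.layers.Dense(1)(x)  # predicted rating on the 0.5-5.0 scale
    model = tf.keras.Model(inputs=[user_in, movie_in], outputs=rating)
    model.compile(optimizer="adam", loss="mse",
                  metrics=[tf.keras.metrics.RootMeanSquaredError()])
    return model
```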
Model Evaluation:
- Computes accuracy, RMSE, and user-specific ranking metrics.
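As an example of what these metrics can look like in code, the sketch below computes RMSE and a per-user precision@k; the exact metrics, thresholds, and cut-offs used by the project may differ.

```python
# Hedged example metrics; the project's evaluation step may use different ones.
import numpy as np
import pandas as pd

def rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def precision_at_k(preds: pd.DataFrame, k: int = 10, threshold: float = 4.0) -> float:
    """Share of each user's top-k predicted movies that the user actually rated >= threshold."""
    def per_user(group: pd.DataFrame) -> float:
        return float((group.nlargest(k, "pred")["rating"] >= threshold).mean())
    return float(preds.groupby("userId").apply(per_user).mean())
```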
Model Registry:
- Version-controlled model storage in MLflow.
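A sketch of how a trained model could be logged and registered with MLflow; the registry name `movie_recommender` and the stage transition are assumptions for illustration, not the project's actual naming.

```python
# Illustrative MLflow registry usage; model name and stage are assumptions.
import mlflow
import mlflow.tensorflow
from mlflow.tracking import MlflowClient
import tensorflow as tf

def register_model(model: tf.keras.Model, rmse: float) -> None:
    with mlflow.start_run():
        mlflow.log_metric("rmse", rmse)
        mlflow.tensorflow.log_model(model, artifact_path="model",
                                    registered_model_name="movie_recommender")
    client = MlflowClient()
    version = client.get_latest_versions("movie_recommender")[0].version
    client.transition_model_version_stage("movie_recommender", version, stage="Staging")
```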
BentoML Promotion:
- Packages validated models for cloud deployment.
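One way this promotion could look, assuming the registered MLflow model is pulled into the local BentoML model store before `bentoml build` packages it according to bentofile.yaml; the names below are illustrative.

```python
# Hedged sketch: import the registered MLflow model into the BentoML store.
import bentoml

bento_model = bentoml.mlflow.import_model(
    "movie_recommender",                            # name in the BentoML model store
    model_uri="models:/movie_recommender/Staging",  # pull from the MLflow registry
)
print(f"Imported {bento_model.tag}; build the bento with `bentoml build`.")
```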
Tech Stack
- ML Framework: TensorFlow 2.x, Keras
- MLOps: ZenML, MLflow (experiment tracking/model registry), Evidently (validation), BentoML (deployment)
- Data Tools: Pandas, NumPy, Scikit-learn
Project Structure
```
END-END-MLFLOW/
├── artifacts/
├── logs/
├── config/
│   ├── config.yaml              # Project configuration for each pipeline step
│   ├── configuration.py         # Data schema definitions
│   └── ...                      # Environment variables/paths
├── mlruns/
├── research/
├── src/
│   └── mlProject/
│       ├── steps/               # ZenML step definitions
│       │   ├── step_01_data_loader.py
│       │   ├── step_02_data_validation.py
│       │   ├── step_03_data_processing.py
│       │   ├── step_04_model_trainer.py
│       │   ├── step_05_model_evaluation.py
│       │   └── step_06_model_promotion.py
│       ├── pipelines/           # ZenML pipeline definitions
│       │   ├── training_pipeline.py
│       │   └── ...
│       ├── components/
│       │   ├── data_loader.py
│       │   ├── data_validation.py
│       │   ├── data_processing.py
│       │   ├── model_trainer.py
│       │   ├── model_evaluation.py
│       │   ├── model_service.py
│       │   └── model_promotion.py
│       └── ...
├── assets/
│   ├── model.png                # Model architecture diagram
│   └── pipeline.png             # Pipeline workflow visualization
├── tests/                       # Unit/integration tests
├── requirements.txt             # Project dependencies
├── bentofile.yaml               # BentoML bento build configuration
├── schema.yaml                  # Dataset schema definition
└── main.py                      # Pipeline execution script
```
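As a rough idea of how config/configuration.py could tie config.yaml and schema.yaml to the pipeline steps, here is a hypothetical sketch; the class name, keys, and file locations are assumptions, not the project's actual implementation.

```python
# Hypothetical configuration loader; actual keys and classes may differ.
from pathlib import Path
import yaml

def read_yaml(path: Path) -> dict:
    with path.open() as f:
        return yaml.safe_load(f)

class ConfigurationManager:
    """Exposes per-step settings from config.yaml and the dataset schema from schema.yaml."""
    def __init__(self,
                 config_path: Path = Path("config/config.yaml"),
                 schema_path: Path = Path("schema.yaml")) -> None:
        self.config = read_yaml(config_path)
        self.schema = read_yaml(schema_path)

    def step_config(self, step_name: str) -> dict:
        # e.g. step_config("data_loader") -> paths/parameters for that step
        return self.config.get(step_name, {})
```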
Prerequisites
- Python 3.10
- ZenML Cloud Account
- BentoML Cloud Account
- MLflow Server (e.g., hosted on DagsHub)
Setup
- Clone the repository:

  ```bash
  git clone https://github.com/Ntchinda-Giscard/recomProject.git
  ```

- Create a virtual environment:

  ```bash
  python -m venv .venv && source .venv/bin/activate
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Connect to the ZenML server and register the stack components:

  ```bash
  zenml login "YOUR_SERVER_URL"

  # Register components
  zenml model-deployer register bentoml_deployer --flavor=bentoml
  zenml model-registry register mlflow_registry --flavor=mlflow
  zenml experiment-tracker register mlflow_tracker --flavor=mlflow
  zenml data-validator register evidently_validator --flavor=evidently

  # Create and activate the stack
  zenml stack register my_stack \
      -d bentoml_deployer \
      -r mlflow_registry \
      -e mlflow_tracker \
      -v evidently_validator \
      -o default \
      -a default
  zenml stack set my_stack
  ```
- Run the pipeline:

  ```bash
  python main.py
  ```
  Pipeline overview: see assets/pipeline.png for the full workflow visualization.

- Deploy to BentoML Cloud:

  ```bash
  bentoml cloud login --api-token 'YOUR_API_TOKEN' --endpoint 'YOUR_SERVER_URL'
  bentoml deploy --name recommend-system .
  ```
MIT License. See LICENSE for details.
Need Help? Open an issue or contact @Ntchinda-Giscard.