This repository contains a dimensional data warehouse design for Olist, along with its Slowly Changing Dimension (SCD) strategy. It also includes a modular ELT pipeline using Python and Luigi for orchestration.
Olist is a major e-commerce marketplace in Brazil, connecting small businesses with customers across the country. As the business grows, so does the complexity of handling its transactional data. This repository showcases:
- A dimensional model designed to support Olist's analytics needs
- A clear SCD strategy to handle changes in dimension tables
- A working ELT pipeline built with Python and Luigi
As part of the design process, we held a simulated requirements-gathering session with stakeholders to understand how they want to handle changes in dimension attributes. Their key responses:
- On product attribute changes: "No, we only need current product details. Overwrite old values if changes occur."
- On seller location changes: "Seller locations rarely change. Just update current values."
- On customer address changes: "No, only the latest location matters for logistics."
- On order status history: "Track status changes in fact tables only."
Based on stakeholder feedback, we chose a Type 1 SCD strategy — overwrite old values without keeping historical versions.
| Dimension | SCD Type | Change Handling |
|---|---|---|
| `dim_product` | Type 1 | Overwrite category and attributes |
| `dim_seller` | Type 1 | Overwrite location information |
| `dim_customer` | Type 1 | Overwrite address fields |
| `dim_order_status` | Type 1 | Overwrite status description |
- Aligned with business needs – stakeholders do not require historical tracking
- Simpler maintenance – no versioning or date-range logic
- More efficient – reduces storage use and ETL complexity
- Straightforward updates – clean `UPDATE` statements instead of inserts (see the sketch after this list)
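For illustration, here is a minimal sketch of a Type 1 overwrite for `dim_product`, assuming a PostgreSQL warehouse reached through `psycopg2`, a unique constraint on `product_id`, and a staging table named `stg_product`. The table and column names are illustrative and may differ from the actual schema:

```python
import os

import psycopg2

# Type 1 upsert: insert new products, overwrite attributes of existing ones.
# Requires a unique constraint on dim_product(product_id); no history is kept.
TYPE1_UPSERT = """
    INSERT INTO dim_product (product_id, product_category, product_weight_g)
    SELECT product_id, product_category, product_weight_g
    FROM stg_product
    ON CONFLICT (product_id) DO UPDATE
    SET product_category = EXCLUDED.product_category,
        product_weight_g = EXCLUDED.product_weight_g;
"""

conn = psycopg2.connect(
    dbname=os.getenv("DWH_POSTGRES_DB"),
    host=os.getenv("DWH_POSTGRES_HOST"),
    user=os.getenv("DWH_POSTGRES_USER"),
    password=os.getenv("DWH_POSTGRES_PASSWORD"),
    port=os.getenv("DWH_POSTGRES_PORT"),
)
with conn, conn.cursor() as cur:
    cur.execute(TYPE1_UPSERT)  # old values are simply replaced
conn.close()
```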
This project uses Luigi to orchestrate a modular ELT process implemented in Python and SQL.
| Script | Purpose |
|---|---|
| `extract.py` | Connects to the source database and extracts raw data |
| `load.py` | Loads raw data into staging/DWH tables |
| `transform.py` | Applies transformations and loads final tables |
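For orientation, here is a minimal sketch of how the three steps could be wired together as Luigi tasks. The class names, target paths, and marker files are illustrative; the real logic lives in `extract.py`, `load.py`, and `transform.py` and is orchestrated by `elt_main`:

```python
import luigi


class Extract(luigi.Task):
    """Pull raw tables from the source database (logic from extract.py)."""

    def output(self):
        return luigi.LocalTarget("output/extract.done")

    def run(self):
        # ... call the extraction logic here ...
        with self.output().open("w") as f:
            f.write("extracted\n")


class Load(luigi.Task):
    """Load the raw extracts into staging/DWH tables (logic from load.py)."""

    def requires(self):
        return Extract()

    def output(self):
        return luigi.LocalTarget("output/load.done")

    def run(self):
        # ... call the loading logic here ...
        with self.output().open("w") as f:
            f.write("loaded\n")


class Transform(luigi.Task):
    """Apply transformations and populate the final tables (logic from transform.py)."""

    def requires(self):
        return Load()

    def output(self):
        return luigi.LocalTarget("output/transform.done")

    def run(self):
        # ... call the transformation logic here ...
        with self.output().open("w") as f:
            f.write("transformed\n")
```

Running the final task pulls in its upstream tasks automatically through `requires()`, so a single invocation executes extract, load, and transform in order.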
- Clone the repository
- Create a `.env` file with database connection info:

```
SRC_POSTGRES_DB=olist_src
SRC_POSTGRES_HOST=localhost
SRC_POSTGRES_USER=your_username
SRC_POSTGRES_PASSWORD=your_password
SRC_POSTGRES_PORT=5433
DWH_POSTGRES_DB=olist_dwh
DWH_POSTGRES_HOST=localhost
DWH_POSTGRES_USER=your_username
DWH_POSTGRES_PASSWORD=your_password
DWH_POSTGRES_PORT=5434
```
- Start the PostgreSQL services:

```bash
docker-compose up -d
```
- Set up your Python environment:

```bash
python -m venv venv
source venv/bin/activate  # On Windows use venv\Scripts\activate
pip install -r requirements.txt
```
- Run the Luigi tasks (the Luigi CLI also expects the name of the final task defined in `elt_main.py`):

```bash
luigi --module elt_main <FinalTaskName> --local-scheduler
```
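Alternatively, the pipeline can be triggered from Python with `luigi.build`, which is convenient for debugging. The imported task name below is hypothetical and should be replaced with whatever terminal task `elt_main.py` actually defines:

```python
import luigi

from elt_main import Transform  # hypothetical: use the actual terminal task name

if __name__ == "__main__":
    # local_scheduler=True runs without a separate luigid scheduler process
    luigi.build([Transform()], local_scheduler=True)
```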
This setup is for learning and prototyping. For production use, additional error handling and monitoring would be needed.