This project implements a data processing pipeline for an e-commerce analytics platform, featuring ETL processes, a GraphQL API, and workflow orchestration.
- ETL pipeline for processing large CSV datasets
- PostgreSQL database with optimized schema and partitioning
- FastAPI-based GraphQL API
- Flyte workflow orchestration
- Comprehensive data analytics capabilities
- Docker support for easy deployment
Prerequisites:
- Python 3.12.2
- PostgreSQL
- Docker and Docker Compose
Setup:
- Clone the repository:
git clone https://github.com/Shrhawk/ecommerce-analytics.git
cd ecommerce-analytics
- Copy the environment file:
cp example.env .env
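- Build and start the services (this runs in the foreground, so run the remaining commands from a second terminal, or add -d to start the services detached):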
docker-compose up --build
- Run database migrations:
docker-compose run web alembic upgrade head
- Generate sample data:
docker-compose run web python data-generator.py
- Run the ETL pipeline:
docker-compose run web python app/etl/pipeline.py
- Run the ETL workflow:
docker-compose run web python app/workflows/etl_workflow.py
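The post-startup commands above can also be scripted so they always run in the same order and stop at the first failure. The following is a minimal sketch, not part of the repository: `SETUP_COMMANDS` and `run_setup` are hypothetical names, and it assumes the services were already started with `docker-compose up --build`.

```python
import shlex
import subprocess

# Commands copied verbatim from the setup steps, in the order listed.
# Assumption: the services are already running (docker-compose up --build).
SETUP_COMMANDS = [
    "docker-compose run web alembic upgrade head",
    "docker-compose run web python data-generator.py",
    "docker-compose run web python app/etl/pipeline.py",
    "docker-compose run web python app/workflows/etl_workflow.py",
]


def run_setup(commands=SETUP_COMMANDS, dry_run=True):
    """Hypothetical helper: parse each command and optionally execute it.

    With dry_run=True (the default) nothing is executed; the parsed
    argument lists are returned so they can be inspected first.
    With dry_run=False each command runs in order, and check=True makes
    subprocess raise CalledProcessError on the first non-zero exit code.
    """
    parsed = [shlex.split(cmd) for cmd in commands]
    if not dry_run:
        for argv in parsed:
            subprocess.run(argv, check=True)
    return parsed
```

Calling `run_setup()` only returns the parsed commands; `run_setup(dry_run=False)` actually executes them.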
Once you have run the commands above, you can access the API and the GraphQL playground.
The following services will be available:
- API Docs: http://localhost:8000/docs
- GraphQL: http://localhost:8000/graphql
Copy the GraphQL queries and variables from graphql_queries.md and run them in the playground.
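The same queries can be sent from a script instead of the playground. The sketch below uses only the Python standard library and assumes the endpoint accepts the usual JSON POST body with `query` and `variables` keys. `build_graphql_request` and `run_query` are hypothetical helpers, not functions from this project.

```python
import json
import urllib.request

# Endpoint taken from the service list above.
GRAPHQL_URL = "http://localhost:8000/graphql"


def build_graphql_request(query, variables=None, url=GRAPHQL_URL):
    """Build (but do not send) a JSON POST request for a GraphQL query."""
    payload = json.dumps({"query": query, "variables": variables or {}})
    return urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(query, variables=None, url=GRAPHQL_URL):
    """Send the query and return the decoded JSON response as a dict."""
    request = build_graphql_request(query, variables, url)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the services running, `run_query("{ __typename }")` is a quick connectivity check, because `__typename` is a built-in GraphQL meta-field and needs no knowledge of the schema. For real queries, paste a query and its variables from graphql_queries.md.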