- Download input files: `coding_challenge_inventory.csv`, `coding_challenge_meta.csv` and `coding_challenge_prices.csv` into `etl/inputs/`.
- Run `docker compose up -d`.
- Navigate to http://127.0.0.1:8000/docs in order to see the API docs and execute requests.
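Before starting the containers it can help to confirm that the three input files are in place. The following is a minimal sketch, not part of the repository: the file names and the `etl/inputs` directory come from the steps above, while the helper name `missing_inputs` is an assumption made for illustration.

```python
from pathlib import Path

# File names and default directory are taken from the setup steps above.
# The helper itself is hypothetical and not part of the project.
REQUIRED_INPUTS = (
    "coding_challenge_inventory.csv",
    "coding_challenge_meta.csv",
    "coding_challenge_prices.csv",
)


def missing_inputs(inputs_dir="etl/inputs"):
    """Return the required input file names that are absent from inputs_dir."""
    base = Path(inputs_dir)
    return [name for name in REQUIRED_INPUTS if not (base / name).is_file()]
```

Calling `missing_inputs()` from the repository root returns an empty list once all three files have been downloaded.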
The ETL process is implemented as a Jupyter Notebook. While this is not the usual way to build production-grade software, it is a perfect tool for R&D activities on data. The notebook is executed in headless mode using `papermill` on the input data stored in `etl/inputs/`. The resulting notebook is converted to HTML and saved as `etl/outputs/output.html`, together with the results (as CSVs) in the same directory. The generated notebook describes the processing approach and the assumptions made about the inputs.
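A headless run of this kind can be sketched as below. This is only an illustration under stated assumptions: the notebook path `etl/etl.ipynb`, the intermediate file `output.ipynb` and both function names are hypothetical, and the project may well drive `papermill` and the HTML conversion differently (for example from the command line). Only `etl/outputs/output.html` is taken from the description above.

```python
from pathlib import Path


def plan_etl_run(notebook="etl/etl.ipynb", outputs_dir="etl/outputs"):
    """Return the paths used by one headless run (notebook name is assumed)."""
    out = Path(outputs_dir)
    return {
        "input_notebook": Path(notebook),
        "executed_notebook": out / "output.ipynb",  # hypothetical file name
        "html_report": out / "output.html",
    }


def run_etl(notebook="etl/etl.ipynb", outputs_dir="etl/outputs"):
    """Execute the notebook with papermill, then export it to HTML."""
    # Imported lazily so the path helper above works without these packages.
    import papermill as pm
    from nbconvert import HTMLExporter

    plan = plan_etl_run(notebook, outputs_dir)
    plan["html_report"].parent.mkdir(parents=True, exist_ok=True)

    # Run every cell without a browser and save the executed copy.
    pm.execute_notebook(str(plan["input_notebook"]), str(plan["executed_notebook"]))

    # Convert the executed notebook into a standalone HTML page.
    body, _resources = HTMLExporter().from_filename(str(plan["executed_notebook"]))
    plan["html_report"].write_text(body, encoding="utf-8")
    return plan["html_report"]
```

Keeping the path planning separate from the execution makes it possible to check where the outputs will land without having `papermill` or `nbconvert` installed.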
Loading into the DB (Postgres 17) is conducted via the `etl/db-init.sql` script, which creates the tables and then loads the generated data into them.
In order to browse the loaded data, one can use the `psql` shell executed directly on the container:
$ docker compose exec db psql -U postgres
There is only one test, called `01_end_to_end`. It contains sample inputs covering all product types (regular, one with variant, one with case, one with both alternates) and the data issues found in the input files (UPC duplicates and malformed supplier), in order to test the behaviour of the logic and avoid regressions. The test executes the notebook on these inputs and compares the outputs with the expected ones (manually curated). In order to run the test:
- Create virtualenv and activate it.
- Install runtime and test dependencies: `pip install -r requirements.txt -r requirements-dev.txt`
- Run tests: `pytest tests/`
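The comparison of generated outputs against manually curated ones, as performed by the end-to-end test described above, can be sketched roughly as follows. This is an assumption about how such a check might be written, not the project's actual test code; the function names and the directory arguments are hypothetical.

```python
import csv
from pathlib import Path


def read_rows(path):
    """Read a CSV file into a list of rows (each row is a list of strings)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.reader(handle))


def diff_outputs(actual_dir, expected_dir):
    """Return names of expected CSVs that are missing or differ in actual_dir."""
    mismatched = []
    for expected in sorted(Path(expected_dir).glob("*.csv")):
        actual = Path(actual_dir) / expected.name
        if not actual.is_file() or read_rows(actual) != read_rows(expected):
            mismatched.append(expected.name)
    return mismatched
```

In a pytest test this would typically end with `assert diff_outputs(actual_dir, expected_dir) == []`, so that a failure lists exactly which files regressed.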
The API implementation is based on FastAPI with SQLAlchemy. The code resides in the `api/` directory inside the `products_api` module. There are only 3 files:
- `main.py` -> REST API implementation, REST models and documentation
- `models.py` -> database models, autogenerated from Postgres using `sqlacodegen`
- `repository.py` -> products repository with the business logic of fetching products
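The split between the REST layer and the repository can be illustrated with a small sketch. Everything here is hypothetical: the `Product` fields, the in-memory repository (the real one queries Postgres through SQLAlchemy), the `/products/{product_id}` route and the `create_app` factory are assumptions chosen to show the pattern, not the contents of the actual files.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class Product:
    # Hypothetical fields; the real columns come from the generated models.
    id: int
    name: str


class InMemoryProductsRepository:
    """Stand-in for the repository layer, backed by a dict instead of a DB."""

    def __init__(self, products):
        self._by_id = {product.id: product for product in products}

    def get(self, product_id: int) -> Optional[Product]:
        return self._by_id.get(product_id)


def create_app(repository):
    """Build a FastAPI app that delegates product lookups to the repository."""
    # Imported lazily so the repository can be used and tested without FastAPI.
    from fastapi import FastAPI, HTTPException

    app = FastAPI(title="Products API (sketch)")

    @app.get("/products/{product_id}")  # hypothetical route
    def read_product(product_id: int):
        product = repository.get(product_id)
        if product is None:
            raise HTTPException(status_code=404, detail="Product not found")
        return asdict(product)

    return app
```

Keeping the lookup logic in the repository means it can be exercised directly, for example `InMemoryProductsRepository([Product(1, "apple")]).get(1)`, without starting a web server.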
- Create virtualenv and activate it.
- Install runtime and test dependencies: `pip install -r requirements.txt -r requirements-dev.txt`
- Run tests: `pytest tests/`
All the tests are executed on push to GitHub via GitHub Actions.