Hi, I'm Sam!
I am a Senior Operations Analyst, based in Berlin, with a background in the travel industry and education. I hold a Bachelor's degree (B.A.) in Modern Languages (Spanish & English) and a Postgraduate Certificate in Education (PGCE), specialising in Secondary Education (Spanish & French). During my 8 years in the travel industry with GetYourGuide, I worked as a Customer Service Agent, Team Lead and Quality Assurance Manager before moving into an Operations Analyst role in November 2021.
This repository showcases my skills, shares my projects and tracks my journey in Data Analytics.
- About
- Portfolio projects
- [Mar 2025] Project: Marketing A/B Test
- [Aug 2024] Data Engineer Project: Dead by Daylight ETL Pipeline
- [Oct 2023] 9-Part Data Analysis Tutorial for Beginner Analysts
- [May 2021] Company Sales and Operations Analysis
- Learning projects
- Side projects
- Courses
- [Aug 2024] Practical database design
- [Mar 2023] Automate the Boring Stuff with Python Programming
- [Nov 2022] Python for Time Series Data Analysis
- [Oct 2022] Statistics for Data Science and Business Analysis
- [Jun 2022] The Complete Pandas Bootcamp 2022: Data Science with Python
- [Apr 2022] SQL Fundamentals Track
- [Jan 2022] The Complete SQL Bootcamp 2022: Go from Zero to Hero
- [Jun 2018] Management and Leadership: Growing as a Manager
- Reference material
Below are projects I've worked on.
Date:
Q1 2025
Repository:
[Link]
Notebook:
[Link]
PDF Slides:
[Link]
Google Slides:
[Link]
Description:
This project analyzes the effectiveness of a marketing campaign using A/B testing.
The goal is to determine whether ads significantly impact customer conversion rates. The analysis includes statistical methods such as attribution percentage and odds ratio calculations.
The data is sourced from Kaggle.
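To illustrate the two measures named above, here is a minimal sketch rather than the notebook's actual code. The counts are made up, the function names are hypothetical, and reading "attribution percentage" as the attributable fraction among the exposed group is an assumption on my part.

```python
def odds_ratio(conv_a, total_a, conv_b, total_b):
    """Odds of converting in group A divided by the odds in group B."""
    odds_a = conv_a / (total_a - conv_a)
    odds_b = conv_b / (total_b - conv_b)
    return odds_a / odds_b


def attributable_fraction(conv_a, total_a, conv_b, total_b):
    """Share of group A's conversion rate that exceeds group B's baseline rate."""
    rate_a = conv_a / total_a
    rate_b = conv_b / total_b
    return (rate_a - rate_b) / rate_a


# Hypothetical counts, not taken from the Kaggle dataset.
ad_group = {"converted": 50, "total": 1000}
control_group = {"converted": 20, "total": 1000}

args = (
    ad_group["converted"], ad_group["total"],
    control_group["converted"], control_group["total"],
)
print(f"Odds ratio: {odds_ratio(*args):.2f}")
print(f"Attributable fraction: {attributable_fraction(*args):.0%}")
```

An odds ratio above 1 would suggest the ad group converts more often than the control group. Whether the difference is statistically significant needs a separate test.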
Results:
A marketing analysis of the provided dataset summarised in PDF form
Skills:
Data analysis | ETL | Data Pipeline | Data cleaning | Descriptive statistics | Statistical analysis | Data visualization | A/B Testing
Date:
Q3 2024
Repository:
[Link]
Notebook:
[Link]
PDF Slides:
[Link]
Project Documentation:
[Link]
Description:
This project focuses on building an ETL pipeline for Dead by Daylight game data.
The pipeline processes data from various game entities such as characters, perks, maps, addons, and match details, and sorts this data into relevant tables in a database.
Using Python, we designed the pipeline to transform, clean, and load data into structured formats that support game balancing analysis, player performance tracking, and game element ratings.
The data is sourced from Dennis Reep's Dead by Daylight website via scraping (with permission), and integrated into a database designed to store, manage, and analyze key elements for decision-making around the game.
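The transform and load steps described above can be sketched roughly as follows. This is a simplified illustration, not the project's code: the sample rows, the `perks` table, and its columns are all hypothetical, and an in-memory SQLite database stands in for the real one.

```python
import sqlite3

# Hypothetical scraped rows; the real pipeline's fields and tables may differ.
RAW_PERKS = [
    {"name": "  Example Perk A ", "owner": "Character One", "rating": "4.2"},
    {"name": "Example Perk B", "owner": "character two", "rating": ""},
]


def transform(row):
    """Trim whitespace, normalise casing, and convert the rating to float or None."""
    rating = row["rating"].strip()
    return (
        row["name"].strip(),
        row["owner"].strip().title(),
        float(rating) if rating else None,
    )


def load(conn, rows):
    """Create the target table if needed and insert the cleaned rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS perks ("
        "name TEXT PRIMARY KEY, owner TEXT NOT NULL, rating REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO perks VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(raw_rows, conn):
    """Transform every raw row, then load the results into the database."""
    load(conn, [transform(row) for row in raw_rows])


conn = sqlite3.connect(":memory:")
run_pipeline(RAW_PERKS, conn)
print(conn.execute("SELECT name, owner, rating FROM perks ORDER BY name").fetchall())
```

Keeping `transform` free of database calls makes the cleaning rules easy to test on their own.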
Results:
A comprehensive analysis of the Dead by Daylight game summarised in PDF form
Skills:
Data engineering | ETL | Data Pipeline | Data Scraping | Data cleaning | Data analysis | Descriptive statistics | Statistical analysis | Data visualization
Date:
Q3 2023
Repository:
[Link]
Notebook:
[Link]
PDF Slides:
[Link]
Blog Posts:
[Link]
Description:
A comprehensive 9-part guide on analysing a dataset using Python (Pandas) and VS Code. Perfect for beginner analysts looking to enhance their data analysis portfolio. Written on Medium.com as a series of blog posts.
The series covers:
- Defining Objectives
- Data Acquisition
- Data Exploration
- Data Cleaning
- Data Visualization
- Feature Engineering
- Statistical Analysis
- Machine Learning
- Presenting Solutions
Each step is broken down with practical examples and code snippets, making it easy for beginners to follow along and learn.
Results:
A 9-part blog series to help aspiring data analysts prepare a portfolio piece.
Skills:
Data cleaning | Data analysis | Descriptive statistics | Statistical analysis | Machine learning | Data visualization
Date:
Q2 2021
Repository:
[Link]
Notebook:
[Link]
PDF Slides:
[Link]
Description:
The dataset contains ~100k records of a company's sales, customer, operational and product data. The project involved data loading, data cleaning, preprocessing, filling missing values, exploratory data analysis, measuring statistical factors, and hypothesis testing.
Results:
Data-based business recommendations for the company.
Skills:
Data cleaning | Data analysis | Descriptive statistics | Data visualization
Below are projects worked on for online courses.
Date:
Q2 2022
Duration:
35 hours
Repository:
[Link]
Notebooks:
[Data aggregation] | [Data analysis]
Description:
A course aimed at learning to use Pandas (Python library) for data aggregation and data analysis.
There were two capstone projects, one for each core skill (data aggregation & exploratory data analysis).
Skills:
Data cleaning | Data analysis | Descriptive statistics | Data visualization | Statistics | Machine learning | Time series
Below is a collection of non-data-analytics projects that I have been working on.
Date:
Q2 2022
Repository:
[Link]
Description:
A Google Sheets formula to speed up a recurring process. The formula allows users to enter more than one email address in the receiver field of a Google Form, then splits the data and duplicates the rows in the back end (one duplicate per email address entered in the receiver field). This repository documents this project.
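The formula itself lives in the repository. As a rough illustration of the row-splitting logic only, here is the same idea sketched in Python. The `split_rows` helper, the sample responses, and the assumption that addresses are separated by commas or semicolons are all hypothetical.

```python
import re


def split_rows(rows, field="receiver"):
    """Return one copy of each row per email address found in `field`."""
    expanded = []
    for row in rows:
        # Assumption: addresses are separated by commas or semicolons.
        emails = [e.strip() for e in re.split(r"[;,]", row[field]) if e.strip()]
        for email in emails:
            expanded.append({**row, field: email})
    return expanded


# Hypothetical form responses.
responses = [
    {"sender": "sam", "receiver": "a@example.com, b@example.com"},
    {"sender": "alex", "receiver": "c@example.com"},
]

for row in split_rows(responses):
    print(row)
```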
Skills:
Spreadsheet formulas | Google suite
Date:
Q2 2022
Repository:
[Link]
Description:
A Google Apps Script to speed up a repetitive, manual process and reduce input errors. The script creates two dropdown menus in a Google Sheet, where the options in the second dropdown menu depend on the selection made in the first. This repository documents this project.
Skills:
Google Apps Script | Google suite
Although not a replacement for on-the-job experience or project work, here are some of the courses I have completed over the years.
Organisation:
Udemy
Duration:
1 month
Credential:
[Link]
Repository:
[Link]
Description:
- Build a database design from a given set of requirements
- Determine a set of preliminary entities and attributes to start a database design
- Normalise a database design into 1NF taking into consideration multivalued and multipart fields
- Establish table candidate and primary keys
- Normalise a database design into 2NF taking into consideration partial key dependencies
- Identify multiple types of table relationships and define relationships between tables
- Normalise a database design into 3NF taking into consideration transitive dependencies
- Develop database design solutions to common features of a blog application
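As a small illustration of the normalisation steps listed above, here is a sketch of a blog schema using Python's built-in `sqlite3` module. It is not taken from the course material, and every table and column name is hypothetical.

```python
import sqlite3

# Hypothetical schema for a small blog; names are illustrative only.
SCHEMA = """
-- 3NF: author details live in their own table instead of being repeated on posts.
CREATE TABLE authors (
    author_id INTEGER PRIMARY KEY,
    email     TEXT NOT NULL UNIQUE,  -- candidate key
    name      TEXT NOT NULL
);
CREATE TABLE posts (
    post_id   INTEGER PRIMARY KEY,
    author_id INTEGER NOT NULL REFERENCES authors(author_id),
    title     TEXT NOT NULL
);
CREATE TABLE tags (
    tag_id INTEGER PRIMARY KEY,
    label  TEXT NOT NULL UNIQUE
);
-- 1NF: a multivalued "tags" field becomes one row per (post, tag) pair.
-- 2NF: the junction table holds no attribute that depends on only part of its key.
CREATE TABLE post_tags (
    post_id INTEGER NOT NULL REFERENCES posts(post_id),
    tag_id  INTEGER NOT NULL REFERENCES tags(tag_id),
    PRIMARY KEY (post_id, tag_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)

# Sample data to show the many-to-many relationship between posts and tags.
conn.execute("INSERT INTO authors VALUES (1, 'sam@example.com', 'Sam')")
conn.execute("INSERT INTO posts VALUES (1, 1, 'Hello')")
conn.executemany("INSERT INTO tags VALUES (?, ?)", [(1, "sql"), (2, "design")])
conn.executemany("INSERT INTO post_tags VALUES (?, ?)", [(1, 1), (1, 2)])

query = """
SELECT p.title, a.name, t.label
FROM posts p
JOIN authors a ON a.author_id = p.author_id
JOIN post_tags pt ON pt.post_id = p.post_id
JOIN tags t ON t.tag_id = pt.tag_id
ORDER BY t.label
"""
print(conn.execute(query).fetchall())
```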
Organisation:
Udemy
Duration:
1 month
Credential:
[Link]
Repository:
[Link]
Description:
- Automate tasks on your computer by writing simple Python programs.
- Write programs that can do text pattern recognition with "regular expressions".
- Programmatically generate and update Excel spreadsheets.
- Parse PDFs and Word documents.
- Crawl web sites and pull information from online sources.
- Write programs that send out email notifications.
- Use Python's debugging tools to quickly figure out bugs in your code.
- Programmatically control the mouse and keyboard to click and type for you.
Organisation:
Udemy
Duration:
1 month
Credential:
[Ongoing]
Repository:
[Link]
Description:
- Pandas for Data Manipulation
- NumPy and Python for Numerical Processing
- Pandas for Data Visualization
- How to Work with Time Series Data with Pandas
- Use Statsmodels to Analyze Time Series Data
- Evaluate a model's efficiency by comparing training and test data
- Use Facebook's Prophet Library for forecasting
- Understand advanced ARIMA models for Forecasting
Organisation:
Udemy
Duration:
3 months
Credential:
[Link]
Description:
- Understand the fundamentals of statistics
- Learn how to work with different types of data
- How to plot different types of data
- Calculate the measures of central tendency, asymmetry, and variability
- Calculate correlation and covariance
- Distinguish and work with different types of distributions
- Estimate confidence intervals
- Perform hypothesis testing
- Make data driven decisions
- Understand the mechanics of regression analysis
- Carry out regression analysis
- Use and understand dummy variables
- Understand the concepts needed for data science even with Python and R
Organisation:
Udemy
Duration:
6 months
Credential:
[Link]
Description:
- Bring your data handling & data analysis skills to an outstanding level.
- Master a complete machine learning project A-Z with Pandas, Scikit-Learn, and Seaborn
- Practice and master your Pandas skills with quizzes, 150+ exercises, and comprehensive projects
- Learn and master the most important Pandas workflows for finance
- Learn the basics of Pandas and Numpy coding
- Learn and practice all relevant Pandas methods and workflows with real-world datasets
- Import, clean, and merge messy data and prepare data for machine learning
- Analyze, visualize, and understand your data with Pandas, Matplotlib, and Seaborn
- Import financial/stock data from web sources and analyze them with Pandas
- Learn and master important statistical concepts with scipy
Organisation:
DataCamp
Duration:
21 hours
Credential:
[Link]
Description:
- Introduction to SQL
- Joining data in SQL
- Intermediate SQL
- PostgreSQL summary stats and window functions
- Functions for Manipulating Data in PostgreSQL
Organisation:
Udemy
Duration:
9 hours
Credential:
[Link]
Description:
- SQL statement fundamentals (Select, Count, Where, Order by, Limit, In, (I)like)
- Group by statements (Group by, Having)
- Joins (As statement, Inner joins, Full outer joins, Left outer joins, Right joins, Union)
- Advanced SQL commands (Timestamps, extract, mathematical functions, string functions, subquery, self-join)
- Creating databases and tables (data types, primary & foreign keys, constraints, create table, insert, update, delete, alter table, drop table, check constraint)
- Conditional expressions and procedures (case, coalesce, cast, nullif, views, import, export)
Organisation:
The Open University Business School
Duration:
4 weeks
Credential:
[Link]
Description:
The course offers participants an introduction to the foundation skills and knowledge of a middle manager and leader. The learning activities begin the process of preparing the learner for the Chartered Management Institute (CMI) qualifications in Management and Leadership at Level 5. It introduces them, as experienced practitioners, to the underpinning theory of management and leadership. The course was prepared by The Open University Business School (AMBA, EQUIS, AACSB triple-accredited).
A list of useful reference material.