An Airflow workflow that processes the latest post data from Reddit.
- Create a Reddit app by following this tutorial
- Create a `.env` file with the following format:

  ```
  CLIENT_ID=client_id
  CLIENT_SECRET=secret
  USER_AGENT=user_agent
  ```
- Set up an SMTP server by following this guide
- Install Docker
- Run `docker-compose up`
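The `.env` file holds the three Reddit app settings that the scraping task needs. The sketch below shows one way to load them using only the standard library. It is an illustration, not the project's actual loader: the real code may use a package such as python-dotenv, and the helper names `load_env` and `reddit_credentials` are hypothetical.

```python
import os
from pathlib import Path

# Keys expected in .env, as listed in the setup steps above.
REQUIRED_KEYS = ("CLIENT_ID", "CLIENT_SECRET", "USER_AGENT")


def load_env(path: str = ".env") -> dict:
    """Hypothetical helper: parse KEY=value lines and export them to os.environ.

    Blank lines, '#' comments, and lines without '=' are skipped. Surrounding
    quotes are stripped from values. Variables already set in the environment
    are not overwritten.
    """
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    for key, value in values.items():
        os.environ.setdefault(key, value)
    return values


def reddit_credentials() -> dict:
    """Hypothetical helper: return the Reddit settings, failing loudly if any is missing."""
    missing = [k for k in REQUIRED_KEYS if not os.environ.get(k)]
    if missing:
        raise RuntimeError(f"missing settings in .env: {', '.join(missing)}")
    # Lower-cased keys: client_id, client_secret, user_agent.
    return {k.lower(): os.environ[k] for k in REQUIRED_KEYS}
```

If the project talks to Reddit through PRAW (an assumption), the lower-cased keys match the `client_id`, `client_secret`, and `user_agent` keyword arguments of `praw.Reddit`, so the result could be passed as `praw.Reddit(**reddit_credentials())`.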
Airflow ensures that the following tasks run at the right time and in the right order, and that unexpected failures are handled:
- scrape_posts: fetch the 100 newest posts from r/dataengineering and save them as a CSV file
- make_cloud: create a word cloud using the CSV data
- send_email: send an email with the word cloud attached
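The three tasks pass data along a chain: CSV file, then word frequencies, then an email attachment. The standard-library sketch below illustrates that data flow under several assumptions. The Reddit fetch is stubbed out, so `posts` is supplied by the caller. The CSV columns (`id`, `title`) and the stopword list are hypothetical. The word-cloud step stops at computing the frequencies that a renderer would size words by, and the email step builds the message without sending it. None of this is the project's actual code.

```python
from __future__ import annotations

import csv
import re
from collections import Counter
from email.message import EmailMessage
from pathlib import Path

# Hypothetical, deliberately tiny stopword list; real word-cloud tools ship larger ones.
STOPWORDS = {"a", "an", "and", "the", "to", "of", "in", "for", "is", "on", "with", "i", "how", "what"}


def scrape_posts(posts: list[dict], csv_path: Path) -> Path:
    """Stand-in for scrape_posts: write already-fetched posts to CSV.

    `posts` is assumed to hold the newest r/dataengineering posts as dicts
    with 'id' and 'title' keys; fetching them from Reddit is not shown.
    """
    with csv_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["id", "title"])
        writer.writeheader()
        for post in posts:
            writer.writerow({"id": post["id"], "title": post["title"]})
    return csv_path


def make_cloud(csv_path: Path, top_n: int = 50) -> dict[str, int]:
    """Stand-in for make_cloud: count title words read back from the CSV.

    Returns the top_n most frequent non-stopwords; rendering the image is omitted.
    """
    counts = Counter()
    with csv_path.open(newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            words = re.findall(r"[a-z][a-z0-9']*", row["title"].lower())
            counts.update(w for w in words if w not in STOPWORDS)
    return dict(counts.most_common(top_n))


def send_email(image_bytes: bytes, sender: str, recipient: str) -> EmailMessage:
    """Stand-in for send_email: build a message with the PNG image attached.

    Delivering it (for example with smtplib.SMTP(...).send_message(msg))
    requires the SMTP server from the setup steps and is not done here.
    """
    msg = EmailMessage()
    msg["Subject"] = "r/dataengineering word cloud"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("The latest word cloud is attached.")
    msg.add_attachment(image_bytes, maintype="image", subtype="png", filename="wordcloud.png")
    return msg
```

In an Airflow DAG, each function would typically be wrapped in its own operator (for example `PythonOperator`), and the order would be declared with Airflow's bitshift syntax: `scrape >> cloud >> email`, where those names are the operator objects. Passing file paths between tasks instead of in-memory data keeps each task independently retryable.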