jla524/wordflow: An automated data pipeline built with Airflow and Docker.

wordflow

A workflow to process real-time post data from Reddit.

Quickstart Guide

  1. Create a Reddit app to obtain a client ID and client secret.
  2. Create a .env file with your Reddit app's client ID, client secret, and a user agent string, in this format:
    CLIENT_ID=client_id
    CLIENT_SECRET=secret
    USER_AGENT=user_agent
  3. Set up an SMTP server so the pipeline can send email.
  4. Install Docker
  5. Run docker-compose up
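Before running docker-compose up, it can help to confirm that the .env file defines all three keys from step 2. The following is a minimal standard-library sketch, assuming plain KEY=value lines with no quoting or export prefixes. The helpers parse_env, missing_keys, check_env_file, and REQUIRED_KEYS are hypothetical and not part of the repository.

```python
from pathlib import Path

# Keys the Quickstart's .env file is expected to define (from step 2).
REQUIRED_KEYS = ("CLIENT_ID", "CLIENT_SECRET", "USER_AGENT")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may themselves contain "=".
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Return the required keys that are absent or have an empty value."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


def check_env_file(path: str = ".env") -> list[str]:
    """Read a .env file from disk and report which required keys are missing."""
    return missing_keys(parse_env(Path(path).read_text()))


# Usage with the template from step 2: all keys present, so nothing is missing.
sample = "CLIENT_ID=client_id\nCLIENT_SECRET=secret\nUSER_AGENT=user_agent\n"
print(missing_keys(parse_env(sample)))  # → []
```

An empty list means every key has a value; calling check_env_file() in the project directory applies the same check to the real file.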

Directed Acyclic Graph (DAG)

Airflow ensures that the following tasks run at the right time, in the right order, and with proper handling of unexpected issues.

  • scrape_posts: fetch the 100 newest posts from r/dataengineering and save them as a CSV file
  • make_cloud: generate a word cloud from the CSV data
  • send_email: send an email with the word cloud attached
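The task list above can be sketched as plain Python functions run in dependency order. This is a minimal, framework-free illustration under stated assumptions, not the repository's code. It uses the standard library's graphlib in place of Airflow's scheduler and replaces the Reddit API call with hard-coded sample titles. Instead of drawing an image, it computes the word frequencies that a word-cloud renderer would size words by, and it builds the email without sending it. The function names mirror the task IDs. SAMPLE_TITLES, STOP_WORDS, run_pipeline, the attachment file name, and the email addresses are hypothetical.

```python
import csv
import io
import re
from collections import Counter
from email.message import EmailMessage
from graphlib import TopologicalSorter

# Hypothetical sample titles standing in for the Reddit API response.
SAMPLE_TITLES = [
    "Airflow vs Dagster for data pipelines",
    "How do you test data pipelines?",
    "Docker tips for Airflow",
]

# Tiny illustrative stop-word list (an assumption; real tools use larger ones).
STOP_WORDS = {"a", "an", "and", "do", "for", "how", "the", "vs", "you"}


def scrape_posts(titles: list[str]) -> str:
    """Return post titles as CSV text (stand-in for writing a CSV file)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["title"])
    for title in titles[:100]:  # keep at most 100, assuming newest-first input
        writer.writerow([title])
    return buf.getvalue()


def make_cloud(csv_text: str) -> Counter:
    """Count non-stop-word frequencies, the data a word cloud is drawn from."""
    counts: Counter = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        words = re.findall(r"[a-z']+", row["title"].lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts


def send_email(counts: Counter) -> EmailMessage:
    """Build (but do not send) an email with the word counts attached as CSV."""
    msg = EmailMessage()
    msg["Subject"] = "r/dataengineering word cloud"
    msg["From"] = "pipeline@example.com"  # hypothetical addresses
    msg["To"] = "me@example.com"
    msg.set_content("Word frequencies are attached.")
    report = "\n".join(f"{word},{n}" for word, n in counts.most_common())
    msg.add_attachment(
        report.encode("utf-8"), maintype="text", subtype="csv", filename="words.csv"
    )
    return msg


def run_pipeline() -> EmailMessage:
    """Run the three tasks in dependency order, as the DAG would."""
    # Each key depends on the tasks in its set: scrape_posts -> make_cloud -> send_email.
    graph = {"make_cloud": {"scrape_posts"}, "send_email": {"make_cloud"}}
    tasks = {
        "scrape_posts": lambda _: scrape_posts(SAMPLE_TITLES),
        "make_cloud": make_cloud,
        "send_email": send_email,
    }
    result = None
    for name in TopologicalSorter(graph).static_order():
        # Passing the previous result along only works because this DAG is a chain.
        result = tasks[name](result)
    return result
```

Calling run_pipeline() returns the assembled EmailMessage. In Airflow itself, this ordering would typically be declared with the >> operator (scrape_posts >> make_cloud >> send_email). Tasks there usually exchange data through files or XCom rather than through return values passed along a chain.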

