Postiii/twds-crawler

twds-crawler

This repository contains the code for a highly scalable web crawler for towardsdatascience.com, built with Python, Selenium, Docker, Kubernetes, and the infrastructure of the Google Cloud Platform. It was created as part of a data science class to get hands-on experience with some of the most common technologies for large-scale web and data processing.

Documentation

A more detailed description of the implementation can be found in my medium.com article.

Troubleshooting

Additionally, I documented some of the challenges I ran into in trouble-shooting.md.
