rtadeoz/Docker-Spark-Big-Data

Spark Projects with Docker

Project built using: https://github.com/big-data-europe/docker-spark

Supported versions:

  • Spark 3.0.0 for Hadoop 3.2 with OpenJDK 8 and Scala 2.12
  • Spark 2.4.5 for Hadoop 2.7+ with OpenJDK 8

How to start

docker-compose up

Master: http://localhost:8080

Workers:

http://localhost:8081

http://localhost:8082
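
Once the containers are running, the web UIs listed above can be probed from the host. The following is a minimal sketch using only the Python standard library; it is not part of the repository, and the dictionary keys and function names are illustrative assumptions.

```python
"""Probe the Spark web UIs listed in this README (illustrative sketch)."""
from urllib.request import urlopen

# URLs come from the README; the keys are hypothetical labels.
SPARK_UIS = {
    "master": "http://localhost:8080",
    "worker-1": "http://localhost:8081",
    "worker-2": "http://localhost:8082",
}


def is_up(url, timeout=2.0):
    """Return True if the URL answers with HTTP 200 within the timeout."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError, refused connections and timeouts are all OSError.
        return False


def check_all(uis=SPARK_UIS):
    """Map each UI label to True (reachable) or False (not reachable)."""
    return {name: is_up(url) for name, url in uis.items()}


if __name__ == "__main__":
    for name, up in check_all().items():
        print(f"{name}: {'up' if up else 'not reachable'}")
```

If a UI is reported as not reachable, the containers may still be starting, or the port mapping in the compose file may differ from the URLs above.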

Open a shell in the worker 1 container

docker exec -it spark-worker-1 bash

Python examples

Run pyspark CLI:

# Run pyspark CLI
./spark/bin/pyspark

# Execute a file
cd home/python/example
./../../../spark/bin/spark-submit example.py data.csv
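
The contents of example.py are not shown in this README. As a hedged illustration of the kind of script that spark-submit could run with a CSV path as its only argument, here is a hypothetical stand-in that prints the schema and a one-line summary. The function names, the application name, and the assumption that the CSV has a header row are all illustrative.

```python
"""Hypothetical stand-in for example.py: summarise a CSV file with PySpark."""
import sys


def format_summary(path, row_count, columns):
    """Build a one-line, human-readable summary of a loaded file."""
    return f"{path}: {row_count} rows, {len(columns)} columns ({', '.join(columns)})"


def summarise(csv_path):
    """Load the CSV with Spark, print its schema and a summary line."""
    # Imported here so the helper above can be used without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("csv-summary").getOrCreate()
    try:
        # Assumes a header row; drop header=True if the file has none.
        df = spark.read.csv(csv_path, header=True, inferSchema=True)
        df.printSchema()
        print(format_summary(csv_path, df.count(), df.columns))
    finally:
        spark.stop()


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: spark-submit example.py <csv-file>", file=sys.stderr)
    else:
        summarise(sys.argv[1])
```

A script of this shape would be submitted in the same way as the command above, with data.csv arriving as sys.argv[1].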

Spark application UI (available while a job is running):

http://localhost:4040

http://localhost:4041

Install Jupyter Notebook:

apk add gcc
pip3 install notebook

About

Exercises in Spark with Docker and Data Languages
