GitHub - tanthml/spark_bazel: Spark Application with Bazel
spark_bazel

Some of my friends recently asked me how to write a PySpark or Spark application.

So I created this tutorial for everyone who wants to run a Spark application easily. I learned these techniques from my leader and want to share them with everyone :D.

Set-up

  • Add Miniconda and Spark to your shell profile:
    ...
    # added by Miniconda2 installer
    export PATH="/home/<user-name>/miniconda2/bin:$PATH"
    export SPARK_HOME="/home/<user-name>/spark/spark-2.3.0-bin-hadoop2.7"

  • Install Python packages
    pip install pytest click pyspark

How to run

Run the tests

bazel test core/pythontests/sparkel/core:test_nlp_words
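The test target above exercises the NLP helpers. As a hypothetical sketch (the function name `count_words` and its behaviour are assumptions, not the repo's actual API), a pytest-style word-count test could look like:

```python
# Hypothetical sketch of a word-counting helper and a pytest-style test for it.
# count_words is an assumed name; see core/python/sparkel/nlp for the real code.

def count_words(text):
    """Return the number of whitespace-separated words in `text`."""
    return len(text.split())

def test_count_words():
    assert count_words("hello spark world") == 3
    assert count_words("") == 0
    assert count_words("  spaced   out  ") == 2
```

Bazel's `py_test` rules run such functions through pytest when the target is wired up that way.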

Run a build to check everything before running on Spark

bazel build core/python/sparkel/spark_apps:package
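For reference, the BUILD file behind the `:package` target might look roughly like this. This is a guess reconstructed from the commands in this README, not the repo's actual BUILD file; the rule shape and dependency label are assumptions.

```
# Hypothetical core/python/sparkel/spark_apps/BUILD sketch (assumed, not the
# repo's actual file).
py_binary(
    name = "package",
    srcs = ["demo_spark_app.py"],
    main = "demo_spark_app.py",
    deps = ["//core/python/sparkel/nlp"],
)
```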

Run the program. Be careful: the output directory will be overwritten

bazel run core/python/sparkel/spark_apps:package -- core/python/sparkel/spark_apps/demo_spark_app.py --input_path /tmp/text.csv --output_dir /tmp/num_words --text_col content
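From the flags in that command, the demo app's command-line surface can be sketched as follows. This is an assumption-laden stand-in: the real demo_spark_app.py runs on Spark (the README installs click and pyspark), while this sketch uses only argparse so it stays self-contained.

```python
import argparse

# Hypothetical sketch of the CLI that demo_spark_app.py appears to expose,
# based on the flags shown above; the real script uses Spark (and likely click).

def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Count words in a text column.")
    parser.add_argument("--input_path", required=True,
                        help="Path to the input CSV file")
    parser.add_argument("--output_dir", required=True,
                        help="Output directory (overwritten if it already exists)")
    parser.add_argument("--text_col", required=True,
                        help="Name of the column holding the text to process")
    return parser.parse_args(argv)
```

For example, `parse_args(["--input_path", "/tmp/text.csv", "--output_dir", "/tmp/num_words", "--text_col", "content"])` mirrors the `bazel run` invocation above.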

How to code

Look at the file core/python/sparkel/spark_apps/demo_spark_app.py. It imports functions from core/python/sparkel/nlp. It has a small issue/bug; try to figure it out yourself ;)

So you can play around with them. If you have any issues, report them here, or invite me for coffee, tea, or milk tea :D if you live in HCM City.

If I have time, I will write an article about this Bazel build structure later.
