GitHub - eltbus/malaga-spark: dev case for Q
eltbus/malaga-spark

IMPORTANT

REPLACE passengers.csv and flightData.csv in the project root with the files from the ZIP before going any further!

Quantexa Scala Malaga

In this repo you will find the Dev Case code solution in Scala and the utilities to generate the solution for each question.

Requirements

To compile, test, package, and run the app on a local Spark cluster, the following are required:

  • SBT >= 1.9, < 2.0
  • Scala >= 2.12, <= 2.13
  • JRE 8
  • Docker
  • IMPORTANT: the data files. This Git repo only has lightweight sample files with header + ten rows. Replace them with the real files (included in the ZIP).
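
A quick pre-flight check can catch a missing tool or a forgotten data swap before the build. The sketch below is not part of the repo: the function names missing_tools and looks_like_sample are hypothetical, and the 11-line threshold only encodes the "header + ten rows" description above.

```shell
#!/bin/sh
# Hypothetical helpers; not part of the repository.

# Print every tool from the argument list that is not on PATH.
# Returns non-zero if at least one is missing.
missing_tools() {
  rc=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      rc=1
    fi
  done
  return "$rc"
}

# Succeed if the file has at most 11 lines (header + ten rows),
# i.e. it still looks like the lightweight sample file.
looks_like_sample() {
  lines=$(wc -l < "$1" | tr -d ' ')
  [ "$lines" -le 11 ]
}
```

For example, missing_tools sbt java docker lists whatever is not installed, and looks_like_sample passengers.csv && echo "replace passengers.csv" warns when the sample file is still in place.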

How to

Compile, test, and package

Use sbt. The app must be compiled and packaged before it can be run.

$ sbt
>>> compile
>>> test
>>> package
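
The same three steps can also be run non-interactively, since sbt accepts commands as arguments (batch mode) and runs them in order. A minimal sketch, where the build_app function and the SBT override variable are hypothetical conveniences rather than anything defined in this repo:

```shell
#!/bin/sh
# Run compile, test and package in a single batch-mode sbt call.
# SBT is a hypothetical override for the launcher; it defaults to `sbt`.
build_app() {
  "${SBT:-sbt}" compile test package
}
```

Calling build_app from the project root should leave the packaged jar where sbt package normally puts it.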

Run

Once the app is compiled and packaged (sbt package), it can be run by submitting the job to a Spark cluster. To simulate this locally we use Docker.

You can either:

  • Pull the Apache-sponsored Spark image here.
  • Pull Docker's official Spark image here.
  • Or build your own, using the Dockerfile provided in this repo as a template. Parametrize it for your desired Spark version (see the archive). Use only Spark builds WITH Hadoop! DISCLAIMER: errors may occur with lower versions of Spark and Scala (i.e. Spark 2.4.8 and Scala 2.12.10).

Use the script to get a feel for how it works: you can run jobs with pre-defined parameters using run.sh <JOB_NAME>. Output is hardcoded to an output folder (created if it does not exist).

To run all jobs easily, I've also created a simple Makefile: you can run all jobs sequentially with make run-all.

Ideally, though, you should submit the job to a Spark cluster and provide the arguments there.
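
As a rough illustration of such a submission, the sketch below only prints a spark-submit command line instead of running it. The submit_cmd function is hypothetical, and the main class, jar path and job arguments are placeholders that are not taken from this repository. It also assumes the job name is passed as an application argument. Only the --master and --class options of spark-submit are used.

```shell
#!/bin/sh
# Sketch only: the main class, jar path and trailing job arguments are
# placeholders, not values taken from this repository.
# Prints (does not run) a spark-submit command line for one job.
submit_cmd() {
  master="$1"      # e.g. local[*] or spark://host:7077
  main_class="$2"  # hypothetical entry point of the packaged app
  jar="$3"         # jar produced by `sbt package`
  shift 3
  echo spark-submit --master "$master" --class "$main_class" "$jar" "$@"
}
```

For example, submit_cmd 'local[*]' com.example.Main app.jar TotalFlightsPerMonth prints the command for one job; removing the echo would execute it on a machine where spark-submit is on the PATH.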

The available job names are:

  • TotalFlightsPerMonth
  • MostFrequentFliers
  • LongestRunOutsideUK
  • TotalSharedFlights
  • TotalSharedFlightsInRange
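
Running every job in sequence can be sketched as a plain loop over these names. This is only an approximation of what make run-all is described as doing; the actual Makefile may differ. The run_all_jobs function and the JOB_NAMES variable are hypothetical, and run.sh is assumed to take the job name as its only argument.

```shell
#!/bin/sh
# Job names as listed in this README.
JOB_NAMES="TotalFlightsPerMonth MostFrequentFliers LongestRunOutsideUK TotalSharedFlights TotalSharedFlightsInRange"

# Run every job in order with the given runner (default: ./run.sh),
# stopping at the first job that fails.
run_all_jobs() {
  runner="${1:-./run.sh}"
  for job in $JOB_NAMES; do
    "$runner" "$job" || return 1
  done
}
```

Calling run_all_jobs with no argument uses ./run.sh; passing echo instead gives a dry run that only lists the job names.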
