REPLACE `passengers.csv` and `flightData.csv` in the project root with the zipped files before going any further!
In this repo you will find the Dev Case code solution in Scala and the utilities to generate the solution for each question.
In order to compile, test, package, and run the app on a local Spark cluster, the following are required:
- SBT >= 1.9 and < 2.0
- Scala 2.12 or 2.13
- JRE 8
- Docker
- IMPORTANT: the data files. This Git repo only contains lightweight sample files (header + ten rows). Replace them with the real files included in the ZIP.
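As a reference, a minimal `build.sbt` sketch compatible with these requirements is shown below. The project name, Spark version, and dependency versions are assumptions for illustration, not the repo's actual build definition:

```scala
// build.sbt -- illustrative sketch only; the repo's actual build may differ
ThisBuild / scalaVersion := "2.12.18"   // any 2.12/2.13 release in range

lazy val root = (project in file("."))
  .settings(
    name := "dev-case",  // hypothetical project name
    libraryDependencies ++= Seq(
      // Spark is "provided": the cluster supplies it at runtime
      "org.apache.spark" %% "spark-sql" % "3.3.2" % Provided,
      "org.scalatest"    %% "scalatest" % "3.2.17" % Test
    )
  )
```

Pair it with `sbt.version=1.9.9` (or any other 1.9.x release) in `project/build.properties` to satisfy the SBT constraint.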
Use sbt to compile, test, and package the app. This is required before running it:
```
$ sbt
>>> compile
>>> test
>>> package
```
Once we have compiled and packaged the app (`sbt package`), we can run it.
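If you have a local Spark installation, a quick smoke test is possible before the Docker setup described next. The package prefix and JAR name below are hypothetical; check `target/scala-2.12/` for the actual artifact produced by `sbt package`:

```
$ spark-submit \
    --class jobs.TotalFlightsPerMonth \
    --master "local[*]" \
    target/scala-2.12/dev-case_2.12-0.1.0.jar
```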
Submit the job to a Spark cluster. To simulate this locally we use Docker.
You can either:
- Pull Apache's sponsored `spark` image from Docker Hub.
- Pull Docker's official `spark` image from Docker Hub.
- Or build your own using the `Dockerfile` provided in this repo as a template. Parametrize it for your desired Spark version (see the archive). Use only Spark WITH Hadoop! DISCLAIMER: potential errors/issues with lower versions of Spark and Scala (e.g., Spark 2.4.8 and Scala 2.12.10).
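Assuming a recent Spark release, the corresponding commands would look roughly like this (the image tags and the `SPARK_VERSION` build-arg are assumptions; check the provided Dockerfile for its actual parameters):

```
# Apache's sponsored image
$ docker pull apache/spark:3.3.2

# Docker's official image
$ docker pull spark:3.5.1

# Your own image built from the repo's Dockerfile
# (SPARK_VERSION is a hypothetical build-arg name)
$ docker build -t my-spark --build-arg SPARK_VERSION=3.3.2 .
```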
Use the script to get a feel for how it works
You can run jobs with pre-defined parameters using `run.sh <JOB_NAME>`. Output is hardcoded to an `output` folder (created if it does not exist).
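For example, with one of the job names listed at the end of this README (assuming the script is executable):

```
$ ./run.sh TotalFlightsPerMonth
# results are written to ./output/
```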
To easily run all jobs, I've also created a simple Makefile: you can run all jobs sequentially with `make run-all`.
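A minimal sketch of what such a Makefile might look like is below; this is illustrative only, and the repo's actual Makefile may differ:

```make
# Illustrative sketch; job names mirror the list at the end of this README.
# Note: recipe lines must start with a tab.
JOBS = TotalFlightsPerMonth MostFrequentFliers LongestRunOutsideUK \
       TotalSharedFlights TotalSharedFlightsInRange

.PHONY: run-all
run-all:
	for job in $(JOBS); do ./run.sh $$job; done
```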
But ideally you should submit the job to a Spark cluster and provide the args there; a sketch of such a submission follows the job list below.
The available job names are:
- TotalFlightsPerMonth
- MostFrequentFliers
- LongestRunOutsideUK
- TotalSharedFlights
- TotalSharedFlightsInRange
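As a sketch of a cluster submission (the master URL, fully-qualified class name, JAR name, and argument placeholders are all assumptions for illustration; substitute the real ones):

```
$ spark-submit \
    --class jobs.TotalSharedFlightsInRange \
    --master spark://spark-master:7077 \
    target/scala-2.12/dev-case_2.12-0.1.0.jar \
    <from-date> <to-date> <min-shared-flights>
```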