
elopezphy/Cognihack_DSX

 
 

Cognihack Data Science Track
Predicting repeat shopping likelihood with Python, Spark and Watson Machine Learning

Our exercise will take you through the process of creating a predictive model in Python using the data manipulation and machine learning libraries distributed with Spark.

Once we've worked through the process of reading, understanding and preparing our data and have built a simple model together, we'll deploy it to the Watson Machine Learning service and make it available as a real-time scoring service.

You should spend the remaining time working as a group to explore how you might improve these predictions. The Cognihack tutors will help with any experiments you run to create and evaluate refinements to the baseline model.

Learning goals

The learning goals of this exercise are:

  • Loading CSV files into an Apache® Spark DataFrame.
  • Exploring the data using the features within:
    a) Spark's data wrangling Python API: pyspark.sql;
    b) the pandas data wrangling library; and
    c) matplotlib for exploratory plots.
  • Engineering some basic predictive features, again using pyspark.sql and Spark user defined functions (UDFs).
  • Preparing the data for training and evaluation.
  • Creating an Apache® Spark machine learning pipeline.
  • Training and evaluating a model.
  • Persisting a pipeline and model in the Watson Machine Learning repository.
  • Deploying the model for online scoring using the Watson Machine Learning API.
  • Scoring sample data using the Watson Machine Learning API.
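
The Spark-side goals above (loading a CSV, adding a UDF-based feature, building a pipeline, then training and evaluating) can be sketched roughly as follows. This is a minimal illustration rather than the workshop notebook's code: the column names `amount`, `visits` and `repeater`, the `spend_band` feature and the choice of logistic regression are all assumptions made for the example. The Watson Machine Learning steps are left out because they depend on the service credentials and client version in use.

```python
def spend_band(amount):
    """Hypothetical feature: bucket a purchase amount into a coarse band."""
    if amount is None:
        return "unknown"
    if amount < 20:
        return "low"
    if amount < 100:
        return "medium"
    return "high"


def train_baseline(spark, csv_path):
    """Load a CSV, add a UDF feature, fit a pipeline, return (model, AUC).

    `spark` is an existing SparkSession. The columns `amount`, `visits`
    and `repeater` (a 0/1 label) are placeholders, not the real schema.
    """
    # Imported lazily so spend_band can be tried without Spark installed.
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.evaluation import BinaryClassificationEvaluator
    from pyspark.ml.feature import StringIndexer, VectorAssembler
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import StringType

    # Load the CSV into a Spark DataFrame, inferring column types.
    df = spark.read.csv(csv_path, header=True, inferSchema=True)

    # Wrap the plain Python function as a Spark UDF and add the feature.
    band_udf = udf(spend_band, StringType())
    df = df.withColumn("spend_band", band_udf(col("amount")))

    # Hold out 20% of the rows for evaluation.
    train, test = df.randomSplit([0.8, 0.2], seed=42)

    # Index the categorical feature, assemble a vector, fit a classifier.
    pipeline = Pipeline(stages=[
        StringIndexer(inputCol="spend_band", outputCol="spend_band_idx",
                      handleInvalid="keep"),
        VectorAssembler(inputCols=["spend_band_idx", "visits"],
                        outputCol="features"),
        LogisticRegression(featuresCol="features", labelCol="repeater"),
    ])
    model = pipeline.fit(train)

    # Area under the ROC curve on the held-out rows.
    evaluator = BinaryClassificationEvaluator(labelCol="repeater",
                                              metricName="areaUnderROC")
    return model, evaluator.evaluate(model.transform(test))
```

Keeping the feature logic in a plain function such as `spend_band` means it can be checked on ordinary Python values before it is wrapped with `udf` and applied to a DataFrame.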

Setup

Before we begin working through this notebook, you must perform the following setup tasks:

About

Materials to support the Data Science track within the Cognihack event

Languages

  • Jupyter Notebook 100.0%