GitHub - huuthienp/bda1: Data Analytics Experimentation with NewChic Dataset
huuthienp/bda1

Mini Task Sheet

  • Preliminaries: materials in weeks 1-4; Python
  • An e-commerce website wants to find
    • Top 10 products from selected categories
    • The best category
  • A team of $n$ members must cover at least $n$ categories
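
The two goals above can be sketched in a few lines once a ranking metric is chosen. The snippet below is only an illustration: the `score` values, the product names, and the "highest mean score" rule for the best category are all assumptions, since the task sheet leaves both definitions to the team.

```python
import heapq
from collections import defaultdict

# Hypothetical records: (category, product name, score). The score is an
# assumed metric; the task sheet leaves "top" and "best" undefined.
products = [
    ("bags", "tote", 4.6),
    ("bags", "clutch", 4.1),
    ("shoes", "sneaker", 4.8),
    ("shoes", "loafer", 3.9),
    ("shoes", "sandal", 4.4),
]

def top_n_per_category(rows, n=10):
    """Return {category: [(name, score), ...]} with the n highest scores first."""
    by_category = defaultdict(list)
    for category, name, score in rows:
        by_category[category].append((name, score))
    return {
        category: heapq.nlargest(n, items, key=lambda item: item[1])
        for category, items in by_category.items()
    }

def best_category(rows):
    """Pick the category with the highest mean score (one possible definition)."""
    totals = defaultdict(lambda: [0.0, 0])  # category -> [sum of scores, count]
    for category, _name, score in rows:
        totals[category][0] += score
        totals[category][1] += 1
    return max(totals, key=lambda c: totals[c][0] / totals[c][1])

top = top_n_per_category(products, n=2)
best = best_category(products)
```

Swapping the mean for a sum, a median, or a count of products above a threshold changes which category wins, so the chosen rule is worth stating explicitly in the report.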

Tasks

Data Preprocess

  • Records include 9 data sheets (.csv) and a data dictionary
  • Focus on integer- and decimal-typed columns, excluding the id column
    • (Might this help to define "best"?)
  • Output: task1.py
    • Initial draft provided, courtesy of Robbie
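
As a rough illustration of the preprocessing steps above (not the provided task1.py draft), the sheets could be stacked with pandas and then narrowed to the numeric columns. The column names `id`, `name`, `price`, and `likes_count`, and the two in-memory sheets, are hypothetical stand-ins; the real names come from the data dictionary.

```python
import io
import pandas as pd

# Two tiny in-memory stand-ins for the category sheets. Column names are
# hypothetical, not taken from the data dictionary.
sheets = {
    "bags": io.StringIO("id,name,price,likes_count\n1,tote,19.99,120\n2,clutch,9.5,40\n"),
    "shoes": io.StringIO("id,name,price,likes_count\n3,sneaker,35.0,300\n"),
}

frames = []
for category, handle in sheets.items():
    frame = pd.read_csv(handle)
    frame["category"] = category  # remember which sheet each row came from
    frames.append(frame)

# Stack the sheets into one table with a fresh row index.
combined = pd.concat(frames, ignore_index=True)

# Keep only integer- and decimal-typed columns, then drop the id column.
numeric = combined.select_dtypes(include="number").drop(columns=["id"])
```

With the real data, each `io.StringIO` would be replaced by the path to one of the 9 .csv files.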

Clustering

  • At least 2 algos

    • Algo 1: PCA + k-means, courtesy of Robbie

    • Algo 2: pending

  • Output: task2.py
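
For orientation, a minimal sketch of the PCA + k-means idea using scikit-learn follows. It is not Robbie's implementation: the scaling step, the choice of 2 components and 2 clusters, and the synthetic data are all assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the numeric feature table: two well-separated
# blobs in four dimensions (not the NewChic data).
rng = np.random.default_rng(0)
features = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(20, 4)),
    rng.normal(loc=5.0, scale=0.5, size=(20, 4)),
])

# Scale the columns, project onto 2 principal components, then cluster
# the projected points with k-means.
pipeline = make_pipeline(
    StandardScaler(),
    PCA(n_components=2),
    KMeans(n_clusters=2, n_init=10, random_state=0),
)
labels = pipeline.fit_predict(features)
```

Scaling before PCA keeps a column with large values, such as a price, from dominating the components. Whether that is desirable for this dataset is a judgment call for the report.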

Classification

  • At least 2 algos (again)

    • Pending
  • Output: task3.py
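
Since the classification algorithms are still pending, the sketch below only shows one way two candidates might be compared side by side. The decision tree and k-nearest-neighbours models are placeholders rather than the team's choice, and the features and labels are synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic features and labels; in the assignment the target column is
# still to be decided.
X, y = make_classification(n_samples=200, n_features=6, random_state=0)

# Placeholder candidates, chosen only for illustration.
candidates = {
    "decision_tree": DecisionTreeClassifier(max_depth=3, random_state=0),
    "k_nearest_neighbours": KNeighborsClassifier(n_neighbors=5),
}

# Mean 5-fold cross-validated accuracy for each candidate.
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in candidates.items()
}
```

Evaluating both models on the same folds keeps the comparison fair and gives the report a like-for-like number to discuss.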

Report (5 Sections + Other Requirements)

  1. Problem Analysis

    • Define top 10 products from selected categories

    • Define best category

    • Explain column choices for clustering, classification, and discussion

    • Draw a figure to illustrate the analytics plan

  2. Data Preprocess

    • Present steps in detail, e.g.

      • combining data sheets

      • removing columns, etc.

  3. Clustering

    • Explain algo choices

    • Present steps in detail (again)

      • Algo 1

      • Algo 2

    • Present results

  4. Classification

    • Same as Clustering
  5. Results Discussion

    • See questions in the task sheet
  6. Other

    • Title page: full names & student IDs

    • A single file report.pdf

    • No page limit

Assignment Submission

  • A single file A1.zip

  • Under 200 MB
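
A small helper like the one below could build the archive and check its size before submission. The list of files placed in A1.zip is an assumption (the task sheet only names the outputs individually), and the limit is read as 200 decimal megabytes, the stricter interpretation.

```python
import os
import tempfile
import zipfile

MAX_BYTES = 200 * 1000 * 1000  # 200 MB, read as decimal megabytes (assumption)

def build_archive(paths, archive_path):
    """Zip the given files flat into archive_path and return its size in bytes."""
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in paths:
            archive.write(path, arcname=os.path.basename(path))
    return os.path.getsize(archive_path)

# Demo with placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as workdir:
    names = ["task1.py", "task2.py", "task3.py", "report.pdf"]
    paths = []
    for name in names:
        path = os.path.join(workdir, name)
        with open(path, "w") as handle:
            handle.write("placeholder\n")
        paths.append(path)

    archive_path = os.path.join(workdir, "A1.zip")
    size = build_archive(paths, archive_path)
    with zipfile.ZipFile(archive_path) as archive:
        archived_names = sorted(archive.namelist())
    within_limit = size < MAX_BYTES
```

For a real submission, `paths` would point at the actual scripts and report instead of the placeholder files.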


End
