- Preliminaries: materials in weeks 1-4; Python
- An e-commerce website wants to find
  - Top 10 products from selected categories
  - The best category
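The brief does not fix what "top" means. Below is a minimal sketch that assumes each record is a dict with hypothetical `category`, `product` and `units_sold` fields and that "top" means the highest total `units_sold`. The field names and toy data are illustrative only, not the actual sheet columns.

```python
from collections import defaultdict
import heapq


def top_products(records, selected_categories, metric="units_sold", n=10):
    """Return the n products with the highest summed `metric` in each selected category.

    `records` is an iterable of dicts; the field names used here are hypothetical.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for rec in records:
        if rec["category"] in selected_categories:
            totals[rec["category"]][rec["product"]] += float(rec[metric])
    # nlargest keeps only the n best (product, total) pairs per category
    return {
        cat: heapq.nlargest(n, prods.items(), key=lambda kv: kv[1])
        for cat, prods in totals.items()
    }


# Hypothetical toy records (values arrive as strings, as they would from a CSV)
records = [
    {"category": "books", "product": "b1", "units_sold": "5"},
    {"category": "books", "product": "b2", "units_sold": "8"},
    {"category": "books", "product": "b1", "units_sold": "4"},
    {"category": "toys", "product": "t1", "units_sold": "3"},
]
result = top_products(records, {"books"}, n=2)
print(result)  # {'books': [('b1', 9.0), ('b2', 8.0)]}
```

Swapping `metric` (e.g. to a revenue column) changes the definition of "top" without touching the logic, which makes it easy to compare candidate definitions in the report.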
- A group of $n$ members selects at least $n$ categories
- Records include 9 data sheets (`.csv`) and a data dictionary
- Focus on integer- and decimal-typed columns, except for the `id` column (could this help define "best"?)
- Output: `task1.py`
  - Initial draft provided, courtesy of Robbie
- At least 2 algos
  - Algo 1: PCA + k-means, courtesy of Robbie
  - Algo 2: pending
- Output: `task2.py`
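Robbie's draft is not reproduced here. As a rough, dependency-free illustration of the PCA + k-means idea, the sketch below standardizes the numeric columns, projects them onto the leading principal components (found by power iteration on the covariance matrix rather than a library SVD) and clusters the projections with Lloyd's k-means. The function names, seeds and toy data are assumptions; in practice scikit-learn's `PCA` and `KMeans` would typically be used.

```python
import math
import random


def standardize(rows):
    """Scale every column to zero mean and unit variance (PCA is scale-sensitive)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column has std 0; fall back to 1.0 to avoid dividing by zero.
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]


def pca(rows, n_components, iters=500, seed=0):
    """Project centred rows onto the leading principal components.

    Each eigenvector of the covariance matrix is found by power iteration;
    the matrix is then deflated so the next pass finds the next component.
    """
    n, d = len(rows), len(rows[0])
    cov = [[sum(r[i] * r[j] for r in rows) / n for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        v = [rng.random() + 0.1 for _ in range(d)]
        length = math.sqrt(sum(x * x for x in v))
        v = [x / length for x in v]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:  # no variance left in any direction
                break
            v = [x / norm for x in w]
        # The Rayleigh quotient gives the eigenvalue; remove that direction from cov.
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
        components.append(v)
    return [[sum(a * b for a, b in zip(r, c)) for c in components] for r in rows]


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the previous centroid if a cluster ends up empty.
            new.append([sum(col) / len(members) for col in zip(*members)]
                       if members else centroids[c])
        if new == centroids:  # centroids unchanged, so assignments are stable
            break
        centroids = new
    return labels, centroids


# Hypothetical toy data: two well-separated groups of products
rows = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [10.0, 10.0], [10.1, 10.2], [10.2, 10.1]]
projected = pca(standardize(rows), n_components=1)
labels, centroids = kmeans(projected, k=2)
print(labels)
```

Cluster numbering is arbitrary, so only the grouping (not the label values) carries meaning.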
- At least 2 algos (again)
  - Pending
- Output: `task3.py`
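Task 3's algorithms are still pending. Assuming Task 3 corresponds to the Classification part of the report, one simple baseline that could be considered is k-nearest neighbours evaluated on a hold-out split. This is a hypothetical example, not a chosen algorithm; the function names, features and labels below are invented.

```python
import math
import random
from collections import Counter


def knn_predict(train_x, train_y, x, k=3):
    """Majority vote among the k training points closest to x (Euclidean distance)."""
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def holdout_accuracy(xs, ys, k=3, test_fraction=0.3, seed=0):
    """Shuffle, split into train/test parts, and report k-NN accuracy on the test part."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, int(len(idx) * test_fraction))
    test, train = idx[:n_test], idx[n_test:]
    train_x = [xs[i] for i in train]
    train_y = [ys[i] for i in train]
    correct = sum(knn_predict(train_x, train_y, xs[i], k) == ys[i] for i in test)
    return correct / n_test


# Invented features (e.g. two scaled numeric columns) and class labels
xs = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2], [0.2, 0.2],
      [5.0, 5.1], [5.2, 4.9], [4.9, 5.3], [5.1, 5.0], [5.3, 5.2]]
ys = ["low"] * 5 + ["high"] * 5
print(holdout_accuracy(xs, ys))  # 1.0
```

A second algorithm could be compared through the same `holdout_accuracy`-style harness so that both are evaluated on identical splits.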
- Problem Analysis
  - Define top 10 products from selected categories
  - Define best category
  - Explain column choices for clustering, classification, and discussion
  - Draw a figure to illustrate the analytics plan
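"Best" is left for the report to define. One possible definition, sketched under the assumption that per-category aggregates (here hypothetical `avg_rating` and `revenue`) have already been computed, is a weighted sum of min-max-normalized metrics. The weights, metric names and numbers are illustrative.

```python
def best_category(stats, weights):
    """Rank categories by a weighted sum of min-max-normalized metrics.

    `stats` maps category -> {metric: value}; `weights` maps metric -> weight
    (a negative weight suits "lower is better" metrics).
    """
    scores = {cat: 0.0 for cat in stats}
    for metric, weight in weights.items():
        values = [s[metric] for s in stats.values()]
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1.0  # avoid division by zero if all values are equal
        for cat, s in stats.items():
            scores[cat] += weight * (s[metric] - lo) / span
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical per-category aggregates
stats = {
    "books": {"avg_rating": 4.5, "revenue": 1200.0},
    "toys": {"avg_rating": 3.9, "revenue": 2000.0},
    "garden": {"avg_rating": 4.1, "revenue": 800.0},
}
best, scores = best_category(stats, {"avg_rating": 0.5, "revenue": 0.5})
print(best)  # books
```

Normalizing first keeps a large-valued column such as revenue from dominating a small-valued one such as rating; the chosen weights are exactly the kind of column choice the report is asked to explain.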
- Data Preprocessing
  - Present steps in detail, e.g.
    - combining data sheets
    - removing columns, etc.
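Combining data sheets usually means joining them on a shared key. A minimal sketch with rows as dicts, assuming hypothetical `orders` and `products` sheets linked by a `product_id` column; with pandas the same steps would be `DataFrame.merge(..., how="left")` followed by `DataFrame.drop(columns=...)`.

```python
def left_join(left, right, key):
    """Attach the matching `right` row (by `key`) to each `left` row.

    Rows are dicts; unmatched left rows are kept, as in a SQL LEFT JOIN.
    Assumes `key` is unique in `right`.
    """
    index = {row[key]: row for row in right}
    joined = []
    for row in left:
        merged = dict(row)
        for col, val in index.get(row[key], {}).items():
            merged.setdefault(col, val)  # keep the left value on name clashes
        joined.append(merged)
    return joined


def drop_columns(rows, columns):
    """Return copies of `rows` without the given columns."""
    return [{c: v for c, v in row.items() if c not in columns} for row in rows]


# Hypothetical sheets; "p9" has no match in products and is kept unchanged
orders = [{"order_id": "o1", "product_id": "p1", "qty": 2},
          {"order_id": "o2", "product_id": "p9", "qty": 1}]
products = [{"product_id": "p1", "category": "books", "description": "A novel"}]

combined = drop_columns(left_join(orders, products, "product_id"), {"description"})
print(combined)
```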
- Clustering
  - Explain algo choices
  - Present steps in detail (again)
    - Algo 1
    - Algo 2
  - Present results
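The brief does not say how results should be presented. One common summary for clustering, offered here as an option rather than a requirement, is the mean silhouette coefficient (scikit-learn provides it as `sklearn.metrics.silhouette_score`). A small pure-Python version on made-up 1-D data:

```python
import math


def silhouette(points, labels):
    """Mean silhouette coefficient: (b - a) / max(a, b) averaged over all points.

    a = mean distance to the other members of the point's own cluster,
    b = lowest mean distance to any other cluster; singleton clusters score 0.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # contributes 0 by convention
        # The point itself is in `own` at distance 0, hence the len - 1 divisor.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items() if other_lab != lab
        )
        total += (b - a) / max(a, b)
    return total / len(points)


# Made-up data: a clean split versus a scrambled labelling
points = [[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]]
good = silhouette(points, [0, 0, 0, 1, 1, 1])
bad = silhouette(points, [0, 1, 0, 1, 0, 1])
print(round(good, 3), round(bad, 3))
```

Values near 1 indicate compact, well-separated clusters; values near 0 or below suggest overlapping or poorly assigned clusters, which gives a common scale for comparing Algo 1 and Algo 2.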
- Classification
  - Same as Clustering
- Results Discussion
  - See questions in the task sheet
- Other
  - Title page: full names & student IDs
  - A single file: `report.pdf`
  - No page limit
- A single file: `A1.zip`
  - Under 200 MB
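A sketch for building and size-checking the archive, assuming `A1.zip` bundles the three task scripts and `report.pdf` (the brief does not list its contents) and reading "200 MB" as 200,000,000 bytes, the stricter of the two common interpretations. `build_submission` is a hypothetical helper name.

```python
import os
import tempfile
import zipfile

# Assumes decimal megabytes; 200 MiB would be 200 * 1024 ** 2 bytes.
LIMIT_BYTES = 200 * 1000 * 1000


def build_submission(paths, zip_path="A1.zip"):
    """Bundle the given files into one zip archive and check it is under the limit."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for p in paths:
            zf.write(p, arcname=os.path.basename(p))  # store without directories
    size = os.path.getsize(zip_path)
    if size >= LIMIT_BYTES:
        raise ValueError(f"{zip_path} is {size} bytes, over the 200 MB limit")
    return size


# Demo with placeholder files in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    files = []
    for name in ["task1.py", "task2.py", "task3.py", "report.pdf"]:
        path = os.path.join(tmp, name)
        with open(path, "w") as f:
            f.write("placeholder\n")
        files.append(path)
    zip_path = os.path.join(tmp, "A1.zip")
    size = build_submission(files, zip_path)
    with zipfile.ZipFile(zip_path) as zf:
        names = sorted(zf.namelist())
print(size, names)
```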