- In this assignment, you shall implement clustering on the blogs dataset containing 99 blogs.
- You can use any programming language you like.
You can work alone or in a group of two students.- You shall present your application and code at an oral examination.
See the Deadlines and Submissions page.
Grade | Requirements |
---|---|
E |
|
C-D |
|
A-B |
|
K-means is not deterministic (the results differ between runs), but you usually find related blogs such as the Google and search engine blogs in one cluster as shown here:
Note that you can make some performance improvements to speed up the cluster generation. In comparison, my implementation in Python takes around 3 seconds to generate the clusters and build a JSON response.
Hierarchical clustering always gives the same result. The tree is too large to show here, but if you get the branch shown below, it most likely works correctly:
Note that there are many performance improvements you can make to speed up tree generation. As a comparison, my implementation in Python takes around 10 seconds to generate the tree and build a JSON response.