Add KMeansPolis implementation that more closely maps to Polis algorithm #8

patcon · 2025-02-25T20:42:54Z

This is low priority and will be left as a draft for discussion, so that data science folks who know more than I do can chime in.

KMeans iterates up to a maximum number of iterations, but stops early once a convergence condition between successive iterations is met.

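In sklearn's KMeans implementation, as I understand it, the change in inertia is used to determine when to stop. (Worth verifying: recent sklearn documentation describes `tol` as relative to the Frobenius norm of the difference in cluster centers between consecutive iterations, so the two implementations may be closer than this description suggests.)
In Polis' KMeans implementation, the algorithm stops once the movement of each cluster center falls within a tolerance.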
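The two stopping rules described above can be sketched side by side. This is a minimal pure-Python illustration, not the code in this PR and not the actual Polis or sklearn implementation; the names `kmeans`, `centers_settled`, `inertia_settled`, and the `rule` and `threshold` parameters are hypothetical.

```python
import math

def assign(points, centers):
    """Return, for each point, the index of its nearest center."""
    return [min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points]

def update(points, labels, centers):
    """Move each center to the mean of its points; empty clusters keep their center."""
    new_centers = []
    for k, old in enumerate(centers):
        members = [p for p, label in zip(points, labels) if label == k]
        if members:
            new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
        else:
            new_centers.append(tuple(old))
    return new_centers

def inertia(points, labels, centers):
    """Total squared distance from each point to its assigned center."""
    return sum(math.dist(p, centers[label]) ** 2 for p, label in zip(points, labels))

def centers_settled(old, new, threshold):
    """Center-movement rule: every center moved less than the threshold."""
    return all(math.dist(a, b) < threshold for a, b in zip(old, new))

def inertia_settled(prev, curr, threshold):
    """Inertia rule: total inertia changed by less than the threshold."""
    return prev is not None and abs(prev - curr) < threshold

def kmeans(points, centers, rule, max_iters=20, threshold=0.01):
    """Lloyd iterations with a selectable stopping rule ("centers" or "inertia")."""
    prev_inertia = None
    for n_iters in range(1, max_iters + 1):
        labels = assign(points, centers)
        new_centers = update(points, labels, centers)
        curr_inertia = inertia(points, labels, new_centers)
        if rule == "centers":
            done = centers_settled(centers, new_centers, threshold)
        else:
            done = inertia_settled(prev_inertia, curr_inertia, threshold)
        centers, prev_inertia = new_centers, curr_inertia
        if done:
            break
    return centers, labels, n_iters
```

Calling `kmeans(points, initial_centers, rule="centers")` and `kmeans(points, initial_centers, rule="inertia")` on the same data makes it possible to compare how many iterations each rule takes before stopping.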

This might lead to slightly different results, which might matter later. Leaving this here for posterity.

Might investigate this further when we have unit tests over run_kmeans() that test various sizes of conversations.

This PR was written with the help of ChatGPT:
See: https://chatgpt.com/c/67be1cc5-b00c-800b-95ba-a0267edfb836

Clojure's [Polis] threshold is per-cluster-center movement, while sklearn's tol is based on total inertia change.

If clusters shift slightly but inertia barely changes, sklearn might stop earlier than [Polis'] `same-clustering?`.

If inertia is still changing by more than the tolerance while every center moves less than the threshold, sklearn might run longer than [Polis'] `same-clustering?`.
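A tiny numeric case shows how the two checks can disagree on the same update step. This is only an illustration with made-up 1-D data and a single cluster; the starting center and the tolerance value are assumptions chosen to make the effect visible, and neither check is taken from the Polis or sklearn source.

```python
def inertia(points, center):
    """Sum of squared distances from each 1-D point to a single center."""
    return sum((x - center) ** 2 for x in points)

points = [0.0, 2.0, 4.0, 6.0]
old_center = 3.05                        # hypothetical starting center
new_center = sum(points) / len(points)   # one Lloyd update: the mean, 3.0

tol = 0.02  # same numeric tolerance for both checks, chosen for illustration

center_shift = abs(new_center - old_center)            # about 0.05
inertia_change = abs(inertia(points, old_center)
                     - inertia(points, new_center))    # about 0.01

center_rule_stops = center_shift < tol     # center moved too far: keep going
inertia_rule_stops = inertia_change < tol  # inertia barely changed: stop

print(center_rule_stops, inertia_rule_stops)  # prints: False True
```

With the same tolerance, the center-movement check asks for another iteration while the inertia check is already satisfied.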

If cluster centers move but inertia remains nearly unchanged, it means the center updates, and any reassignment of points to clusters, do not significantly affect the total squared distances.

Possible Scenarios Where This Happens:

Centers Shift Without Changing Assignments

If all data points remain assigned to the same clusters while the centers move only slightly, inertia stays almost the same.

Example: The cluster centers jitter slightly, but the sum of squared distances barely changes.

Symmetric Reassignment of Points

Suppose some points switch clusters, but the overall distribution remains similar.

Example: Two clusters swap a few points, but each swapped point is about as far from its new center as it was from its old one, so the total barely changes.

Flat Regions in the Data

If the dataset has a uniform spread of points, minor shifts in cluster centers might not impact the overall distance sum.
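The first scenario can be made precise. With assignments held fixed, the inertia contributed by one cluster splits into a part that does not depend on the center and a part that grows with the square of the center's distance from the cluster mean. The symbols here (cluster $C_k$ with $n_k$ points and mean $\mu_k$, candidate center $c$) are my own notation, not taken from the Polis or sklearn code.

```latex
\sum_{x \in C_k} \lVert x - c \rVert^2
  = \sum_{x \in C_k} \lVert x - \mu_k \rVert^2 + n_k \, \lVert c - \mu_k \rVert^2,
\qquad
\mu_k = \frac{1}{n_k} \sum_{x \in C_k} x
```

So when a center moves a distance $d$ onto its cluster mean without any reassignment, inertia drops by $n_k d^2$. Because this is second-order in $d$, the inertia change shrinks faster than the center shift as $d$ gets small, which is one way a center-movement check and an inertia check can disagree.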

Added custom KMeansPolis that reproduces tolerance check of kmeans in…

cf8c37b

… Polis platform. See: https://chatgpt.com/c/67be1cc5-b00c-800b-95ba-a0267edfb836

patcon added this to Red-Dwarf Roadmap Mar 2, 2025
