Introduction to K-Means


Introduction to K-Means Clustering

We have finally arrived at the meat of this article!

Recall the first property of clusters – it states that the points within a cluster should be
similar to each other. So, our aim here is to minimize the distance between the
points within a cluster.

There is an algorithm that tries to minimize the distance of the points in
a cluster from their centroid – the k-means clustering technique.
K-means is a centroid-based algorithm, or a distance-based algorithm, where we
calculate the distances to assign a point to a cluster. In K-Means, each cluster is
associated with a centroid.

The main objective of the K-Means algorithm is to minimize the sum of
distances between the points and their respective cluster centroids.
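In symbols, this objective is often written as the within-cluster sum of squares (a standard formulation using squared Euclidean distance, which the article does not pin down explicitly):

```latex
J = \sum_{j=1}^{k} \; \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2
```

Here C_j is the set of points assigned to cluster j and μ_j is that cluster’s centroid.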
Let’s now take an example to understand how K-Means actually works:

We have these 8 points and we want to apply k-means to create clusters for these
points. Here’s how we can do it.

Step 1: Choose the number of clusters k


The first step in k-means is to pick the number of clusters, k.

Step 2: Select k random points from the data as centroids


Next, we randomly select a centroid for each cluster. Let’s say we want to have 2
clusters, so k is equal to 2 here. We then randomly select the centroids.
Here, the red and green circles represent the centroids of these clusters.
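This selection step can be sketched in NumPy. The eight coordinates below are made up for illustration – the article does not give the actual points:

```python
import numpy as np

# Hypothetical 2-D data standing in for the article's 8 points
points = np.array([
    [1.0, 1.0], [1.5, 2.0], [2.0, 2.5], [1.0, 2.0],   # one loose group
    [5.0, 7.0], [5.5, 6.5], [6.0, 7.5], [5.5, 7.0],   # another loose group
])

k = 2
rng = np.random.default_rng(seed=42)  # seed fixed only for reproducibility

# Step 2: pick k distinct data points to serve as the initial centroids
centroids = points[rng.choice(len(points), size=k, replace=False)]
```

Picking actual data points as the initial centroids is one common convention; other initializations (e.g. k-means++) exist but are not what the article describes here.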

Step 3: Assign all the points to the closest cluster centroid


Once we have initialized the centroids, we assign each point to the closest cluster
centroid:

Here you can see that the points that are closer to the red point are assigned to the
red cluster, whereas the points that are closer to the green point are assigned to the
green cluster.
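The assignment step can be sketched like this (same illustrative points as before; the two centroid coordinates are assumptions standing in for the red and green circles):

```python
import numpy as np

points = np.array([
    [1.0, 1.0], [1.5, 2.0], [2.0, 2.5], [1.0, 2.0],
    [5.0, 7.0], [5.5, 6.5], [6.0, 7.5], [5.5, 7.0],
])
# Two assumed centroids, standing in for the red and green circles
centroids = np.array([[1.5, 1.5], [5.5, 7.0]])

# Euclidean distance from every point to every centroid: shape (n_points, k)
distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)

# Step 3: each point joins the cluster of its nearest centroid
labels = distances.argmin(axis=1)
# → labels is [0, 0, 0, 0, 1, 1, 1, 1] for this data
```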

Step 4: Recompute the centroids of newly formed clusters


Now, once we have assigned all of the points to either cluster, the next step is to
compute the centroids of newly formed clusters:
Here, the red and green crosses are the new centroids.
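Recomputing a centroid simply means taking the mean of the points currently assigned to that cluster. A minimal sketch, continuing the illustrative data above:

```python
import numpy as np

points = np.array([
    [1.0, 1.0], [1.5, 2.0], [2.0, 2.5], [1.0, 2.0],
    [5.0, 7.0], [5.5, 6.5], [6.0, 7.5], [5.5, 7.0],
])
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # assignments from the previous step
k = 2

# Step 4: new centroid of each cluster = mean of the points assigned to it
new_centroids = np.array([points[labels == j].mean(axis=0) for j in range(k)])
# → [[1.375, 1.875], [5.5, 7.0]]
```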

Step 5: Repeat steps 3 and 4


We then repeat steps 3 and 4:

Assigning all the points to a cluster based on their distance from the centroids, and
then recomputing the centroids, together make up a single iteration. But wait – when
should we stop this process? It can’t run till eternity, right?

Stopping Criteria for K-Means Clustering


There are essentially three stopping criteria that can be adopted to stop the K-means
algorithm:

1. Centroids of newly formed clusters do not change
2. Points remain in the same cluster
3. The maximum number of iterations is reached

We can stop the algorithm if the centroids of newly formed clusters are not changing.
Even after multiple iterations, if we are getting the same centroids for all the clusters, we
can say that the algorithm is not learning any new pattern and it is a sign to stop the
training.

Another clear sign that we should stop the training process is when the points remain in
the same cluster even after training the algorithm for multiple iterations.

Finally, we can stop the training if the maximum number of iterations is reached.
Suppose we have set the number of iterations to 100. The process will repeat for 100
iterations before stopping.
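Putting the steps and the stopping criteria together, the whole loop might look like the sketch below (NumPy; the empty-cluster fallback and the `allclose` tolerance are implementation choices of this sketch, not something the article specifies):

```python
import numpy as np

def kmeans(points, k, max_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 2: pick k random data points as the initial centroids
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    labels = np.zeros(len(points), dtype=int)
    for _ in range(max_iters):                      # criterion 3: iteration cap
        # Step 3: assign each point to its nearest centroid
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Step 4: recompute each centroid as the mean of its cluster
        new_centroids = np.array([
            points[new_labels == j].mean(axis=0) if np.any(new_labels == j)
            else centroids[j]                       # keep old centroid if a cluster empties
            for j in range(k)
        ])
        # Criteria 1 and 2: centroids stop moving and assignments stop changing
        if np.allclose(new_centroids, centroids) and np.array_equal(new_labels, labels):
            break
        centroids, labels = new_centroids, new_labels
    return centroids, labels
```

On two well-separated groups of points this converges in a handful of iterations; production implementations such as scikit-learn’s `KMeans` follow the same loop but add a smarter initialization (k-means++) and multiple restarts.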
