K-Means: Step-By-Step Example
As a simple illustration of a k-means algorithm, consider the following data set consisting of the scores of two variables on each of
seven individuals:
Subject  1    2    3    4    5    6    7
A        1.0  1.5  3.0  5.0  3.5  4.5  3.5
B        1.0  2.0  4.0  7.0  5.0  5.0  4.5
This data set is to be grouped into two clusters. As a first step in finding a sensible initial partition, let the A and B values of the two
individuals furthest apart (using the Euclidean distance measure) define the initial cluster means, giving:
         Individual  Mean Vector (centroid)
Group 1  1           (1.0, 1.0)
Group 2  4           (5.0, 7.0)
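The choice of initial means can be checked with a short Python sketch that measures every pair of individuals with the Euclidean distance, sqrt((A_p - A_q)^2 + (B_p - B_q)^2), and keeps the pair that is furthest apart. It assumes Python 3.8+ (for `math.dist`); the names `DATA`, `furthest_pair` and `initial_means` are hypothetical, chosen only for illustration.

```python
import math
from itertools import combinations

# Scores (A, B) of the seven individuals, keyed by individual number.
DATA = {
    1: (1.0, 1.0), 2: (1.5, 2.0), 3: (3.0, 4.0), 4: (5.0, 7.0),
    5: (3.5, 5.0), 6: (4.5, 5.0), 7: (3.5, 4.5),
}

def furthest_pair(data):
    """Return the pair of individuals whose score vectors are furthest apart
    in Euclidean distance (ties go to the first pair encountered)."""
    return max(combinations(data, 2),
               key=lambda pair: math.dist(data[pair[0]], data[pair[1]]))

a, b = furthest_pair(DATA)
# The two furthest individuals seed the two clusters.
initial_means = {1: DATA[a], 2: DATA[b]}
print(a, b, round(math.dist(DATA[a], DATA[b]), 2))  # 1 4 7.21
print(initial_means)  # {1: (1.0, 1.0), 2: (5.0, 7.0)}
```

Individuals 1 and 4 are sqrt(52), about 7.21, apart, which is larger than the distance between any other pair.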
The remaining individuals are now examined in sequence and allocated to the cluster to which they are closest, in terms of Euclidean
distance to the cluster mean. The mean vector is recalculated each time a new member is added. This leads to the following series of
steps:
      Cluster 1                           Cluster 2
Step  Individual  Mean Vector (centroid)  Individual  Mean Vector (centroid)
1     1           (1.0, 1.0)              4           (5.0, 7.0)
2     1, 2        (1.2, 1.5)              4           (5.0, 7.0)
3     1, 2, 3     (1.8, 2.3)              4           (5.0, 7.0)
4     1, 2, 3     (1.8, 2.3)              4, 5        (4.2, 6.0)
5     1, 2, 3     (1.8, 2.3)              4, 5, 6     (4.3, 5.7)
6     1, 2, 3     (1.8, 2.3)              4, 5, 6, 7  (4.1, 5.4)
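The sequential allocation in the table above can be reproduced with the following Python sketch. It assumes Python 3.8+ (for `math.dist`) and breaks ties in favour of Cluster 1, since the text does not say how ties are handled; `DATA` and `sequential_allocation` are hypothetical names. The sketch keeps the means at full precision (for example (1.25, 1.5) after step 2), whereas the table rounds them to one decimal place.

```python
import math

# Scores (A, B) of the seven individuals, keyed by individual number.
DATA = {
    1: (1.0, 1.0), 2: (1.5, 2.0), 3: (3.0, 4.0), 4: (5.0, 7.0),
    5: (3.5, 5.0), 6: (4.5, 5.0), 7: (3.5, 4.5),
}

def sequential_allocation(data, seed_1, seed_2):
    """Assign the remaining individuals one at a time to the nearer cluster
    mean, recomputing that cluster's mean after each new member."""
    clusters = {1: [seed_1], 2: [seed_2]}
    means = {1: data[seed_1], 2: data[seed_2]}
    history = []  # (individual, chosen cluster, updated mean) per step
    for ind in data:
        if ind in (seed_1, seed_2):
            continue
        # Nearest current mean by Euclidean distance; min() keeps Cluster 1 on ties.
        target = min(means, key=lambda c: math.dist(data[ind], means[c]))
        clusters[target].append(ind)
        members = [data[m] for m in clusters[target]]
        # New mean vector: average of each coordinate over the cluster's members.
        means[target] = tuple(sum(col) / len(members) for col in zip(*members))
        history.append((ind, target, means[target]))
    return clusters, means, history

clusters, means, history = sequential_allocation(DATA, 1, 4)
for ind, target, mean in history:
    print(ind, target, tuple(round(v, 2) for v in mean))
print(clusters)  # {1: [1, 2, 3], 2: [4, 5, 6, 7]}
```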
The initial partition has now changed, and at this stage the two clusters have the following characteristics:
           Individual  Mean Vector (centroid)
Cluster 1  1, 2, 3     (1.8, 2.3)
Cluster 2  4, 5, 6, 7  (4.1, 5.4)
But we cannot yet be sure that each individual has been assigned to the right cluster. So we compare each individual's distance to its
own cluster mean with its distance to the mean of the opposite cluster, and find:
            Distance to mean (centroid) of
Individual  Cluster 1   Cluster 2
1           1.5         5.4
2           0.4         4.3
3           2.1         1.8
4           5.7         1.8
5           3.2         0.7
6           3.8         0.6
7           2.8         1.1
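The distance table and the check for misassigned individuals can be sketched in Python as follows. To match the table, the sketch uses the cluster means rounded to one decimal place, (1.8, 2.3) and (4.1, 5.4); with full-precision means a few entries differ in the first decimal place, but the conclusion is the same. It assumes Python 3.8+ (for `math.dist`), and the names `MEANS`, `ASSIGNMENT`, `distance_table` and `misassigned` are hypothetical.

```python
import math

# Scores (A, B) of the seven individuals, keyed by individual number.
DATA = {
    1: (1.0, 1.0), 2: (1.5, 2.0), 3: (3.0, 4.0), 4: (5.0, 7.0),
    5: (3.5, 5.0), 6: (4.5, 5.0), 7: (3.5, 4.5),
}
# Cluster means as rounded in the text, and the current cluster of each individual.
MEANS = {1: (1.8, 2.3), 2: (4.1, 5.4)}
ASSIGNMENT = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2, 7: 2}

def distance_table(data, means):
    """Euclidean distance from every individual to every cluster mean."""
    return {ind: tuple(math.dist(p, means[c]) for c in sorted(means))
            for ind, p in data.items()}

def misassigned(data, means, assignment):
    """Individuals that are nearer another cluster's mean than their own."""
    result = []
    for ind, p in data.items():
        own = math.dist(p, means[assignment[ind]])
        if any(math.dist(p, m) < own
               for c, m in means.items() if c != assignment[ind]):
            result.append(ind)
    return result

for ind, (d1, d2) in distance_table(DATA, MEANS).items():
    print(ind, f"{d1:.1f}", f"{d2:.1f}")
print(misassigned(DATA, MEANS, ASSIGNMENT))  # [3]
```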
Only individual 3 is nearer to the mean of the opposite cluster (Cluster 2) than to the mean of its own (Cluster 1). Each individual's
distance to its own cluster mean should be smaller than its distance to the other cluster's mean, and this is not the case for individual 3.
Thus, individual 3 is relocated to Cluster 2, resulting in the new partition:
           Individual     Mean Vector (centroid)
Cluster 1  1, 2           (1.3, 1.5)
Cluster 2  3, 4, 5, 6, 7  (3.9, 5.1)
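After a relocation, the mean vector of each cluster is simply the coordinate-wise average of its members. A minimal Python sketch of that recomputation for the new partition follows; `cluster_means` and the `assignment` mapping (individual to cluster number) are hypothetical names. The exact means are (1.25, 1.5) and (3.9, 5.1), which the table above shows rounded as (1.3, 1.5) and (3.9, 5.1).

```python
# Scores (A, B) of the seven individuals, keyed by individual number.
DATA = {
    1: (1.0, 1.0), 2: (1.5, 2.0), 3: (3.0, 4.0), 4: (5.0, 7.0),
    5: (3.5, 5.0), 6: (4.5, 5.0), 7: (3.5, 4.5),
}

def cluster_means(data, assignment):
    """Coordinate-wise mean of the members of each cluster."""
    groups = {}
    for ind, c in assignment.items():
        groups.setdefault(c, []).append(data[ind])
    return {c: tuple(sum(col) / len(pts) for col in zip(*pts))
            for c, pts in sorted(groups.items())}

# Partition after individual 3 has been moved to Cluster 2.
assignment = {1: 1, 2: 1, 3: 2, 4: 2, 5: 2, 6: 2, 7: 2}
print(cluster_means(DATA, assignment))  # {1: (1.25, 1.5), 2: (3.9, 5.1)}
```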
The iterative relocation would now continue from this new partition until no more relocations occur. In this example, however, each
individual is now nearer to its own cluster mean than to that of the other cluster, so the iteration stops and the latest partition is taken as
the final cluster solution.
It is also possible that the k-means algorithm does not settle on a final solution. In that case, it is a good idea to stop the algorithm after
a pre-chosen maximum number of iterations.
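The relocation loop, together with a cap on the number of iterations, might look like the following Python sketch. It is an assumption-laden illustration rather than the exact procedure from the text: in each pass it recomputes the means, reassigns every individual to its nearest mean (ties go to the lower-numbered cluster), and stops either when no individual moves or after `max_iter` passes. It assumes Python 3.8+ (for `math.dist`); `relocate`, `cluster_means` and the default `max_iter=100` are hypothetical choices.

```python
import math

# Scores (A, B) of the seven individuals, keyed by individual number.
DATA = {
    1: (1.0, 1.0), 2: (1.5, 2.0), 3: (3.0, 4.0), 4: (5.0, 7.0),
    5: (3.5, 5.0), 6: (4.5, 5.0), 7: (3.5, 4.5),
}

def cluster_means(data, assignment):
    """Coordinate-wise mean of the members of each cluster."""
    groups = {}
    for ind, c in assignment.items():
        groups.setdefault(c, []).append(data[ind])
    return {c: tuple(sum(col) / len(pts) for col in zip(*pts))
            for c, pts in sorted(groups.items())}

def relocate(data, assignment, max_iter=100):
    """Move every individual to its nearest cluster mean, repeating until no
    individual moves or max_iter passes are used up.
    Returns (assignment, means, converged). A cluster that loses all its
    members simply disappears; that does not happen with this data."""
    assignment = dict(assignment)
    for _ in range(max_iter):
        means = cluster_means(data, assignment)
        new = {ind: min(means, key=lambda c: math.dist(p, means[c]))
               for ind, p in data.items()}
        if new == assignment:  # no relocations: stable partition
            return assignment, means, True
        assignment = new
    # Iteration cap reached before the partition stopped changing.
    return assignment, cluster_means(data, assignment), False

# Start from the partition produced by the sequential allocation.
start = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2, 7: 2}
final, means, converged = relocate(DATA, start)
print(final, converged)
# {1: 1, 2: 1, 3: 2, 4: 2, 5: 2, 6: 2, 7: 2} True
```

The first pass moves individual 3 to Cluster 2, and the second pass finds no further moves. With `max_iter=1` the same call returns `converged=False`, because the cap is reached right after that first relocation.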