Clustering: Partition and Hierarchy
Partitioning approach:
Construct various partitions and then evaluate them by some criterion, e.g., minimizing the sum of squared errors
Hierarchical approach:
Create a hierarchical decomposition of the set of data (or objects)
Density-based approach:
Based on connectivity and density functions
Grid-based approach:
Based on a multiple-level granularity structure
Partitioning Algorithms: Basic Concept
Partitioning method: Partitioning a database D of n objects into a set of
k clusters, such that the sum of squared distances is minimized (where
ci is the centroid or medoid of cluster Ci)
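Written out, with d the chosen distance and ci the centre of cluster Ci, the objective being minimised is:

E = \sum_{i=1}^{k} \sum_{p \in C_i} d(p, c_i)^2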
K-Means Example (Sec. 16.4), K = 2
Pick seeds → Reassign clusters → Compute centroids → Reassign clusters → Compute centroids → Reassign clusters → Converged!
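The loop sketched above can be written in a few lines of Python; this is a minimal illustration of the pick-seeds / reassign / recompute cycle (function and variable names are my own, not from the slides):

```python
def kmeans(points, seeds, max_iter=100):
    centroids = [list(c) for c in seeds]
    for _ in range(max_iter):
        # Reassign clusters: each point joins the cluster with the closest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            d2 = [sum((pi - ci) ** 2 for pi, ci in zip(p, c)) for c in centroids]
            clusters[d2.index(min(d2))].append(p)
        # Compute centroids: coordinate-wise mean of each cluster's members
        new_centroids = [
            [sum(xs) / len(cl) for xs in zip(*cl)] if cl else c
            for cl, c in zip(clusters, centroids)
        ]
        if new_centroids == centroids:  # converged: centroids no longer move
            return new_centroids, clusters
        centroids = new_centroids
    return centroids, clusters

# For the medicine example below:
# kmeans([(1, 1), (2, 1), (4, 3), (5, 4)], seeds=[(1, 1), (2, 1)])
```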
Example
Medicine  Weight  pH-Index
A         1       1
B         2       1
C         4       3
D         5       4
(The four medicines plotted in the Weight / pH-Index plane.)
Example
Step 1: Use initial seed points for partitioning
c1 = A = (1, 1), c2 = B = (2, 1)
Example
Step 2: Compute new centroids of the current partition
Knowing the members of each cluster, we compute the new centroid of each group based on these memberships:
c1 = (1, 1)
c2 = ((2 + 4 + 5)/3, (1 + 3 + 4)/3) = (11/3, 8/3)
Example
Step 2: Renew membership based on the new centroids
Compute the distance of all objects to the new centroids and reassign each object to its nearest centroid
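As a quick check of this step, the distances from each medicine to the new centroids c1 = (1, 1) and c2 = (11/3, 8/3) can be computed directly (a small sketch, names are mine):

```python
from math import dist  # Euclidean distance, Python 3.8+

points = {"A": (1, 1), "B": (2, 1), "C": (4, 3), "D": (5, 4)}
c1, c2 = (1, 1), (11 / 3, 8 / 3)

for name, p in points.items():
    d1, d2 = dist(p, c1), dist(p, c2)
    print(name, round(d1, 2), round(d2, 2), "-> c1" if d1 < d2 else "-> c2")
```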
Example
Step 3: Repeat the first two steps until convergence
Knowing the members of each cluster, we compute the new centroid of each group based on these memberships:
c1 = ((1 + 2)/2, (1 + 1)/2) = (1.5, 1)
c2 = ((4 + 5)/2, (3 + 4)/2) = (4.5, 3.5)
Exercise
For the medicine data set, use K-means with the Manhattan distance
metric for clustering analysis by setting K=2 and initialising seeds as
C1 = A and C2 = C. Answer two questions as follows:
1. What are memberships of two clusters after convergence?
2. What are centroids of two clusters after convergence?
Medicine  Weight  pH-Index
A         1       1
B         2       1
C         4       3
D         5       4
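If you want to verify your answers, the following sketch runs the same assign/update loop with the Manhattan (L1) distance for the assignment step and the ordinary mean for the centroid update, which is one common reading of the exercise (the code and the fixed iteration count are mine):

```python
def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

points    = [(1, 1), (2, 1), (4, 3), (5, 4)]   # A, B, C, D
centroids = [(1, 1), (4, 3)]                   # seeds: C1 = A, C2 = C

for _ in range(10):                            # more than enough iterations for 4 points
    clusters = [[], []]
    for p in points:
        d = [manhattan(p, c) for c in centroids]
        clusters[d.index(min(d))].append(p)
    centroids = [tuple(sum(xs) / len(cl) for xs in zip(*cl)) for cl in clusters]

print(clusters)
print(centroids)
```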
Termination conditions (Sec. 16.4)
Several possibilities, e.g.,
A fixed number of iterations.
Centroid positions don’t change.
RSS falls below a threshold.
The decrease in RSS falls below a threshold.
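For the two RSS conditions, RSS (the residual sum of squares) is the sum of squared distances from every point to its cluster centroid; a minimal sketch of the test, assuming the clusters/centroids structure from the earlier k-means sketch (names are mine):

```python
def rss(clusters, centroids):
    # Residual sum of squares: squared distance from every point to its cluster centroid
    return sum(
        sum((pi - ci) ** 2 for pi, ci in zip(p, c))
        for cl, c in zip(clusters, centroids)
        for p in cl
    )

# Inside the K-means loop, one possible stopping rule:
# if previous_rss - rss(clusters, centroids) < tolerance:
#     break
```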
Comments on the K-Means Method
Seed Choice (Sec. 16.4)
Example
Cluster centers: {(7,4)}
Cluster centers: {(7,4), (1,3)}
What Is the Problem of the K-Means Method?
The k-means algorithm is sensitive to outliers, since an object with an extremely large value may substantially distort the distribution of the data.
(Figure: two example clusterings of the same data set, axes 0-10.)
Hierarchical Clustering
Uses the distance matrix as the clustering criterion. This method does not require the number of clusters k as an input, but it needs a termination condition.
(Figure: over Steps 0-4, AGNES (agglomerative) merges a, b, c, d, e bottom-up into {a,b}, {d,e}, {c,d,e}, and finally {a,b,c,d,e}; DIANA (divisive) performs the same steps in reverse, from Step 4 back to Step 0.)
AGNES (Agglomerative Nesting)
Introduced in Kaufmann and Rousseeuw (1990)
Implemented in statistical packages, e.g., Splus
Use the single-link method and the dissimilarity matrix
Merge nodes that have the least dissimilarity
Go on in a non-descending fashion
Eventually all nodes belong to the same cluster
Example
Five objects: a, b, c, d, e
Distance matrix:
    a   b   c   d   e
a   0
b   12  0
c   4   10  0
d   15  6   12  0
e   7   8   5   13  0
Example
With single link, a and c are merged first (distance 4), then e joins them (distance 5). The updated distance matrix is:
      ace   b   d
ace   0
b     8     0
d     12    6   0
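The same merge sequence can be reproduced with library code; a sketch assuming SciPy and NumPy are available (the distance matrix is the one from the slide):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

labels = ["a", "b", "c", "d", "e"]
D = np.array([
    [ 0, 12,  4, 15,  7],
    [12,  0, 10,  6,  8],
    [ 4, 10,  0, 12,  5],
    [15,  6, 12,  0, 13],
    [ 7,  8,  5, 13,  0],
], dtype=float)

Z = linkage(squareform(D), method="single")  # single link on the condensed distance matrix
print(Z)  # each row: the two clusters merged and the distance at which they merge
```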
Exercise
Five objects: a, b, c, d, e
Distance matrix:
    a   b   c   d   e
a   0
b   12  0
c   4   10  0
d   15  6   12  0
e   7   8   5   13  0
Use the complete-link method.
Dendrogram: shows how the clusters are merged hierarchically; a clustering of the data objects is obtained by cutting the dendrogram at the desired level.
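Continuing from the previous SciPy sketch, the dendrogram itself can be drawn from the linkage result Z (matplotlib assumed to be available):

```python
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram

# Z and labels as computed in the previous snippet
dendrogram(Z, labels=labels)
plt.ylabel("merge distance")
plt.show()
```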
DIANA (Divisive Analysis)
Works in the inverse order of AGNES: starts with all objects in one cluster and splits it step by step
Eventually each object forms a cluster on its own
Hierarchical Clustering (Cont.)
Most hierarchical clustering algorithms are variants of single-link, complete-link, or average-link clustering.
(Figures: one data set where single link works but complete link does not, and another where complete link works but single link does not.)
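In symbols (standard definitions, not quoted from these slides), the inter-cluster distances used by the three variants are:

d_single(C_i, C_j) = \min_{x \in C_i,\ y \in C_j} d(x, y)
d_complete(C_i, C_j) = \max_{x \in C_i,\ y \in C_j} d(x, y)
d_average(C_i, C_j) = \frac{1}{|C_i|\,|C_j|} \sum_{x \in C_i} \sum_{y \in C_j} d(x, y)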
Single Link vs. Complete Link (Cont.)
(Figure: two groups of points labelled 1 and 2 with noise points between them; panels labelled 1-cluster, noise, and 2-cluster.)
The CF Tree Structure
Root node: entries CF1, CF2, CF3, ..., CF6, each with a pointer child1, child2, child3, ..., child6
Non-leaf node: entries CF1, CF2, CF3, ..., CF5, each with a pointer child1, child2, child3, ..., child5
Clustering Feature Vector in BIRCH
Clustering feature: CF = (N, LS, SS)
N: number of data points in the subcluster
LS = \sum_{i=1}^{N} X_i  (linear sum of the N points)
SS = \sum_{i=1}^{N} X_i^2  (square sum of the N points)
Example: the five points (3,4), (2,6), (4,5), (4,7), (3,8) give CF = (5, (16,30), (54,190))
(Figure: the five points plotted on 0-10 axes.)
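As a sanity check, the CF triple above can be recomputed from the five points in a few lines of Python (my own code, not part of BIRCH):

```python
points = [(3, 4), (2, 6), (4, 5), (4, 7), (3, 8)]

N  = len(points)
LS = tuple(sum(p[d] for p in points) for d in range(2))       # linear sum per dimension
SS = tuple(sum(p[d] ** 2 for p in points) for d in range(2))  # square sum per dimension

print(N, LS, SS)   # 5, (16, 30), (54, 190)
```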
CF-Tree in BIRCH
Clustering feature:
Summary of the statistics for a given subcluster: the 0th, 1st, and 2nd moments of the subcluster from a statistical point of view
A non-leaf node in the tree has descendants or "children"
The non-leaf nodes store sums of the CFs of their children
A CF tree has two parameters
Branching factor: max # of children per node
Threshold: max diameter of the subclusters stored at the leaf nodes
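The reason non-leaf nodes can store plain sums of their children's CFs is that CFs are additive; a minimal sketch of this, with a class name of my own choosing:

```python
from dataclasses import dataclass

@dataclass
class CF:
    n: int        # number of points
    ls: tuple     # linear sum, per dimension
    ss: tuple     # square sum, per dimension

    def __add__(self, other):
        return CF(self.n + other.n,
                  tuple(a + b for a, b in zip(self.ls, other.ls)),
                  tuple(a + b for a, b in zip(self.ss, other.ss)))

# A non-leaf entry is simply the sum of its children's CFs:
# parent_entry = cf_child1 + cf_child2 + cf_child3
```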
5. BIRCH algorithm
(Figures: a step-by-step example of inserting points into a CF tree. Each new point either joins the closest subcluster sc1, ..., sc8 or starts a new one; when leaf node LN1 exceeds the branching factor it is split into LN1' and LN1'', and when the root overflows it is split and a new level of non-leaf nodes NLN1 and NLN2 is created above LN1', LN1'', LN2, and LN3.)
The BIRCH Algorithm
Cluster diameter: D = \sqrt{ \frac{1}{n(n-1)} \sum_i \sum_j (x_i - x_j)^2 }
For each point in the input:
Find the closest leaf entry
Add the point to the leaf entry and update its CF
If the entry diameter exceeds the threshold, split the leaf, and possibly its parents
Algorithm is O(n)
Concerns
Sensitive to insertion order of data points
Since the size of leaf entries is fixed, the resulting clusters may not be so natural
Clusters tend to be spherical given the radius and diameter measures
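A much-simplified sketch of the per-point insertion step described above, keeping a single leaf only and leaving the split itself unimplemented (all names are mine; the real algorithm maintains a height-balanced tree and updates CFs incrementally rather than recomputing from raw points):

```python
import math

def centroid(sc):
    # Centroid of a subcluster from its points (LS / N in CF terms)
    n = len(sc)
    return tuple(sum(p[d] for p in sc) / n for d in range(len(sc[0])))

def diameter(sc):
    # Diameter as the root of the average pairwise squared distance (see the formula above)
    n = len(sc)
    if n < 2:
        return 0.0
    total = sum(sum((a - b) ** 2 for a, b in zip(p, q)) for p in sc for q in sc)
    return math.sqrt(total / (n * (n - 1)))

def insert(leaf, point, threshold=1.5, branching=4):
    """leaf: a list of subclusters, each a list of points."""
    if leaf:
        # Find the closest subcluster on this leaf
        best = min(leaf, key=lambda sc: math.dist(point, centroid(sc)))
        if diameter(best + [point]) <= threshold:
            best.append(point)       # absorb the point; its CF would be updated here
            return leaf
    leaf.append([point])             # otherwise the point starts a new subcluster
    if len(leaf) > branching:
        raise NotImplementedError("leaf overflow: split the leaf, and possibly its parents")
    return leaf
```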
Property
The diameter above can be computed from the summary statistics M_1 = \sum_j t_j and M_2 = \sum_j t_j^2 alone, without revisiting the points:

D_m^2 = \frac{\sum_{j=1}^{N} \sum_{i=1}^{N} (t_i - t_j)^2}{N(N-1)}
      = \frac{\sum_{j=1}^{N} \sum_{i=1}^{N} (t_i^2 - 2 t_i t_j + t_j^2)}{N(N-1)}
      = \frac{\sum_{j=1}^{N} \left( \sum_{i=1}^{N} t_i^2 - 2 t_j \sum_{i=1}^{N} t_i + N t_j^2 \right)}{N(N-1)}
      = \frac{\sum_{j=1}^{N} \left( M_2 - 2 t_j M_1 + N t_j^2 \right)}{N(N-1)}
      = \frac{N M_2 - 2 M_1 \sum_{j=1}^{N} t_j + N \sum_{j=1}^{N} t_j^2}{N(N-1)}
      = \frac{N M_2 - 2 M_1^2 + N M_2}{N(N-1)}
      = \frac{2 N M_2 - 2 M_1^2}{N(N-1)}

Moreover, when two clusters C_a and C_b are merged, the statistics simply add:
N = n_a + n_b,   M_1 = M_{1a} + M_{1b},   M_2 = M_{2a} + M_{2b}
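A quick numerical check of this identity on the five example points from the CF slide (code and names are mine; for multi-dimensional points, M_1^2 is read as the squared norm of M_1):

```python
import itertools

pts = [(3, 4), (2, 6), (4, 5), (4, 7), (3, 8)]
N  = len(pts)
M1 = tuple(sum(p[d] for p in pts) for d in range(2))   # linear sum, per dimension
M2 = sum(x * x for p in pts for x in p)                 # square sum (a scalar here)

pairwise = sum(sum((a - b) ** 2 for a, b in zip(p, q))
               for p, q in itertools.product(pts, pts))
lhs = pairwise / (N * (N - 1))                                    # definition of Dm^2
rhs = (2 * N * M2 - 2 * sum(m * m for m in M1)) / (N * (N - 1))   # CF-based form
print(lhs, rhs)   # the two values agree
```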