Hierarchical Clustering

Hierarchical clustering produces a set of nested clusters organized as a hierarchical tree. The number of clusters does not have to be decided in advance. It is rarely used on large datasets because of its high processing time.

Types

Agglomerative : Bottom-up approach. Start with n clusters (one per data point) and merge until a single cluster remains.
Divisive : Top-down approach. Start with one cluster containing all the data points and split until there are n clusters.
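
The bottom-up procedure can be sketched in a few lines of plain Python. This is only an illustrative, naive version: the function name `agglomerative`, the `target_clusters` parameter, the sample points, and the choice of single-link distance are all assumptions made for the example, not part of any particular library.

```python
from math import dist  # Euclidean distance, Python 3.8+


def agglomerative(points, target_clusters=1):
    """Naive bottom-up clustering using single-link distance (illustrative only)."""
    # Start with every point in its own cluster (n clusters).
    clusters = [[p] for p in points]
    merges = []  # (cluster_a, cluster_b, distance), in merge order
    while len(clusters) > target_clusters:
        best = None
        # Find the pair of clusters with the smallest single-link distance.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((list(clusters[i]), list(clusters[j]), d))
        # Merge cluster j into cluster i, then drop cluster j.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters, merges


# Hypothetical sample data: two tight pairs and one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
clusters, merges = agglomerative(points, target_clusters=2)
```

The nested loops rescan every pair of clusters at each step, which is why this approach becomes slow as n grows. A divisive method would run the other way, starting from one cluster and splitting.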

Advantages

Produces a tree structure that makes the clustering easy to visualize
Powerful, especially if the data contains real hierarchical relationships (e.g. evolution)

Disadvantages

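Computationally intensive (the naive agglomerative algorithm needs O(n^2) memory for the pairwise distances and up to O(n^3) time)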
Sensitive to noise and outliers

Distance Between Clusters (Agglomerative Clustering)

Single link : Shortest distance between an element in one cluster and an element in the other cluster.
Complete link : Largest distance between an element in one cluster and an element in the other cluster. Tends to produce compact clusters.
Average link : Average distance over all pairs of elements, one taken from each cluster.
Centroid : Distance between the centroids of the two clusters.
Medoid : Distance between the medoids (the most centrally located objects) of the two clusters.
Ward’s method : Merge the pair of clusters whose merger gives the smallest increase in total within-cluster variance.
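
The criteria above can be written as small functions. The sketch below assumes Euclidean distance and clusters stored as lists of coordinate tuples. All function names are made up for illustration. For Ward's method it computes the increase in the within-cluster sum of squares caused by a merge, using the standard identity n_a * n_b / (n_a + n_b) * ||c_a - c_b||^2.

```python
from math import dist  # Euclidean distance, Python 3.8+


def single_link(a, b):
    # Shortest distance between any element of a and any element of b.
    return min(dist(p, q) for p in a for q in b)


def complete_link(a, b):
    # Largest distance between any element of a and any element of b.
    return max(dist(p, q) for p in a for q in b)


def average_link(a, b):
    # Mean distance over all cross-cluster pairs.
    return sum(dist(p, q) for p in a for q in b) / (len(a) * len(b))


def centroid(cluster):
    # Coordinate-wise mean of the cluster's points.
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))


def centroid_link(a, b):
    return dist(centroid(a), centroid(b))


def medoid(cluster):
    # The member with the smallest total distance to the other members
    # (ties are resolved in favor of the first such member).
    return min(cluster, key=lambda p: sum(dist(p, q) for q in cluster))


def medoid_link(a, b):
    return dist(medoid(a), medoid(b))


def ward_increase(a, b):
    # Increase in within-cluster sum of squares if a and b were merged.
    na, nb = len(a), len(b)
    return na * nb / (na + nb) * dist(centroid(a), centroid(b)) ** 2


# Hypothetical example clusters.
a = [(0.0, 0.0), (0.0, 2.0)]
b = [(3.0, 0.0), (3.0, 2.0)]
print(single_link(a, b), complete_link(a, b), average_link(a, b))
print(centroid_link(a, b), medoid_link(a, b), ward_increase(a, b))
```

In an agglomerative run, the pair of clusters with the smallest value of the chosen criterion is merged at each step.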

Visualization

The results of hierarchical clustering can be shown using a dendrogram.

At the bottom, we start with n data points (observations), each assigned to its own cluster.
The two closest clusters are then merged, and this is repeated until just one cluster remains at the top.
The height in the dendrogram at which two clusters are merged represents the distance between those two clusters in the data space.

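A common heuristic for choosing the number of clusters is to find the largest vertical distance in the dendrogram that a horizontal line can traverse without crossing a merge point. Draw the horizontal line within that range and count the vertical lines it cuts; that count is the number of clusters.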
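
This largest-gap rule can be expressed directly on the list of merge heights. The sketch below is a hypothetical helper, not a library function. It assumes the heights come from a linkage whose merge distances never decrease (as with single, complete, or average link), and it is a heuristic rather than a guaranteed best choice.

```python
def clusters_from_largest_gap(merge_heights):
    """Return (number_of_clusters, cut_height) using the largest-gap heuristic."""
    heights = sorted(merge_heights)
    if len(heights) < 2:
        raise ValueError("need at least two merges to compare gaps")
    n_points = len(heights) + 1  # n points produce n - 1 merges
    # Vertical gap between each pair of consecutive merge heights.
    gaps = [heights[k + 1] - heights[k] for k in range(len(heights) - 1)]
    k = max(range(len(gaps)), key=gaps.__getitem__)
    # Place the horizontal cut in the middle of the widest gap.
    cut = (heights[k] + heights[k + 1]) / 2
    # Merges at or below heights[k] have already happened under the cut.
    merges_below_cut = k + 1
    return n_points - merges_below_cut, cut


# Hypothetical merge heights for five observations (four merges).
n_clusters, cut = clusters_from_largest_gap([1.0, 1.0, 6.5, 7.0])
print(n_clusters, cut)
```

With these heights the widest gap lies between 1.0 and 6.5, so the line is drawn at 3.75, above the first two merges, and it cuts three vertical lines.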
