Clustering

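Clustering is a set of data-driven partitioning techniques designed to group a collection of objects into clusters, so that objects in the same cluster are more similar to each other than to objects in other clusters.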

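⚠️ Data should ALWAYS be continuous and standardized: distance measures are sensitive to scale, so an unstandardized variable with a large range dominates the result.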
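
To illustrate why standardization matters, here is a minimal sketch of z-score standardization using only the Python standard library. The `standardize` helper and the income and age values are hypothetical, made up for illustration. Z-scoring is one common way to standardize, not the only one.

```python
from statistics import mean, pstdev

def standardize(column):
    """Return z-scores: (x - mean) / population standard deviation."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]

# Hypothetical data: income is in the tens of thousands, age in tens.
incomes = [20000, 40000, 60000]
ages = [25, 35, 45]

z_income = standardize(incomes)
z_age = standardize(ages)
```

After standardization both variables have mean 0 and standard deviation 1, so neither one dominates a distance computation just because of its units.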

✴ Clustering is finding borders between groups
✴ Segmentation is using borders to form groups

Applications :

Market Segmentation
Sales segmentation : what type of customer wants what
Credit risk
Operations : identifying high-performing employees for promotion
Insurance : identifying groups with high average claim cost
Data reduction : grouping observations to reduce the number of observations

How to build clusters :

Select distance measure
Select clustering algorithm
Define the distance between 2 clusters
Determine the number of clusters
Validate the analysis
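
The steps above can be sketched end to end in a few lines. This is only one possible set of choices, all of them assumptions and not prescribed by these notes: Euclidean distance as the distance measure, agglomerative (bottom-up) merging as the algorithm, single linkage as the cluster-to-cluster distance, a fixed target number of clusters, and the within-cluster sum of squares as a simple validation figure. The sample points and function names are hypothetical.

```python
import math
from itertools import combinations

def euclidean(a, b):
    # Step 1: the distance measure between two points.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def single_linkage(c1, c2, points):
    # Step 3: distance between two clusters = smallest pairwise distance.
    return min(euclidean(points[i], points[j]) for i in c1 for j in c2)

def agglomerate(points, n_clusters):
    # Step 2: agglomerative clustering. Start with one cluster per point
    # and repeatedly merge the two closest clusters.
    clusters = [[i] for i in range(len(points))]
    # Step 4: stop once the requested number of clusters remains.
    while len(clusters) > n_clusters:
        def gap(pair):
            return single_linkage(clusters[pair[0]], clusters[pair[1]], points)
        i, j = min(combinations(range(len(clusters)), 2), key=gap)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return clusters

def within_ss(clusters, points):
    # Step 5: within-cluster sum of squared distances to each centroid.
    dim = len(points[0])
    total = 0.0
    for c in clusters:
        centroid = [sum(points[i][d] for i in c) / len(c) for d in range(dim)]
        total += sum(euclidean(points[i], centroid) ** 2 for i in c)
    return total

# Hypothetical data: three points near the origin, two near (10, 10).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0)]
clusters = agglomerate(points, n_clusters=2)
score = within_ss(clusters, points)
```

Comparing `within_ss` across different values of `n_clusters` is one rough way to judge how many clusters the data supports: a large drop suggests the extra cluster captured real structure.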

Methods

Linkage method
Variance method
Centroid method
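
As a rough sketch of how these three families measure the distance between two clusters, the functions below compute single, complete, and average linkage, the centroid distance, and a variance-based criterion. The notes do not say which variance method is meant; this sketch assumes Ward's method, which scores a merge by the increase in within-cluster sum of squares. All function names and the sample clusters are hypothetical.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(cluster):
    n = len(cluster)
    return tuple(sum(p[d] for p in cluster) / n for d in range(len(cluster[0])))

# Linkage methods: built from pairwise distances between members.
def single_link(c1, c2):
    return min(euclidean(p, q) for p in c1 for q in c2)

def complete_link(c1, c2):
    return max(euclidean(p, q) for p in c1 for q in c2)

def average_link(c1, c2):
    return sum(euclidean(p, q) for p in c1 for q in c2) / (len(c1) * len(c2))

# Centroid method: distance between the two cluster means.
def centroid_distance(c1, c2):
    return euclidean(centroid(c1), centroid(c2))

# Variance method (assumed here to be Ward's): increase in the
# within-cluster sum of squares if the two clusters were merged.
def ward_increase(c1, c2):
    n1, n2 = len(c1), len(c2)
    return (n1 * n2) / (n1 + n2) * centroid_distance(c1, c2) ** 2

# Hypothetical clusters: two vertical pairs of points, 3 units apart.
c1 = [(0.0, 0.0), (0.0, 2.0)]
c2 = [(3.0, 0.0), (3.0, 2.0)]
```

On the same pair of clusters the measures generally disagree: single linkage takes the closest pair of members, complete linkage the farthest, and the others fall in between or use a different scale altogether, which is why the choice of method changes which clusters get merged first.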

Closeness of two clusters :

The decision to merge two clusters is based on how close they are. Several distance metrics can be used to measure closeness between two points a and b:

Euclidean distance: $\|a-b\|_2 = \sqrt{\sum_i (a_i-b_i)^2}$
Squared Euclidean distance: $\|a-b\|_2^2 = \sum_i (a_i-b_i)^2$
Manhattan distance: $\|a-b\|_1 = \sum_i |a_i-b_i|$
Maximum distance: $\|a-b\|_\infty = \max_i |a_i-b_i|$
Mahalanobis distance: $\sqrt{(a-b)^T S^{-1} (a-b)}$ {where S : covariance matrix}
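
The metrics listed above can be written as short Python functions. This is a minimal sketch using only the standard library. The function names and sample vectors are hypothetical, and `mahalanobis` assumes the caller passes the already inverted covariance matrix as a nested list (`s_inv`), since computing the inverse is outside the scope of this sketch.

```python
import math

def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def euclidean(a, b):
    return math.sqrt(squared_euclidean(a, b))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def maximum(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

def mahalanobis(a, b, s_inv):
    # s_inv is the inverse of the covariance matrix S (assumed precomputed).
    diff = [x - y for x, y in zip(a, b)]
    # Compute S^-1 * diff, then the dot product diff^T * (S^-1 * diff).
    weighted = [sum(row[k] * diff[k] for k in range(len(diff))) for row in s_inv]
    return math.sqrt(sum(d * w for d, w in zip(diff, weighted)))

# Hypothetical points and covariance matrix S = diag(9, 16).
a = (1.0, 2.0)
b = (4.0, 6.0)
s_inv = [[1 / 9, 0.0], [0.0, 1 / 16]]

d_euclid = euclidean(a, b)
d_manhattan = manhattan(a, b)
d_max = maximum(a, b)
d_mahal = mahalanobis(a, b, s_inv)
```

With the identity matrix as `s_inv`, the Mahalanobis distance reduces to the Euclidean distance; a non-identity covariance rescales each direction by how much the data varies along it.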