K-Means (Non-Hierarchical)

from sklearn.cluster import KMeans

K-means divides objects into non-overlapping subsets. The main objective is to form clusters that are homogeneous internally and heterogeneous with respect to each other.
❕ Only for continuous variables, because centroids are means and proximity is measured by distance.

Advantages

Fast and works with large data.
Computationally lighter than hierarchical methods

Disadvantages

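Reliably identifies only clusters that are roughly circular / spherical in nature; elongated or curved shapes get split incorrectly. (check crescent dataset)
Distance based, so it is sensitive to feature scale and outliers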
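
A minimal sketch of the crescent check mentioned above, assuming scikit-learn's make_moons as the crescent dataset (the notes do not say which dataset was meant). The adjusted Rand index compares the K-means labels with the true crescents; the variable names are illustrative.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# Two interleaving crescents: the true groups are not spherical.
# (make_moons is an assumed stand-in for the "crescent dataset".)
X_moons, y_true = make_moons(n_samples=300, noise=0.05, random_state=0)

km_moons = KMeans(n_clusters=2, n_init=10, random_state=0)
labels_moons = km_moons.fit_predict(X_moons)

# Agreement with the true crescents; 1.0 would be a perfect match.
ari_moons = adjusted_rand_score(y_true, labels_moons)
print(f"Adjusted Rand index on crescents: {ari_moons:.2f}")
```

A score well below 1.0 here would indicate that the straight boundary K-means draws between two centroids cuts across both crescents.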

Process

Choose the value of ‘k’ (the number of clusters)
Pick k random observations as the initial seeds
Assign each record to the nearest of the k seeds
Form clusters
Calculate the centroid of each cluster
Use the centroids as the new seeds
Form new clusters by reassigning each record to its nearest centroid
Recalculate the centroids
Continue the process until stable clusters are formed (assignments and boundaries cease to change)
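
The steps above can be sketched in plain NumPy. This is an illustrative version only, not the scikit-learn implementation; the function name kmeans_sketch, its parameters, and the demo data are assumptions made for the example.

```python
import numpy as np

def kmeans_sketch(X, k, max_iter=100, seed=0):
    """Illustrative K-means loop (hypothetical helper, assumes max_iter >= 1)."""
    rng = np.random.default_rng(seed)
    # Pick k random observations as the initial seeds.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Assign each record to the nearest seed (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Centroids of the clusters become the new seeds;
        # an empty cluster keeps its previous seed.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Stop once the centroids no longer move (stable clusters).
        converged = np.allclose(new_centroids, centroids)
        centroids = new_centroids
        if converged:
            break
    return labels, centroids

# Demo on two well-separated synthetic groups.
rng = np.random.default_rng(42)
X_demo = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(50, 2)),
    rng.normal(loc=5.0, scale=0.5, size=(50, 2)),
])
labels_demo, centroids_demo = kmeans_sketch(X_demo, k=2)
print(centroids_demo)
```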

Elbow Criterion (Scree Plot):

K-means clustering doesn’t provide an estimate of the number of clusters required. Hence the elbow criterion is used to determine the optimal number of clusters.

The method states that you should choose the number of clusters beyond which adding another cluster does not add sufficient information. It is plotted as the ratio of within-cluster variance to between-cluster variance against the number of clusters (in practice the within-cluster sum of squares alone is often plotted), and k is read off at the bend where the curve flattens. The objective is to minimize the within-cluster distances and maximize the between-cluster distances.
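
A sketch of how the elbow values could be computed, assuming synthetic data from make_blobs with three hand-picked centers. The within-cluster sum of squares is taken from KMeans.inertia_; the between-cluster part is derived as total minus within, which is one way (an assumption here) to get the ratio described above.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with three groups (centers chosen for illustration).
X_blobs, _ = make_blobs(
    n_samples=300, centers=[[0, 0], [6, 6], [0, 6]],
    cluster_std=0.8, random_state=0,
)

# Total sum of squares around the overall mean.
total_ss = ((X_blobs - X_blobs.mean(axis=0)) ** 2).sum()

within = {}
ratio = {}
for k in range(2, 9):
    km_k = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_blobs)
    within[k] = km_k.inertia_            # within-cluster sum of squares
    between_k = total_ss - within[k]     # between-cluster sum of squares
    ratio[k] = within[k] / between_k

for k in within:
    print(k, round(within[k], 1), round(ratio[k], 4))
```

Plotting either column against k (for example with matplotlib) gives the scree plot; the elbow is the k after which the values stop dropping sharply.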

Validation:

Silhouette Index
Davies-Bouldin Score
Calinski-Harabasz Score (also known as Pseudo F)
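
A short sketch of computing these scores with sklearn.metrics, assuming synthetic data and a 3-cluster fit chosen only for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)

# Synthetic data (assumed for the example).
X_val, _ = make_blobs(
    n_samples=300, centers=[[0, 0], [6, 6], [0, 6]],
    cluster_std=0.8, random_state=0,
)
labels_val = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_val)

sil = silhouette_score(X_val, labels_val)        # range [-1, 1], higher is better
dbi = davies_bouldin_score(X_val, labels_val)    # >= 0, lower is better
ch = calinski_harabasz_score(X_val, labels_val)  # higher is better

print(f"silhouette={sil:.3f}  davies_bouldin={dbi:.3f}  calinski_harabasz={ch:.1f}")
```

Comparing these scores across several values of k is a common complement to the elbow plot.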

Parameter Tuning

# X_std: standardized feature matrix
km = KMeans(n_clusters=2, max_iter=100)
km.fit(X_std)
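
A fuller sketch of the same call, assuming X_std comes from StandardScaler (the notes do not show how it was built) and using made-up data with very different feature scales. The extra arguments init, n_init, tol and random_state are real KMeans parameters; the values shown are illustrative, not recommendations.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Made-up data: the second feature has a much larger scale than the first.
rng = np.random.default_rng(0)
X_raw = np.vstack([
    rng.normal(loc=[0, 0], scale=[1, 100], size=(100, 2)),
    rng.normal(loc=[5, 500], scale=[1, 100], size=(100, 2)),
])

# Standardize so no single feature dominates the distance.
X_std = StandardScaler().fit_transform(X_raw)

km = KMeans(
    n_clusters=2,        # k
    init="k-means++",    # seeding strategy ("random" picks random observations)
    n_init=10,           # number of restarts; the best run by inertia is kept
    max_iter=100,        # cap on iterations per run
    tol=1e-4,            # convergence tolerance on centroid movement
    random_state=0,      # reproducible seeding
)
km.fit(X_std)

print(km.cluster_centers_.shape, km.n_iter_, round(km.inertia_, 2))
```

After fitting, km.labels_ holds the cluster of each record and km.inertia_ the within-cluster sum of squares.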
