**Introduction**

Before moving on to the main topic, let us briefly discuss clustering and K-means clustering.

**Clustering**

Clustering is an unsupervised learning approach: there are no ground-truth labels against which to evaluate the algorithm's output. We simply want to explore the structure of the data by dividing the data points into subgroups. In other words, clustering is the task of identifying subgroups in the data such that data points in the same subgroup are similar to each other.

**K-means clustering**

The K-means method is an iterative technique that attempts to split a dataset into K separate, non-overlapping subgroups (clusters), where each data point belongs to exactly one group. It tries to make intra-cluster data points as similar as possible while keeping the clusters as distinct (far apart) as possible. It assigns data points to clusters so that the sum of the squared distances between the data points and their cluster's centroid (the arithmetic mean of all the data points in that cluster) is as small as possible. The lower the variance within the clusters, the more homogeneous (similar) the data points are.
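The objective described above can be seen directly in scikit-learn, where `inertia_` is the sum of squared distances that K-means minimizes. This is a minimal sketch; the blob dataset and parameter choices (`n_clusters=3`, `random_state=42`) are illustrative assumptions, not values from the text.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Toy 2-D dataset with 3 well-separated groups (illustrative assumption)
X, _ = make_blobs(n_samples=500, centers=3, random_state=42)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

# inertia_ is the sum of squared distances of points to their nearest
# centroid, i.e. the quantity K-means tries to make as small as possible.
print("Inertia:", kmeans.inertia_)
print("Centroid array shape:", kmeans.cluster_centers_.shape)
```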

**Mini Batch KMeans**

The primary idea behind the Mini Batch K-means method is to draw small random batches of data of a fixed size, so that they can be kept in memory. Each iteration obtains a fresh random sample from the dataset and uses it to update the clusters, and the process is repeated until convergence. Each mini-batch updates the clusters using a convex combination of the centroid values and the data, with a learning rate that decreases as the number of iterations increases. This learning rate is the inverse of the number of data points assigned to a cluster so far. Because the influence of incoming data diminishes as the number of iterations grows, convergence can be detected when no changes in the clusters occur for several consecutive iterations.

**Properties of Mini Batch KMeans**

- No need to store the whole dataset in memory.
- At each iteration, the distance between the mini-batch and the *k* centroids needs to be calculated.
- At each iteration, the user needs to store the *k* centroids and a subset of the data in memory.

Let us implement a small model to check the time difference between K-means and mini-batch K-means.
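One way to run this comparison is with scikit-learn's `KMeans` and `MiniBatchKMeans`. This is a minimal sketch; the dataset size and hyperparameters (`n_samples=20000`, `batch_size=1024`, etc.) are illustrative assumptions.

```python
import time
from sklearn.cluster import KMeans, MiniBatchKMeans
from sklearn.datasets import make_blobs

# Moderately large toy dataset so the timing difference is visible
X, _ = make_blobs(n_samples=20000, centers=5, random_state=42)

t0 = time.perf_counter()
km = KMeans(n_clusters=5, n_init=10, random_state=42).fit(X)
t_km = time.perf_counter() - t0

t0 = time.perf_counter()
mbk = MiniBatchKMeans(n_clusters=5, batch_size=1024, n_init=10,
                      random_state=42).fit(X)
t_mbk = time.perf_counter() - t0

print(f"KMeans:          {t_km:.3f}s, inertia={km.inertia_:.0f}")
print(f"MiniBatchKMeans: {t_mbk:.3f}s, inertia={mbk.inertia_:.0f}")
```

Mini-batch K-means typically finishes faster at the cost of a slightly higher inertia, since each update only sees a sample of the data.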