**K-means++**

Different initial centroids can lead K-means to different final clusters, which is a well-known weakness of the algorithm. We therefore need a better way to initialize the centroids. K-means++ is an initialization scheme designed for this purpose. It works as follows:

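**1. Randomly select the first centroid from the data points.**
**2. For each data point, calculate the distance to the nearest centroid chosen so far.**
**3. Select the next centroid from the data points. The probability of choosing a point is proportional to the square of that distance, so points far from the existing centroids are more likely to be picked.**
**4. Repeat steps 2 and 3 until K centroids have been chosen.**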

Here the distance is usually the Euclidean distance; the Manhattan distance can be used as a variant.

In this way, K-means++ chooses K initial centroids. After selecting them, we run the standard K-means algorithm on the dataset.
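
The initialization steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names `squared_distance` and `kmeans_pp_init` and the `seed` parameter are assumptions made for this example, and the sketch weights each point by its squared Euclidean distance to the nearest chosen centroid, as in the usual formulation of K-means++.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal length."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_pp_init(points, k, seed=None):
    """Pick k initial centroids from `points` (hypothetical helper)."""
    rng = random.Random(seed)

    # Step 1: choose the first centroid uniformly at random.
    centroids = [rng.choice(points)]

    # Step 2: squared distance from each point to its nearest centroid so far.
    nearest = [squared_distance(p, centroids[0]) for p in points]

    while len(centroids) < k:
        if sum(nearest) == 0:
            # Every point coincides with a chosen centroid; fall back to uniform.
            chosen = rng.choice(points)
        else:
            # Step 3: sample a point with probability proportional to
            # its squared distance to the nearest existing centroid.
            chosen = rng.choices(points, weights=nearest, k=1)[0]
        centroids.append(chosen)

        # Step 4 (repeat): update the nearest-centroid distances.
        nearest = [min(d, squared_distance(p, chosen))
                   for p, d in zip(points, nearest)]

    return centroids


points = [(0.0, 0.0), (0.1, 0.0), (10.0, 10.0), (10.1, 10.0)]
initial = kmeans_pp_init(points, k=2, seed=0)
print(initial)
```

Because an already chosen point has weight zero, it is not selected again, so the returned centroids are distinct whenever the data contains at least K distinct points.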

**Frequently Asked Questions**

**Q1) What is the Euclidean distance used in the K-means algorithm?**

=> Euclidean distance is the straight-line distance between two points in n-dimensional space. If there are n coordinates, then the distance between two points p and q is given by:

$$\text{Euclidean distance} = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$$
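
As a small illustration, the Euclidean distance can be computed as follows. The function name `euclidean_distance` is chosen for this example only.

```python
import math


def euclidean_distance(p, q):
    """Euclidean distance between two points with the same number of coordinates."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


print(euclidean_distance((0, 0), (3, 4)))  # 5.0
```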

**Q2) What is the Manhattan distance used in the K-means algorithm?**

=> Manhattan distance between two points p and q is the sum of the absolute differences of their coordinates. If there are n coordinates, then the distance between two points p and q is given by:

$$\text{Manhattan distance} = \sum_{i=1}^{n} |p_i - q_i|$$
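
The Manhattan distance can be sketched in the same way. The function name `manhattan_distance` is chosen for this example only.

```python
def manhattan_distance(p, q):
    """Sum of absolute coordinate differences between two points."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    return sum(abs(pi - qi) for pi, qi in zip(p, q))


print(manhattan_distance((1, 2), (4, 6)))  # 7
```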

**Q3) How do we calculate the centroid in the K-means algorithm?**

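=> For a cluster of n data points, the i-th coordinate of the centroid is the mean of the i-th coordinates of those points:

$$x_i = \frac{1}{n} \sum_{j=1}^{n} p_{j,i}$$

where $p_{j,i}$ is the i-th coordinate of the j-th data point.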
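
A short sketch of the centroid calculation, assuming the points of one cluster are given as a list of equal-length tuples. The function name `centroid` is chosen for this example only.

```python
def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    dims = len(points[0])
    # The i-th centroid coordinate is the average of the i-th coordinates.
    return tuple(sum(p[i] for p in points) / n for i in range(dims))


print(centroid([(1, 2), (3, 4), (5, 9)]))  # (3.0, 5.0)
```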

**Q4) What is the major issue in the implementation of K-means++?**

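=> The K-means++ initialization is computationally more expensive than the random initialization of plain K-means, because choosing each new centroid requires a pass over the data to update the distances. In return, the K-means iterations that follow typically need less time to converge, and usually reach a better local optimum.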

**Q5) How do outliers affect the K-means algorithm?**

=> An outlier can pull a centroid away from its actual position, or outliers can end up forming their own cluster instead of being ignored. It is therefore advisable to remove outliers before clustering.
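
The effect on the centroid can be seen with a toy example. The data and the helper name `centroid` are made up for illustration.

```python
def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


cluster = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
with_outlier = cluster + [(100.0, 100.0)]

print(centroid(cluster))       # (2.0, 2.0)
print(centroid(with_outlier))  # (26.5, 26.5)
```

A single distant point moves the centroid far away from the three points that actually form the cluster.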

**Key Takeaways**

We have come to the end of the discussion. There are various exciting clustering algorithms in unsupervised learning, and they are best learned in detail through hands-on projects.

Don't forget to check: Unsupervised Learning and GCD Euclidean Algorithm.

**THANK YOU**