OPTICS stands for Ordering Points To Identify Cluster Structure. The OPTICS algorithm draws inspiration from the DBSCAN clustering algorithm. The difference ‘is DBSCAN algorithm assumes the density of the clusters as constant, whereas the OPTICS algorithm allows a varying density of the clusters.

OPTICS adds two more terms to the concept of the DBSCAN algorithm, i.e.:

Core Distance

Reachability Distance

Core Distance

It is the minimum value of radius required to classify a given point as a core point. If the given point is not the core point, then its core distance is not defined.

Core Distance of an object P

The core Distance of an object P is the smallest value of ɛ such that ɛ' neighborhood of P has at least MinPts objects.

For example, consider the following figure. The point p is displayed in the figure. ɛ has a value of 6, and MinPts is 5. We have to find the core distance of point p. According to the definition, the minimum value of radius that satisfies the MinPts criteria is called core distance. Now, if we consider the core distance of 3mm, i.e., the radius of 3, the criteria of MinPts is fulfilled.

It's determined in relation to another data item, q(Let). The maximum of the Euclidean Distance(or any other distance metric) between two points p and q, and the Core Distance of p is the Reachability distance between p and q. If q is not a Core point, the Reachability Distance is not specified.

Calculating Reachability Distance Between Two Points

First, we will calculate the reachability distance between p and q. In the figure below, we can see that point q is outside the core distance of point p, so to calculate the reachability distance between p and q is the euclidean distance between p and q, i.e., 7mm.

The reachability distance between p and r is the core distance of point p, as r lies within the core distance. Hence the reachability distance between p and r will be 3mm.

The basic code for OPTICS is similar to the DBSCAN algorithm.

function OPTICS(DB, epsilon, MinimumnPts) is
for each point x of DB do
x.reachability_distance = UNDEFINED
for each unprocessed point x of DB do
N = getNeighbors(x, epsilon)
mark x as processed
output x to the ordered list
if core_distance(x, epsilon, MinimumnPts) != UNDEFINED then
Seeds = empty priority queue
update(N, x, Seeds, epsilon, MinimumnPts)
for each next q in Seeds do
N' = getNeighbors(q, epsilon)
mark q as processed
output q to the ordered list
if core_distance(q, epsilon, MinimumnPts) != UNDEFINED do
update(N', q, Seeds, epsilon, MinimumnPts)

Pseudo code for update function.

function update(N, p, Seeds, epsilon, MinimumPts) is
coredist = core_distance(p, epsilon, MinimumPts)
for each o in N
if o is not processed then
new_reach_dist = max(coredist, dist(p,o))
if o.reachability_distance == UNDEFINED then // o is not in Seeds
o.reachability_distance = new_reach_dist
Seeds.insert(o, new_reach_dist)
else // o in Seeds, check for improvement
if new_reach_dist < o.reachability_distance then
o.reachability_distance = new_reach_dist
Seeds.move-up(o, new_reach_dist)

OPTICS vs DBSCAN

Memory Consumption: The OPTICS clustering algorithm uses more memory because it uses a priority queue (Min Heap) to find the next data point that is closest in terms of Reachability Distance to the one currently being processed. It also needs greater processing capacity since DBSCAN's closest neighbor searches are more complex than radius queries.

Fewer variables: The epsilon parameter is not required for the OPTICS clustering approach, and it is just used in the pseudo-code above to save time. As a result, the analytical process of parameter adjustment is reduced.

This method does not divide the input data into clusters. It just generates a Reachability distance plot, which the programmer must read in order to cluster the spots appropriately.

FAQs

Is OPTICS better than DBSCAN? OPTICS functions as an add-on to DBSCAN. The main difference is that it records the order in which the points are handled rather than assigning cluster memberships.

What is reachability distance in OPTICS? The reachability distance of an object p with respect to another object q is the smallest distance from q if q is a core object. It also cannot be smaller than the core distance of q.

What is clustering in ML? Cluster analysis, often known as clustering, is a process that requires unsupervised machine learning. It entails finding natural grouping in data automatically. Clustering algorithms, unlike supervised learning (such as predictive modelling), just evaluate the incoming data and look for natural groupings or clusters in feature space.

Key Takeaways

In this article, we have discussed the following topics:

Introduction to OPTICS algorithm

Core distance and reachability distance

Pseudocode for OPTICS

OPTICS vs. DBSCAN

Want to learn more about Machine Learning? Here is an excellent course that can guide you in learning. Happy Coding