Table of contents
1.
Introduction
2.
Core Distance
2.1.
Core Distance of an object P
3.
Reachability Distance
3.1.
Calculating Reachability Distance Between Two Points
4.
Pseudo Code For OPTICS Clustering
5.
OPTICS vs DBSCAN
6.
FAQs
7.
Key Takeaways
Last Updated: Mar 27, 2024

OPTICS Clustering

Author soham Medewar
0 upvote
Career growth poll
Do you think IIT Guwahati certified course can help you in your career?

Introduction

OPTICS stands for Ordering Points To Identify Cluster Structure. The OPTICS algorithm draws inspiration from the DBSCAN clustering algorithm. The difference ‘is  DBSCAN algorithm assumes the density of the clusters as constant, whereas the OPTICS algorithm allows a varying density of the clusters.

OPTICS adds two more terms to the concept of the DBSCAN algorithm, i.e.:

  • Core Distance
  • Reachability Distance

Core Distance

It is the minimum value of radius required to classify a given point as a core point. If the given point is not the core point, then its core distance is not defined.

Core Distance of an object P

The core Distance of an object P is the smallest value of ɛ such that ɛ' neighborhood of P has at least MinPts objects.

For example, consider the following figure. The point p is displayed in the figure. ɛ has a value of 6, and MinPts is 5. We have to find the core distance of point p. According to the definition, the minimum value of radius that satisfies the MinPts criteria is called core distance. Now, if we consider the core distance of 3mm, i.e., the radius of 3, the criteria of MinPts is fulfilled.

source

Reachability Distance

It's determined in relation to another data item, q(Let). The maximum of the Euclidean Distance(or any other distance metric) between two points p and q, and the Core Distance of p is the Reachability distance between p and q. If q is not a Core point, the Reachability Distance is not specified.

Calculating Reachability Distance Between Two Points

First, we will calculate the reachability distance between p and q. In the figure below, we can see that point q is outside the core distance of point p, so to calculate the reachability distance between p and q is the euclidean distance between p and q, i.e., 7mm.

The reachability distance between p and r is the core distance of point p, as r lies within the core distance. Hence the reachability distance between p and r will be 3mm.

source

Pseudo Code For OPTICS Clustering

The basic code for OPTICS is similar to the DBSCAN algorithm. 

function OPTICS(DB, epsilon, MinimumnPts) is
    for each point x of DB do
        x.reachability_distance = UNDEFINED
    for each unprocessed point x of DB do
        N = getNeighbors(x, epsilon)
        mark x as processed
        output x to the ordered list
        if core_distance(x, epsilon, MinimumnPts) != UNDEFINED then
            Seeds = empty priority queue
            update(N, x, Seeds, epsilon, MinimumnPts)
            for each next q in Seeds do
                N' = getNeighbors(q, epsilon)
                mark q as processed
                output q to the ordered list
                if core_distance(q, epsilon, MinimumnPts) != UNDEFINED do
                    update(N', q, Seeds, epsilon, MinimumnPts)

Pseudo code for update function.

function update(N, p, Seeds, epsilon, MinimumPts) is
    coredist = core_distance(p, epsilon, MinimumPts)
    for each o in N
        if o is not processed then
            new_reach_dist = max(coredist, dist(p,o))
            if o.reachability_distance == UNDEFINED then // o is not in Seeds
                o.reachability_distance = new_reach_dist
                Seeds.insert(o, new_reach_dist)
            else               // o in Seeds, check for improvement
                if new_reach_dist < o.reachability_distance then
                    o.reachability_distance = new_reach_dist
                    Seeds.move-up(o, new_reach_dist)

OPTICS vs DBSCAN

  • Memory Consumption: The OPTICS clustering algorithm uses more memory because it uses a priority queue (Min Heap) to find the next data point that is closest in terms of Reachability Distance to the one currently being processed. It also needs greater processing capacity since DBSCAN's closest neighbor searches are more complex than radius queries.
     
  • Fewer variables: The epsilon parameter is not required for the OPTICS clustering approach, and it is just used in the pseudo-code above to save time. As a result, the analytical process of parameter adjustment is reduced.
     
  • This method does not divide the input data into clusters. It just generates a Reachability distance plot, which the programmer must read in order to cluster the spots appropriately.

FAQs

  1. Is OPTICS better than DBSCAN?
    OPTICS functions as an add-on to DBSCAN. The main difference is that it records the order in which the points are handled rather than assigning cluster memberships.
     
  2. What is reachability distance in OPTICS?
    The reachability distance of an object p with respect to another object q is the smallest distance from q if q is a core object. It also cannot be smaller than the core distance of q.
     
  3. What is clustering in ML?
    Cluster analysis, often known as clustering, is a process that requires unsupervised machine learning. It entails finding natural grouping in data automatically. Clustering algorithms, unlike supervised learning (such as predictive modelling), just evaluate the incoming data and look for natural groupings or clusters in feature space.

Key Takeaways

In this article, we have discussed the following topics:

  • Introduction to OPTICS algorithm
  • Core distance and reachability distance
  • Pseudocode for OPTICS
  • OPTICS vs. DBSCAN

Want to learn more about Machine Learning? Here is an excellent course that can guide you in learning. 
Happy Coding

Live masterclass