The Speeded Up Robust Features algorithm was developed by Bay, H., Tuytelaars, T. and Van Gool, L, from the motivation to overcome the limitations and shortcomings of the Scale-Invariant Feature Transform. This new feature is fast and robust compared to SIFT and can be used efficiently to compare images and represent local, similarity invariant features. The primary focus of the SURF algorithm is the fast computation of operators using box filters, which enables real-time applications for applications such as object detection and recognition, as well as object tracking.

Speeded Up Robust Feature - SURF

Speeded up robust feature (SURF) is a proprietary local feature detector and descriptor in computer vision. It can perform tasks like object identification, picture registration, categorization, and 3D reconstruction. SURF's standard version is several times faster than SIFT, and its inventors claim that it is more resilient against picture modifications.

SURF uses an integer approximation of the determinant of the Hessian blob detector to detect interest points, which may be computed in three integer operations using a precomputed integral image. The total of the Haar wavelet response around the point of interest is used to create its feature descriptor.

SURF descriptors have been used to find and recognize items, persons, and faces, recreate 3D scenes, track objects, and extract points of interest, among other things.

Integral Image

The total of all pixels in I(x,y) of a rectangular region created by (0,0) and (x,y) is represented by the integral image I(x,y) of an image I(x, y). Interest points to create feature descriptors are calculated using the integral image.

It just needs four array references to calculate the total pixels over a rectangular region of any size when using integral images.

S=A-B-C+D

SURF is just composed of two steps: 1. Feature Extraction and 2. Feature Description. They are discussed below.

Feature Extraction

The method for detecting interest points is based on a simple Hessian matrix approximation.

Surf uses the Hessian matrix because of its fast computing time and precision. Surf uses the determinant of the Hessian matrix for both location and scale selection rather than a separate measure (Hessian-Laplace detector). Given a pixel, the Hessian of this pixel is something like:

We filtered the image with a Gaussian kernel to adapt to any scale; therefore, given a point, X = (x, y), the Hessian matrix H(x,Ïƒ) in x at scale Ïƒ is defined as:

In the images above, the 9 x 9 box filters are approximations for Gaussian second-order derivatives with Ïƒ equal to 1.2. Dxx, Dyy, and Dxy are the abbreviations for these approximations. The Hessian determinant (approximated) can now be written as:

Image pyramids are commonly used to implement scale-spaces. The images are repeatedly smoothed with a Gaussian and then sub-sampled to achieve a higher pyramid level. Surf does not have to iteratively apply the same filter to the output of a previously filtered layer because of box filters and integral pictures; instead, it can apply such filters of any size at precisely the same speed directly on the original image even simultaneously. As a result, the scale space is examined by upscaling the filter size rather than iteratively reducing the image size.

As a result, for each additional octave, the filter size is twice, and the sample intervals for extracting the interest points() can also be doubled, allowing for filter upscaling at constant cost. A non-maximum suppression in a 3 x 3 x 3 neighbourhood is done to pinpoint interest points in the image and over scales.

There are two phases to creating a SURF description. The first stage is establishing a repeatable orientation using a circular region surrounding the critical point. The SURF descriptor is extracted from a square region aligned to the specified orientation.

Orientation Assignment

Surf seeks to find a reproducible orientation for the interest points to be rotation invariant. To accomplish this, use the following methods:

Surf computes the Haar-wavelet responses in the x and y directions in a circular neighbourhood of radius 6s around the critical point, where s is the scale at which the critical point was detected. In addition, the sampling step is scale-dependent and set to s, and the wavelet responses are generated at that scale. As a result, the wavelets are large at large scales. As a result, integral images are used for quick filtering once more.

Then we add the sums of vertical and horizontal wavelet responses in a scanning area, alter the scanning orientation (add /3), and recalculate until we find the orientation with the highest sum value, which is the primary orientation of the feature descriptor.

Descriptor Components

The first step is to create a square region centered on the keypoint and orientated earlier. This window is 20s in size.

The territory is then divided into smaller 4 x 4 square sub-regions regularly. At 5x5 regularly spaced sample points, we compute a few simple features for each sub-region. The Haar wavelet response in the horizontal direction is referred to as dx, and the Haar wavelet response in the vertical direction is referred to as dy (filter size 2s). The answers dx and dy are first weighted using a Gaussian (Ïƒ= 3.3s) centered at the key point to strengthen their resilience against geometric deformations and localization mistakes.

The wavelet responses dx and dy are then added together over each subregion to generate an initial set of feature vector entries. We additionally extract the sum of the absolute values of the responses, |dx| and |dy|, to bring in information about the polarity of the intensity changes. As a result, each sub-underlying region's intensity structure V = (âˆ‘dx, âˆ‘dy,âˆ‘|dx|,âˆ‘|dy|) contains a four-dimensional vector v. This yields a 64-bit descriptor vector for all 44 sub-regions. (Our descriptor in Sift in the 128-D vector is one reason SURF is faster than Sift.)

Steps for implementing SURF Algorithm

The following steps are followed to implement the SURF Algorithm on an image:

Prepare the image by preferably converting it to RGB format.

Add Scale Invariance and Rotational Invariance to the test image.

Detect keypoints by following feature extraction and using integral images & scale-space representation.

Create a SURF descriptor by following feature description and fixing a reproducible orientation based on information from a circular region around the keypoint. Then, construct a square region aligned to the selected orientation and extract the SURF descriptor from it.

Perform matching between SURF descriptors of training and test images; the matches with shorter distances are the ones we want.

The Mat class in OpenCV is mainly used to store and retrieve picture information. This class represents an n-dimensional array and stores data like grayscale or colour images, vector fields, point clouds, and voxel volumes, among other things. Mat stands for matrix and is part of the OpenCV namespace. They are represented in a certain way by a type number.

How do you calculate the degree of similarity between two images?

The degree of similarity between two images is measured using a distance metric established on the appropriate multi-dimensional feature space. The Minkowski distance, the Manhattan distance, the Euclidean distance, and the Hausdorff distance are standard distance metrics.

To blur an image, can you use a linear filter?

In a filter, blurring compares and smooths neighbouring pixels. You can't use a linear filter for this.

Conclusion

In this blog we discussed the Speeded Up Robust Features (SURF) followed by the explanation for an integral image. We also covered feature extraction and feature description and discussed the steps for implementing SURF Algorithm. Refer to our guided paths on Coding Ninjas Studio to learn more about DSA, Competitive Programming, JavaScript, System Design, etc. Enroll in our courses, refer to the mock test and problems; look at the interview experiences and interview bundle for placement preparations.