## Introduction

The Flajolet-Martin algorithm is an intelligent way to estimate the number of distinct elements in a large dataset. It's a probabilistic algorithm that uses hash functions to create a compact summary of the data. This summary, called a sketch, allows us to quickly estimate the cardinality (the number of unique items) without having to store every single element.

In this article, we'll learn how the Flajolet-Martin algorithm works, look at the steps involved, & see how to implement it in Python. We'll also discuss the advantages & limitations of this algorithm.

## What is flajolet martin algorithm ?

The Flajolet-Martin algorithm is a probabilistic algorithm used to estimate the number of distinct elements (cardinality) in a large dataset using a limited amount of memory. It works by applying hash functions to the elements & maintaining a compact summary called a bitmap. By analyzing the bitmap, the algorithm provides an approximate count of the unique elements in the dataset without the need to store or compare each individual item.

In simpler terms, the Flajolet-Martin algorithm is a clever way to count the number of unique items in a huge collection of data without actually counting them one by one. It trades a bit of accuracy for efficiency, allowing us to get a good estimate of the cardinality even when dealing with massive datasets that wouldn't fit in memory.

The algorithm relies on the properties of hash functions & probability theory to create a compact representation of the data. By setting bits in the bitmap based on the hash values of the elements, it captures the essence of the dataset's uniqueness. The position of the least significant bit in the bitmap is then used to estimate the total number of distinct elements.

The steps for the Flajolet-Martin algorithm are:

1. Initialize an empty bitmap of size k, where k is a power of 2.

2. For each element in the dataset, apply a hash function that maps the element to a number between 0 & 2^k - 1.

3. Find the position of the least significant bit (LSB) that is set to 1 in the hash value.

4. Update the bitmap by setting the bit at the position found in step 3 to 1.

5. After processing all elements, estimate the number of distinct elements by counting the number of zeros (r) from the right until the first 1 is encountered in the bitmap.

6. Calculate the estimate using the formula: 2^(k - r) / φ, where φ is a constant (approximately 0.77351).

The key idea behind the Flajolet-Martin algorithm is that the position of the LSB in the hash values follows a geometric distribution. By analyzing the bitmap & applying the estimation formula, we can get a good approximation of the number of distinct elements in the dataset.