Hello Ninja! Have you ever wondered how such large amounts of data are transmitted over the internet with so much ease? Do you know how file-compression apps and websites, which save space when you are running out of storage, actually work?

This is all made possible by data compression, which reduces the amount of space required to transmit or store data. This article will examine one of the earliest data compression techniques, Shannon-Fano coding.

Types of Data Compression

There are two types of data compression techniques: lossless and lossy. Both are used to reduce the size of files. Let's take a look at each of them.

Lossless Compression: In this compression technique, the original data can be restored entirely without losing any data or information. Some standard lossless compression methods are Huffman coding, Arithmetic coding, and Lempel-Ziv-Welch (LZW) compression.

Lossy Compression: In this compression technique, some of the data is lost during the compression, but its advantage is that it results in a smaller file size. Lossy compression is majorly used for media files such as images, audio and video. Some standard lossy compression techniques are JPEG for photos, MPEG for video, and MP3 for audio.

What is Shannon Fano Coding?

Shannon-Fano coding is a lossless data compression algorithm that Claude Shannon and Robert Fano developed in the late 1940s. The main aim of data compression is to reduce the data size without losing information so that we can store and transfer it more efficiently.

In Shannon-Fano coding, the idea is to assign binary codes to symbols according to their frequency of occurrence: the more frequent a symbol, the shorter its code. For example, if the symbol 'C' occurs more often than the symbol 'B', then we assign 'C' a shorter binary code than 'B'.

Working of Shannon Fano Coding Algorithm

The algorithm's steps are as follows:

Count the number of times each symbol appears, then find the probability of each symbol by dividing its count by the total number of symbols.

Now, sort the symbols in decreasing order of their probability.

Divide the symbols into two subparts such that the sums of the probabilities of the two parts are as close as possible.

Assign the value '0' to the first subpart and '1' to the second subpart.

Repeat steps 3 and 4 for each subpart until every symbol is in a subpart of its own.

Ensure the probabilities are accurate; otherwise, the resulting binary code may not be optimal for compression.

Example to Understand Shannon-Fano Coding

In this section, we will work through the Shannon-Fano coding algorithm on an example.

Let the symbols be A, B, C, D, and E, with counts A = 12, B = 4, C = 3, D = 3, and E = 2 (24 symbols in total).

Applying steps 1 and 2 gives the following table:

| Symbol | Count | Probability |
|--------|-------|-------------|
| A      | 12    | 0.500       |
| B      | 4     | 0.167       |
| C      | 3     | 0.125       |
| D      | 3     | 0.125       |
| E      | 2     | 0.083       |

Step 3 is shown in the diagram below.

Steps 4 and 5 are shown in the diagrams below.

After applying step 4, the code we get for each symbol is:

| Symbol | Count | Probability | Code |
|--------|-------|-------------|------|
| A      | 12    | 0.500       | 0    |
| B      | 4     | 0.167       | 100  |
| C      | 3     | 0.125       | 101  |
| D      | 3     | 0.125       | 110  |
| E      | 2     | 0.083       | 111  |

Code Implementation

Implementation of the above approach in C++:

#include <bits/stdc++.h>
using namespace std;

struct node {
    // storing symbol
    string sym;
    // storing probability or frequency
    float pro;
    int arr[20]; // bits of the code assigned so far
    int top;     // index of the last bit in arr
} p[20];

// function to find the Shannon-Fano codes for p[l..h]
void shannon(int l, int h, node p[])
{
    float pack1 = 0, pack2 = 0, diff1 = 0, diff2 = 0;
    int i, k, j;
    if ((l + 1) == h || l == h || l > h) {
        if (l == h || l > h)
            return;
        p[h].arr[++(p[h].top)] = 0;
        p[l].arr[++(p[l].top)] = 1;
        return;
    }
    else {
        // find the split point k that makes the probability
        // sums of the two parts as close as possible
        for (i = l; i <= h - 1; i++)
            pack1 = pack1 + p[i].pro;
        pack2 = pack2 + p[h].pro;
        diff1 = pack1 - pack2;
        if (diff1 < 0)
            diff1 = diff1 * -1;
        j = 2;
        while (j != h - l + 1) {
            k = h - j;
            pack1 = pack2 = 0;
            for (i = l; i <= k; i++)
                pack1 = pack1 + p[i].pro;
            for (i = h; i > k; i--)
                pack2 = pack2 + p[i].pro;
            diff2 = pack1 - pack2;
            if (diff2 < 0)
                diff2 = diff2 * -1;
            if (diff2 >= diff1)
                break;
            diff1 = diff2;
            j++;
        }
        k++;
        // assign '1' to the first part and '0' to the second part
        // (the array is sorted in increasing order here, so the
        // least probable symbols get '1')
        for (i = l; i <= k; i++)
            p[i].arr[++(p[i].top)] = 1;
        for (i = k + 1; i <= h; i++)
            p[i].arr[++(p[i].top)] = 0;
        // repeat the process on each part
        shannon(l, k, p);
        shannon(k + 1, h, p);
    }
}

// function to sort the symbols
// based on their probability or frequency
void sortByProbability(int n, node p[])
{
    int i, j;
    node temp;
    for (j = 1; j <= n - 1; j++) {
        for (i = 0; i < n - 1; i++) {
            if ((p[i].pro) > (p[i + 1].pro)) {
                temp.pro = p[i].pro;
                temp.sym = p[i].sym;
                p[i].pro = p[i + 1].pro;
                p[i].sym = p[i + 1].sym;
                p[i + 1].pro = temp.pro;
                p[i + 1].sym = temp.sym;
            }
        }
    }
}

// function to display the Shannon-Fano codes
void display(int n, node p[])
{
    int i, j;
    cout << "\n\n\n\tSymbol\tProbability\tCode";
    for (i = n - 1; i >= 0; i--) {
        cout << "\n\t" << p[i].sym << "\t\t" << p[i].pro << "\t";
        for (j = 0; j <= p[i].top; j++)
            cout << p[i].arr[j];
    }
}

// Driver code
int main()
{
    int n = 5, i;
    float total = 0;

    // the symbols A to E
    for (i = 0; i < n; i++)
        p[i].sym = string(1, (char)('A' + i));

    // probabilities of the symbols (counts 12, 4, 3, 3, 2 out of 24)
    float x[] = { 0.500, 0.167, 0.125, 0.125, 0.083 };
    for (i = 0; i < n; i++) {
        p[i].pro = x[i];
        total = total + p[i].pro;
    }

    // the probabilities must sum (approximately) to 1
    if (total > 1.001f || total < 0.999f) {
        cout << "Invalid probabilities. Enter new values";
        return 1;
    }

    // sorting the symbols based on
    // their probability or frequency
    sortByProbability(n, p);

    for (i = 0; i < n; i++)
        p[i].top = -1;

    // find the Shannon-Fano codes
    shannon(0, n - 1, p);

    // display the codes
    display(n, p);
    return 0;
}

Output

Differences between the Huffman and Shannon-Fano Algorithms

| Feature | Huffman Algorithm | Shannon-Fano Algorithm |
|---|---|---|
| Basis of construction | Constructed from the frequencies of symbols using a binary tree with minimum weighted path lengths. | Constructed by dividing symbols into two sets with approximately equal probabilities. |
| Efficiency | Generally more efficient, providing optimal prefix codes. | Slightly less efficient; may not always produce the most optimal prefix codes. |
| Complexity | Higher complexity due to the need for sorting and tree construction. | Lower complexity; involves simpler steps of dividing and assigning codes. |
| Code lengths | Produces variable-length codes that are usually shorter. | Produces variable-length codes that may be longer than Huffman codes. |
| Implementation | More complex to implement due to the necessity of constructing and traversing the tree. | Easier to implement, with simpler and more straightforward steps. |
| Optimality | Always produces the most optimal prefix-free code (minimum redundancy). | Does not always guarantee optimal codes and may have slight redundancy. |
| Use cases | Widely used in file compression formats such as ZIP and JPEG. | Less commonly used; often replaced by Huffman coding in practical applications. |
| Historical background | Developed by David A. Huffman in 1952. | Developed by Claude Shannon and Robert Fano in 1949. |

Frequently Asked Questions

What is the Worst-Case Time Complexity of the Shannon-Fano Algorithm?

The partitions carried out at each step may be extremely unbalanced; for instance, if the probabilities are 0.5, 0.25, 0.125, 0.0625, and 0.0625, each split separates just one symbol from the rest, and the recurrence relation is:

T(n) = T(n-1) + T(1) + Θ(n)

T(n) in this case turns out to be O(n²).

What is the Best-Case Time Complexity of the Shannon-Fano Algorithm?

If the partitions divide the symbols into parts of nearly equal size at every step, for instance, if the probabilities are 0.25, 0.25, 0.25, and 0.25, the recurrence relation is:

T(n) = 2T(n/2) + Θ(n)

T(n) in this case turns out to be O(n log n).

How to Construct a Shannon Code?

To construct a Shannon-Fano code, sort the symbols by probability, divide them into two sets with as nearly equal total probability as possible, assign '0' to one set and '1' to the other, and recurse within each set until every symbol has its own code.

Conclusion

Congratulations on finishing this blog! This article discussed the types of data compression, examined how Shannon-Fano coding works with the help of diagrams and a worked example, and then implemented it in code. Finally, we answered some frequently asked questions.

We hope you enjoyed reading this article!