Table of contents
1. Introduction
2. Types of Data Compression
3. What is Shannon Fano Coding?
4. Working of Shannon Fano Coding Algorithm
5. Example to understand Shannon Fano coding
6. Code Implementation
   6.1. C++
7. Differences between Huffman and Shannon-Fano Algorithm
8. Frequently Asked Questions
   8.1. What is the Worst Time Complexity of the Shannon Fano Algorithm?
   8.2. What is the Best Time Complexity of the Shannon Fano Algorithm?
   8.3. How to Construct a Shannon Code?
9. Conclusion
Last Updated: May 20, 2024
Difficulty: Medium

Shannon Fano Coding

Author: Pallavi Singh

Introduction

Hello Ninja! Have you ever wondered how such large amounts of data are transmitted over the internet with so much ease? Do you know how the size-reducer apps and websites, which save space when you are running out of storage, work?

This is all made possible by data compression, which reduces the amount of space required to transmit or store data. This article will examine one of the earliest and most effective data compression techniques, Shannon Fano coding.


Types of Data Compression

There are two types of data compression techniques: lossless and lossy. Both are used to reduce file sizes. Let's have a look at each of them.

  • Lossless Compression: In this compression technique, the original data can be restored entirely without losing any data or information. Some standard lossless compression methods are Huffman coding, Arithmetic coding, and Lempel-Ziv-Welch (LZW) compression.

  • Lossy Compression: In this compression technique, some of the data is lost during compression, but the advantage is a smaller file size. Lossy compression is mainly used for media files such as images, audio, and video. Some standard lossy compression techniques are JPEG for images, MPEG for video, and MP3 for audio.

What is Shannon Fano Coding?

Shannon Fano Coding is a lossless data compression algorithm that Claude Shannon and Robert Fano developed in the late 1940s. The main aim of data compression is to reduce the data size without losing information so that we can store and transfer it more efficiently.

In Shannon Fano coding, the idea is to assign binary codes to symbols according to their frequency of occurrence: the more frequent a symbol, the shorter its code. For example, if the symbol ‘C’ occurs more often than the symbol ‘B’, then we assign ‘C’ a shorter binary code than ‘B’.

Working of Shannon Fano Coding Algorithm

The algorithm's steps are as follows:

  1. Count the number of times each symbol appears, then find the probability of each symbol by dividing its count by the total number of symbols.
     
  2. Sort the symbols in decreasing order of their probability.
     
  3. Divide the symbols into two subparts, with the sums of probabilities in the two parts as close to each other as possible. 
     
  4. Assign the bit '0' to the first subpart and '1' to the second subpart.
     
  5. Repeat steps 3 and 4 for each subpart until every symbol is in a subpart of its own.
     

Ensure the probabilities are accurate; otherwise, the resulting binary code may not be optimal for compression.

Example to understand Shannon Fano coding

In this section, we will trace the Shannon Fano coding algorithm on an example.

Let the five symbols A, B, C, D, E occur with counts A=13, B=4, C=3, D=3, and E=2, so there are 25 symbols in total.

Steps 1 and 2 give the following sorted table:

Symbol   Count   Probability
A        13      0.52
B        4       0.16
C        3       0.12
D        3       0.12
E        2       0.08


Step 3 is implemented in the below diagram.

(diagram)

Steps 4 and 5 are implemented in the below diagrams.

(diagram)

(diagram)


After applying steps 4 and 5, the code we get for each symbol is:

Symbol   Count   Probability   Code
A        13      0.52          0
B        4       0.16          100
C        3       0.12          101
D        3       0.12          110
E        2       0.08          111

Code Implementation

Implementation of the above approach in C++:

#include <bits/stdc++.h>
using namespace std;

// Node storing a symbol, its probability, and the code bits assigned to it
struct node {
    // symbol
    string sym;
    // probability or frequency
    float pro;
    // code bits
    int arr[20];
    // index of the last code bit
    int top;
} p[20];

// Recursively assign Shannon-Fano codes to the symbols p[l..h],
// which are sorted in increasing order of probability
void shannon(int l, int h, node p[])
{
    float pack1 = 0, pack2 = 0, diff1 = 0, diff2 = 0;
    int i, k, j;
    if ((l + 1) == h || l == h || l > h) {
        if (l == h || l > h)
            return;
        // exactly two symbols: one more bit distinguishes them
        p[h].arr[++(p[h].top)] = 0;
        p[l].arr[++(p[l].top)] = 1;
        return;
    }
    else {
        // find the split point k that makes the probability sums
        // of the two parts as close to each other as possible
        for (i = l; i <= h - 1; i++)
            pack1 = pack1 + p[i].pro;
        pack2 = pack2 + p[h].pro;
        diff1 = pack1 - pack2;
        if (diff1 < 0)
            diff1 = diff1 * -1;
        j = 2;
        while (j != h - l + 1) {
            k = h - j;
            pack1 = pack2 = 0;
            for (i = l; i <= k; i++)
                pack1 = pack1 + p[i].pro;
            for (i = h; i > k; i--)
                pack2 = pack2 + p[i].pro;
            diff2 = pack1 - pack2;
            if (diff2 < 0)
                diff2 = diff2 * -1;
            if (diff2 >= diff1)
                break;
            diff1 = diff2;
            j++;
        }
        k++;
        // append 1 to the codes of the first part, 0 to the second part
        for (i = l; i <= k; i++)
            p[i].arr[++(p[i].top)] = 1;
        for (i = k + 1; i <= h; i++)
            p[i].arr[++(p[i].top)] = 0;

        // recurse on both parts
        shannon(l, k, p);
        shannon(k + 1, h, p);
    }
}

// Sort the symbols in increasing order of
// their probability or frequency (bubble sort)
void sortByProbability(int n, node p[])
{
    int i, j;
    node temp;
    for (j = 1; j <= n - 1; j++) {
        for (i = 0; i < n - 1; i++) {
            if ((p[i].pro) > (p[i + 1].pro)) {
                temp.pro = p[i].pro;
                temp.sym = p[i].sym;

                p[i].pro = p[i + 1].pro;
                p[i].sym = p[i + 1].sym;

                p[i + 1].pro = temp.pro;
                p[i + 1].sym = temp.sym;
            }
        }
    }
}

// Display the Shannon-Fano codes in decreasing order of probability
void display(int n, node p[])
{
    int i, j;
    cout << "\n\n\n\tSymbol\tProbability\tCode";
    for (i = n - 1; i >= 0; i--) {
        cout << "\n\t" << p[i].sym << "\t\t" << p[i].pro << "\t";
        for (j = 0; j <= p[i].top; j++)
            cout << p[i].arr[j];
    }
    cout << endl;
}

// Driver code
int main()
{
    int n, i;
    float total = 0;
    string ch;

    // Number of symbols (hard-coded for this example)
    cout << "Enter number of symbols\t: ";
    n = 5;
    cout << n << endl;

    // Symbols A, B, C, D, E
    for (i = 0; i < n; i++) {
        cout << "Enter symbol " << i + 1 << " : ";
        ch = (char)('A' + i);
        cout << ch << endl;

        // insert the symbol into the node
        p[i].sym += ch;
    }

    // Probability of each symbol (hard-coded for this example)
    float x[] = { 0.52, 0.16, 0.12, 0.12, 0.08 };
    for (i = 0; i < n; i++) {
        cout << "\nEnter probability of " << p[i].sym << " : ";
        cout << x[i] << endl;
        p[i].pro = x[i];
        total = total + p[i].pro;

        // the probabilities must not sum to more than 1
        if (total > 1) {
            cout << "Invalid. Enter new values";
            total = total - p[i].pro;
            i--;
        }
    }

    // Sort the symbols in increasing order of probability
    sortByProbability(n, p);

    // no code bits assigned yet
    for (i = 0; i < n; i++)
        p[i].top = -1;

    // Find the Shannon-Fano codes
    shannon(0, n - 1, p);

    // Display the codes
    display(n, p);
    return 0;
}

Output (input prompts omitted):

Symbol   Probability   Code
A        0.52          0
B        0.16          100
D        0.12          101
C        0.12          110
E        0.08          111

Note that C and D have equal probabilities, so the program may assign their codes in the opposite order from the table above; both assignments are equally good.

Differences between Huffman and Shannon-Fano Algorithm

  • Basis of construction: Huffman codes are constructed from symbol frequencies using a binary tree with minimum weighted path length; Shannon-Fano codes are constructed by recursively dividing the symbols into two sets with approximately equal probabilities.
  • Efficiency: Huffman is generally more efficient, providing optimal prefix codes; Shannon-Fano is slightly less efficient and may not always produce the most optimal prefix codes.
  • Complexity: Huffman has higher complexity due to the need for sorting and tree construction; Shannon-Fano involves simpler steps of dividing and assigning codes.
  • Code lengths: Both produce variable-length codes, but Shannon-Fano codes may be longer than the corresponding Huffman codes.
  • Implementation: Huffman is more complex to implement because the tree must be constructed and traversed; Shannon-Fano is easier, with more straightforward steps.
  • Optimality: Huffman always produces the most optimal prefix-free code (minimum redundancy); Shannon-Fano does not guarantee optimal codes and may have slight redundancy.
  • Use cases: Huffman coding is widely used in file compression formats such as ZIP and JPEG; Shannon-Fano is less commonly used and is often replaced by Huffman coding in practice.
  • Historical background: Huffman coding was developed by David A. Huffman in 1952; Shannon-Fano coding was developed by Claude Shannon and Robert Fano in 1949.

Frequently Asked Questions

What is the Worst Time Complexity of the Shannon Fano Algorithm?

The partitions carried out at each step may be extremely unbalanced; for instance, if the probabilities are 0.06, 0.1, 0.11, 0.2, and 0.4, every split separates a single symbol from the rest, and the recurrence relation is: 

T(n) = T(n-1) + T(1) + Θ(n)

T(n) in this case turns out to be O(n²).

What is the Best Time Complexity of the Shannon Fano Algorithm?

If the partitions divide the symbols into two halves of nearly equal size at every step, for instance when the probabilities are 0.25, 0.25, 0.25, and 0.25, the recurrence relation is: 

T(n) = 2T(n/2) + Θ(n)

T(n) in this case turns out to be O(n log n).

How to Construct a Shannon Code?

To construct a Shannon code, sort the symbols by decreasing probability, divide them into two sets with approximately equal total probabilities, assign '0' to one set and '1' to the other, and recurse on each set until every symbol has its own code.

Conclusion

Congratulations on finishing this blog! This article discussed the types of data compression and looked at the working of Shannon Fano coding, first with the help of an example and diagrams and then with a complete implementation. At last, we discussed some FAQs.

Hope you enjoyed reading this article, here are some more related articles:

Check out The Interview Guide for Product Based Companies and some famous Interview Problems from Top Companies, like Amazon, Adobe, Google, etc., on Coding Ninjas Studio.
Also, check out some of the Guided Paths on topics such as Data Structures and Algorithms, Competitive Programming, Operating Systems, Computer Networks, DBMS, System Design, etc., as well as some Contests, Test Series, Interview Bundles, and some Interview Experiences curated by top Industry Experts only on Coding Ninjas Studio.

We hope you liked reading this article. Please give this article a thumbs up if you enjoyed it. 

"Have fun coding!"
