Introduction
Shannon-Fano coding is a lossless data compression algorithm that efficiently encodes data by assigning variable-length binary codes to symbols based on their probability of occurrence. Developed by Claude Shannon and Robert Fano, it is a classic technique in information theory. The algorithm helps minimize the overall length of encoded messages, making data storage and transmission more efficient. In this blog, we will explore the Shannon-Fano coding algorithm.
Types of Data Compression
There are two types of data compression techniques: lossless and lossy. Both are used to reduce file sizes, but they differ in whether the original data can be recovered exactly. Let's have a look at each of them.
Lossless Compression: In this compression technique, the original data can be restored entirely without losing any data or information. Some standard lossless compression methods are Huffman coding, Arithmetic coding, and Lempel-Ziv-Welch (LZW) compression.
Lossy Compression: In this compression technique, some of the data is lost during compression, but the advantage is a smaller file size. Lossy compression is mainly used for media files such as images, audio, and video. Some standard lossy compression techniques are JPEG for photos, MPEG for video, and MP3 for audio.
What is Shannon-Fano Coding?
Shannon-Fano coding is a lossless data compression algorithm that Claude Shannon and Robert Fano developed in the late 1940s. The main goal of data compression is to reduce the data size without losing information so that we can store and transfer it more efficiently.
In Shannon-Fano coding, the idea is to assign binary codes to symbols according to their frequency of occurrence: the more frequent a symbol, the shorter its code. For example, if the symbol 'C' occurs more often than the symbol 'B', then 'C' is assigned a shorter binary code than 'B'.
Working of the Shannon-Fano Coding Algorithm
The algorithm's steps are as follows (a short sketch of the first two steps is given after the list):
1. Count the number of times each symbol appears, then compute each symbol's probability by dividing its count by the total number of symbols.
2. Sort the symbols in decreasing order of their probability.
3. Divide the symbols into two subparts, with the sum of the probabilities in each part being as close to each other as possible.
4. Assign the bit '0' to the first subpart and the bit '1' to the second subpart.
5. Repeat steps 3 and 4 for each subpart until every subpart contains a single symbol.
Note that the probabilities must be accurate; otherwise, the resulting binary code may not compress the data well.
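As a quick illustration of the first two steps, here is a minimal C sketch that counts the symbols in a sample message and converts the counts into probabilities. The message string is a made-up example, chosen so that the counts match the worked example below.

```c
#include <stdio.h>
#include <string.h>

int main(void) {
    /* hypothetical sample message: 12 A's, 4 B's, 3 C's, 3 D's, 2 E's */
    const char *msg = "AAAAAAAAAAAABBBBCCCDDDEE";
    int count[26] = {0};
    int total = (int)strlen(msg);

    /* step 1: count how many times each symbol appears */
    for (int i = 0; i < total; i++)
        count[msg[i] - 'A']++;

    /* step 1 (continued): probability = count / total number of symbols */
    for (int c = 0; c < 26; c++)
        if (count[c] > 0)
            printf("%c: count = %d, probability = %.3f\n",
                   'A' + c, count[c], (float)count[c] / total);
    return 0;
}
```

Sorting the resulting (symbol, probability) pairs in decreasing order of probability completes step 2.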
Example to understand Shannon-Fano coding
In this section, we will work through the Shannon-Fano coding algorithm on a small example.
Consider five symbols A, B, C, D, and E with counts A = 12, B = 4, C = 3, D = 3, and E = 2, for a total of 24 symbols.
Applying steps 1 and 2 gives the following table:

| Symbol | Count | Probability |
| ------ | ----- | ----------- |
| A      | 12    | 0.500       |
| B      | 4     | 0.167       |
| C      | 3     | 0.125       |
| D      | 3     | 0.125       |
| E      | 2     | 0.083       |
In step 3, the symbols are divided into {A} (probability 0.500) and {B, C, D, E} (combined probability 0.500), which is the most balanced split possible.
Steps 4 and 5 assign '0' to {A} and '1' to {B, C, D, E}, then recursively split the second group into {B, C} and {D, E}, and finally into single symbols.
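The recursive splits can be summarised as a tree, where entering the first subpart appends a '0' to the code and entering the second appends a '1':

```
{A, B, C, D, E}
├─ 0: {A}                → A = 0
└─ 1: {B, C, D, E}
   ├─ 0: {B, C}
   │  ├─ 0: {B}          → B = 100
   │  └─ 1: {C}          → C = 101
   └─ 1: {D, E}
      ├─ 0: {D}          → D = 110
      └─ 1: {E}          → E = 111
```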
After the recursion finishes, the code for each symbol is:

| Symbol | Count | Probability | Code |
| ------ | ----- | ----------- | ---- |
| A      | 12    | 0.500       | 0    |
| B      | 4     | 0.167       | 100  |
| C      | 3     | 0.125       | 101  |
| D      | 3     | 0.125       | 110  |
| E      | 2     | 0.083       | 111  |
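As a quick check of the compression gained, the average code length is 0.500 × 1 + (0.167 + 0.125 + 0.125 + 0.083) × 3 = 0.5 + 1.5 = 2 bits per symbol, whereas a fixed-length code for five symbols would need 3 bits per symbol.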
Implementation
Below is a C implementation of the above approach.
```c
#include <stdio.h>
#include <stdlib.h>

struct Node {
    char sym;    /* the symbol */
    float prob;  /* its probability of occurrence */
    int arr[20]; /* bits of the code built so far */
    int top;     /* index of the last bit; -1 while the code is empty */
};

/* Comparator for qsort: sorts nodes in decreasing order of probability
   (steps 1 and 2). */
int cmpNodes(const void *a, const void *b) {
    float pa = ((const struct Node *)a)->prob;
    float pb = ((const struct Node *)b)->prob;
    return (pa < pb) - (pa > pb);
}

/* Steps 3 to 5: recursively split p[l..h] into two parts whose probability
   sums are as close as possible, append 0 to the codes of the first part
   and 1 to the second, and recurse into each part. */
void shannon(int l, int h, struct Node p[]) {
    if (l >= h)
        return; /* a single symbol: its code is complete */

    float total = 0, sum = 0;
    for (int i = l; i <= h; i++)
        total += p[i].prob;

    /* mid becomes the first index of the second part */
    int mid = l;
    while (mid < h && sum + p[mid].prob <= total / 2) {
        sum += p[mid].prob;
        mid++;
    }
    /* take one more symbol if that makes the two parts more balanced */
    if (mid < h && (sum + p[mid].prob) - total / 2 < total / 2 - sum)
        mid++;

    for (int i = l; i < mid; i++)
        p[i].arr[++p[i].top] = 0; /* first part: append bit 0 */
    for (int i = mid; i <= h; i++)
        p[i].arr[++p[i].top] = 1; /* second part: append bit 1 */

    shannon(l, mid - 1, p);
    shannon(mid, h, p);
}

/* Print every symbol with its probability and assigned code. */
void display(int n, struct Node p[]) {
    printf("Symbol\tProbability\tCode\n");
    for (int i = 0; i < n; i++) {
        printf("%c\t%.3f\t\t", p[i].sym, p[i].prob);
        for (int j = 0; j <= p[i].top; j++)
            printf("%d", p[i].arr[j]);
        printf("\n");
    }
}

int main(void) {
    /* probabilities from the worked example above */
    struct Node p[] = {
        {'A', 0.500f, {0}, -1}, {'B', 0.167f, {0}, -1},
        {'C', 0.125f, {0}, -1}, {'D', 0.125f, {0}, -1},
        {'E', 0.083f, {0}, -1},
    };
    int n = (int)(sizeof p / sizeof p[0]);

    qsort(p, n, sizeof p[0], cmpNodes);
    shannon(0, n - 1, p);
    display(n, p);
    return 0;
}
```
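Compiling and running the program prints each symbol with its probability and code; with the probabilities above, the output matches the final table of the worked example.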
Differences Between the Huffman and Shannon-Fano Algorithms

| Feature | Huffman Algorithm | Shannon-Fano Algorithm |
| --- | --- | --- |
| Basis of construction | Constructed from the frequencies of symbols as a binary tree with minimum weighted path length. | Constructed by dividing symbols into two sets with approximately equal probabilities. |
| Efficiency | Generally more efficient, providing optimal prefix codes. | Slightly less efficient; may not always produce the most optimal prefix codes. |
| Complexity | Higher complexity due to the need for sorting and tree construction. | Lower complexity; involves simpler steps of dividing and assigning codes. |
| Code lengths | Produces variable-length codes that are usually shorter. | Produces variable-length codes that may be longer than Huffman codes. |
| Implementation | More complex to implement because a tree must be constructed and traversed. | Easier to implement, with simpler and more straightforward steps. |
| Optimality | Always produces an optimal prefix-free code (minimum redundancy). | Does not guarantee optimal codes and may introduce slight redundancy. |
| Use cases | Widely used in file compression formats such as ZIP and JPEG. | Less commonly used; often replaced by Huffman coding in practical applications. |
| Historical background | Developed by David A. Huffman in 1952. | Developed by Claude Shannon and Robert Fano in the late 1940s. |
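A classic example of this efficiency gap: for the probabilities 0.35, 0.17, 0.17, 0.16, and 0.15, Shannon-Fano assigns code lengths 2, 2, 2, 3, 3, for an average of 0.35×2 + 0.17×2 + 0.17×2 + 0.16×3 + 0.15×3 = 2.31 bits per symbol, while Huffman assigns lengths 1, 3, 3, 3, 3, for an average of 2.30 bits per symbol.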
Frequently Asked Questions
What is the Worst Time Complexity of the Shannon-Fano Algorithm?
The partitions carried out at each step may be extremely unbalanced; for instance, if the probabilities are 0.06, 0.1, 0.11, 0.2, and 0.4, every split separates just one symbol from the rest, giving the recurrence relation:
T(n) = T(n-1) + T(1) + Θ(n)
In this case, T(n) turns out to be O(n²).
What is the Best Time Complexity of the Shannon-Fano Algorithm?
If the partitions divide the symbols into two parts of nearly equal size at every step, for instance when the probabilities are 0.25, 0.25, 0.25, and 0.25, the recurrence relation is:
T(n) = 2T(n/2) + Θ(n)
In this case, T(n) turns out to be O(n log n).
How to Construct a Shannon Code?
To construct a Shannon-Fano code, sort the symbols by probability, divide them into two sets with nearly equal total probabilities, assign '0' to one set and '1' to the other, and repeat the division recursively within each set until every symbol has its own code.
Conclusion
Congratulations on finishing this blog! This article discussed the types of data compression, explained the working of Shannon-Fano coding through a step-by-step example and a full implementation, compared it with Huffman coding, and answered some frequently asked questions.
Hope you enjoyed reading this article!