Introduction
Hello Ninja! Have you ever wondered how such large amounts of data are transmitted over the internet with so much ease? Do you know how the size-reducer apps and websites that save space when you are running out of storage actually work?
This is all made possible by data compression, which reduces the amount of space required to transmit or store data. This article will examine one of the earliest and most effective data compression techniques: Shannon Fano Coding.
Types of Data Compression
There are two types of data compression techniques: lossless and lossy. Both are used to reduce file sizes, but they differ in whether the original data can be recovered exactly. Let's have a look at each technique.
Lossless Compression: In this technique, the original data can be restored entirely, without losing any information. Some standard lossless compression methods are Huffman coding, arithmetic coding, and Lempel-Ziv-Welch (LZW) compression.
Lossy Compression: In this technique, some of the data is lost during compression, but the advantage is a smaller file size. Lossy compression is mainly used for media files such as images, audio, and video. Some standard lossy compression formats are JPEG for photos, MPEG for video, and MP3 for audio.
What is Shannon Fano Coding?
Shannon Fano Coding is a lossless data compression algorithm that Claude Shannon and Robert Fano developed in the late 1940s. The main goal of data compression is to reduce the data size without losing information so that we can store and transfer it more efficiently.
In Shannon Fano coding, the idea is to assign binary codes to symbols according to their frequency of occurrence: the more frequent a symbol, the shorter its code. For example, if the symbol 'C' occurs more often than the symbol 'B', then 'C' is assigned a shorter binary code than 'B'.
Working of Shannon Fano Coding Algorithm
The algorithm's steps are as follows:
1. Count the number of times each symbol appears, then compute each symbol's probability by dividing its count by the total count of all symbols.
2. Sort the symbols in decreasing order of probability.
3. Divide the symbols into two subparts such that the sums of the probabilities in the two parts are as close to each other as possible.
4. Assign the bit '0' to the first subpart and '1' to the second subpart.
5. Repeat steps 3 and 4 on each subpart until every subpart contains a single symbol.
Make sure the probabilities are computed accurately; otherwise, the resulting binary codes may not compress well. A compact sketch of these steps is given right below.
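To make the five steps concrete before the full walkthrough (the article's complete implementation appears further below), here is a minimal sketch in modern C++. The Symbol struct, the assignCodes helper, and the hard-coded counts (taken from the example in the next section) are illustrative choices, not a fixed API:

#include <algorithm>
#include <cmath>
#include <iostream>
#include <string>
#include <vector>

struct Symbol {
    char ch;           // the symbol itself
    double prob;       // its probability of occurrence
    std::string code;  // Shannon Fano code built so far
};

// Steps 3-5: split syms[l..r] into two parts with nearly equal
// probability sums, append '0'/'1', and recurse on each part.
void assignCodes(std::vector<Symbol>& syms, int l, int r) {
    if (l >= r) return;  // a single symbol: its code is complete
    double total = 0;
    for (int i = l; i <= r; i++) total += syms[i].prob;
    // find the split point where the two sums are closest
    double run = 0, bestDiff = total;
    int split = l;
    for (int i = l; i < r; i++) {
        run += syms[i].prob;
        double diff = std::abs(total - 2 * run);  // |first part - second part|
        if (diff < bestDiff) { bestDiff = diff; split = i; }
    }
    for (int i = l; i <= r; i++)
        syms[i].code += (i <= split) ? '0' : '1';  // step 4
    assignCodes(syms, l, split);                   // step 5
    assignCodes(syms, split + 1, r);
}

int main() {
    // Step 1: probabilities from the counts A=12, B=4, C=3, D=3, E=2
    std::vector<Symbol> syms = {
        {'A', 12.0 / 24}, {'B', 4.0 / 24}, {'C', 3.0 / 24},
        {'D', 3.0 / 24},  {'E', 2.0 / 24}};
    // Step 2: sort by decreasing probability
    std::sort(syms.begin(), syms.end(),
              [](const Symbol& a, const Symbol& b) { return a.prob > b.prob; });
    assignCodes(syms, 0, (int)syms.size() - 1);
    for (const auto& s : syms)
        std::cout << s.ch << " -> " << s.code << '\n';
}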
Example to understand Shannon Fano coding
In this section, we will work through the Shannon Fano coding algorithm on a small example.
Let A, B, C, D, and E be the symbols, with counts A = 12, B = 4, C = 3, D = 3, and E = 2 (24 symbols in total).
Steps 1 and 2 give the following table (each probability is the count divided by the total of 24, rounded to three decimal places):
Symbol | Count | Probability
A | 12 | 0.500
B | 4 | 0.167
C | 3 | 0.125
D | 3 | 0.125
E | 2 | 0.083
Step 3 splits the sorted symbols where the probability sums are closest: {A} with sum 0.500 on one side and {B, C, D, E} with sum 0.500 on the other.
Steps 4 and 5 assign '0' to {A} and '1' to {B, C, D, E}, then recursively split {B, C, D, E} into {B, C} (sum 0.292) and {D, E} (sum 0.208), and finally split {B, C} and {D, E} into single symbols, appending '0' or '1' at each split.
After applying steps 3 to 5, the code we get for each symbol is:
Symbol | Count | Probability | Code
A | 12 | 0.500 | 0
B | 4 | 0.167 | 100
C | 3 | 0.125 | 101
D | 3 | 0.125 | 110
E | 2 | 0.083 | 111
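These codes show the compression in action. Storing the 24 symbols with a fixed-length code would need 3 bits per symbol (the smallest whole number of bits that can distinguish 5 symbols), i.e. 24 × 3 = 72 bits. With the Shannon Fano codes, the total is 12×1 + 4×3 + 3×3 + 3×3 + 2×3 = 48 bits, an average of exactly 2 bits per symbol and roughly a third less space.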
Code Implementation
Implementation of the above approach:
C++
#include <bits/stdc++.h>
using namespace std;

// Node storing a symbol, its probability, and the code bits
// assigned to it so far
struct node {
    // storing symbol
    string sym;
    // storing probability or frequency
    float pro;
    int arr[20]; // code bits assigned so far
    int top;     // index of the last bit in arr
} p[20];

typedef struct node node;

// Function to find the Shannon codes: recursively split p[l..h]
// into two parts with nearly equal probability sums and assign
// a bit to every symbol in each part
void shannon(int l, int h, node p[])
{
    float pack1 = 0, pack2 = 0, diff1 = 0, diff2 = 0;
    int i, k, j;
    if ((l + 1) == h || l == h || l > h) {
        if (l == h || l > h)
            return;
        // only two symbols left: one bit decides between them
        p[h].arr[++(p[h].top)] = 0;
        p[l].arr[++(p[l].top)] = 1;
        return;
    }
    else {
        for (i = l; i <= h - 1; i++)
            pack1 = pack1 + p[i].pro;
        pack2 = pack2 + p[h].pro;
        diff1 = pack1 - pack2;
        if (diff1 < 0)
            diff1 = diff1 * -1;
        j = 2;
        // move the split point while it keeps the two sums closer
        while (j != h - l + 1) {
            k = h - j;
            pack1 = pack2 = 0;
            for (i = l; i <= k; i++)
                pack1 = pack1 + p[i].pro;
            for (i = h; i > k; i--)
                pack2 = pack2 + p[i].pro;
            diff2 = pack1 - pack2;
            if (diff2 < 0)
                diff2 = diff2 * -1;
            if (diff2 >= diff1)
                break;
            diff1 = diff2;
            j++;
        }
        k++;
        for (i = l; i <= k; i++)
            p[i].arr[++(p[i].top)] = 1;
        for (i = k + 1; i <= h; i++)
            p[i].arr[++(p[i].top)] = 0;
        // recurse on the two parts
        shannon(l, k, p);
        shannon(k + 1, h, p);
    }
}

// Function to sort the symbols
// based on their probability or frequency (ascending)
void sortByProbability(int n, node p[])
{
    int i, j;
    node temp;
    for (j = 1; j <= n - 1; j++) {
        for (i = 0; i < n - 1; i++) {
            if ((p[i].pro) > (p[i + 1].pro)) {
                temp.pro = p[i].pro;
                temp.sym = p[i].sym;
                p[i].pro = p[i + 1].pro;
                p[i].sym = p[i + 1].sym;
                p[i + 1].pro = temp.pro;
                p[i + 1].sym = temp.sym;
            }
        }
    }
}

// Function to print each symbol with its probability and code
void display(int n, node p[])
{
    int i, j;
    cout << "\n\tSymbol\tProbability\tCode";
    for (i = n - 1; i >= 0; i--) {
        cout << "\n\t" << p[i].sym << "\t" << p[i].pro << "\t\t";
        for (j = 0; j <= p[i].top; j++)
            cout << p[i].arr[j];
    }
    cout << "\n";
}

// Driver code using the example from this article
int main()
{
    int n = 5;
    string sym[] = { "A", "B", "C", "D", "E" };
    float cnt[] = { 12, 4, 3, 3, 2 };
    float total = 12 + 4 + 3 + 3 + 2;
    for (int i = 0; i < n; i++) {
        p[i].sym = sym[i];
        p[i].pro = cnt[i] / total;
        p[i].top = -1; // no code bits assigned yet
    }
    sortByProbability(n, p);
    shannon(0, n - 1, p);
    display(n, p);
    return 0;
}
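When compiled and run with the example counts, this program should print the same codes we derived by hand: A → 0, B → 100, C → 101, D → 110, and E → 111. Note that this implementation sorts in ascending order and assigns '1' to the low-probability part of each split, which is simply the mirror image of the steps described earlier and produces the same code table.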
Differences between Huffman and Shannon-Fano Algorithm
Feature | Huffman Algorithm | Shannon-Fano Algorithm
Basis of Construction | Constructed based on the frequency of symbols, using a binary tree with minimum weighted path lengths. | Constructed by dividing symbols into two sets with approximately equal probabilities.
Efficiency | Generally more efficient, providing optimal prefix codes. | Slightly less efficient; may not always produce the most optimal prefix codes.
Complexity | Higher complexity due to the need for sorting and tree construction. | Lower complexity; involves simpler steps of dividing and assigning codes.
Code Lengths | Produces variable-length codes that are usually shorter. | Produces variable-length codes that may be longer than Huffman codes.
Implementation | More complex to implement due to the need to construct and traverse a tree. | Easier to implement, with simpler and more straightforward steps.
Optimality | Always produces an optimal prefix-free code (minimum redundancy). | Does not always guarantee optimal codes and may have slight redundancy.
Use Cases | Widely used in file compression formats such as ZIP and JPEG. | Less commonly used; often replaced by Huffman coding in practical applications.
Historical Background | Developed by David A. Huffman in 1952. | Developed by Claude Shannon and Robert Fano in 1949.
Frequently Asked Questions
What is the Worst-Case Time Complexity of the Shannon Fano Algorithm?
The partitions carried out at each step may be extremely unbalanced; for instance, if the probabilities are 0.06, 0.1, 0.11, 0.2, and 0.4, each split separates just one symbol from the rest. The recurrence relation is:
T(n) = T(n-1) + T(1) + Θ(n)
which solves to T(n) = O(n²), since the Θ(n) partitioning work is repeated n times.
What is the Best-Case Time Complexity of the Shannon Fano Algorithm?
If every partition divides the symbols into two halves of nearly equal size, for instance, when the probabilities are 0.25, 0.25, 0.25, and 0.25, the recurrence relation is:
T(n) = 2T(n/2) + Θ(n)
which solves to T(n) = O(n log n).
How to Construct a Shannon Code?
To construct a Shannon code, sort the symbols by probability, divide them into two sets with nearly equal total probabilities, assign '0' to one set and '1' to the other, and recurse on each set until every set contains a single symbol.
Conclusion
Congratulations on finishing this blog! This article discussed the types of data compression, examined how Shannon Fano coding works through a step-by-step example and a C++ implementation, and compared it with Huffman coding. Finally, we answered some frequently asked questions.
We hope you enjoyed reading this article. Here are some more related articles: