Table of contents
1. Introduction
2. Types of Data Compression
3. What is Shannon-Fano Coding?
4. Working of the Shannon-Fano Coding Algorithm
5. Example to understand Shannon-Fano coding
6. Implementation
  6.1. C
  6.2. C++
  6.3. Java
  6.4. Python
  6.5. JS
  6.6. C#
7. Differences between Huffman and Shannon-Fano Algorithm
8. Frequently Asked Questions
  8.1. What is the Worst Time Complexity of the Shannon-Fano Algorithm?
  8.2. What is the Best Time Complexity of the Shannon-Fano Algorithm?
  8.3. How to Construct a Shannon Code?
9. Conclusion
Last Updated: Mar 7, 2025

Shannon-Fano Coding

Author: Pallavi Singh

Introduction

Shannon-Fano Coding is a lossless data compression algorithm used to efficiently encode data by assigning variable-length binary codes to symbols based on their probability of occurrence. Developed by Claude Shannon and Robert Fano, this technique is widely used in information theory and data compression. The algorithm helps minimize the overall length of encoded messages, making data storage and transmission more efficient. In this blog, we will explore the Shannon-Fano Coding algorithm.


Types of Data Compression

There are two types of data compression techniques: lossless and lossy. Both are used to reduce file sizes. Let's have a look at each of them.

  • Lossless Compression: In this compression technique, the original data can be restored entirely without losing any data or information. Some standard lossless compression methods are Huffman coding, Arithmetic coding, and Lempel-Ziv-Welch (LZW) compression.

  • Lossy Compression: In this compression technique, some of the data is lost during compression, but the advantage is a smaller file size. Lossy compression is mainly used for media files such as images, audio and video. Some standard lossy compression techniques are JPEG for photos, MPEG for video, and MP3 for audio.

What is Shannon-Fano Coding?

Shannon-Fano Coding is a lossless data compression technique that Claude Shannon and Robert Fano developed in the late 1940s. The main aim of data compression is to reduce the data size without losing information, so that we can store and transfer it more efficiently.

In Shannon-Fano coding, the idea is to assign binary codes to symbols according to their frequency of occurrence: the more frequent a symbol, the shorter its code. For example, if the symbol 'C' occurs more often than the symbol 'B', then 'C' is assigned a shorter binary code than 'B'.

Working of the Shannon-Fano Coding Algorithm

The algorithm's steps are as follows:

  1. Count the number of times each symbol appears, then compute each symbol's probability by dividing its count by the total number of symbols.

  2. Sort the symbols in decreasing order of probability.

  3. Divide the symbols into two subparts so that the sums of probabilities in the two parts are as close to each other as possible.

  4. Assign the bit '0' to the first subpart and '1' to the second subpart.

  5. Repeat steps 3 and 4 on each subpart until every subpart contains a single symbol.

Ensure the probabilities are accurate; otherwise, the resulting binary code may not be optimal for compression.
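The counting and sorting in Steps 1 and 2 can be sketched in Python as follows (the function name and sample string are illustrative, not from the article):

```python
from collections import Counter

def symbol_probabilities(message):
    """Return (symbol, probability) pairs sorted by decreasing probability."""
    counts = Counter(message)          # Step 1: count each symbol
    total = len(message)
    # Step 2: sort in decreasing order of probability (count / total)
    return sorted(((sym, n / total) for sym, n in counts.items()),
                  key=lambda item: item[1], reverse=True)

print(symbol_probabilities("ABRACADABRA"))
```

Python's sort is stable, so symbols with equal probability keep their first-seen order.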

Example to understand Shannon-Fano coding

In this section, we will trace the Shannon-Fano coding algorithm on a small example.

Consider the five symbols A, B, C, D and E with counts A = 12, B = 4, C = 3, D = 3, and E = 2.

Applying Steps 1 and 2 gives the following table:

Symbol | Count | Probability
A      | 12    | 0.48
B      | 4     | 0.16
C      | 3     | 0.12
D      | 3     | 0.12
E      | 2     | 0.08


Step 3 splits the sorted symbols into two groups with nearly equal total probability: {A} (0.48) and {B, C, D, E} (0.48).

Steps 4 and 5 assign '0' to the first group and '1' to the second, then repeat the process on {B, C, D, E}: it splits into {B, C} and {D, E}, which finally split into the individual symbols.
After the final step, the code obtained for each symbol is:

Symbol | Count | Probability | Code
A      | 12    | 0.48        | 0
B      | 4     | 0.16        | 100
C      | 3     | 0.12        | 101
D      | 3     | 0.12        | 110
E      | 2     | 0.08        | 111
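As a quick check of the benefit (a small sketch, not part of the original article), we can compare the message size under these codes with a fixed-length encoding; five distinct symbols would need 3 bits each:

```python
counts = {"A": 12, "B": 4, "C": 3, "D": 3, "E": 2}
codes = {"A": "0", "B": "100", "C": "101", "D": "110", "E": "111"}

total_symbols = sum(counts.values())                            # 24 symbols in total
encoded_bits = sum(counts[s] * len(codes[s]) for s in counts)   # bits used by the codes above
fixed_bits = total_symbols * 3                                  # 3 bits per symbol for 5 symbols

print(encoded_bits, fixed_bits, encoded_bits / total_symbols)   # 48 72 2.0
```

The variable-length codes use 48 bits instead of 72, an average of 2 bits per symbol instead of 3.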

Implementation

Implementation of the above approach:


C

#include <math.h>
#include <stdio.h>

struct Node {
    char sym;
    float prob;
    int arr[20];
    int top;
};

void shannon(int l, int h, struct Node p[]) {
    if (l >= h) return;

    float total = 0;
    for (int i = l; i <= h; i++)
        total += p[i].prob;

    /* Grow the first group while that keeps the two halves balanced. */
    float acc = 0;
    int split = l;
    while (split < h) {
        float withNext = acc + p[split].prob;
        if (split > l && fabsf(total - 2 * withNext) > fabsf(total - 2 * acc))
            break;
        acc = withNext;
        split++;
    }

    /* The first group gets '0', the second group gets '1'. */
    for (int i = l; i < split; i++)
        p[i].arr[++(p[i].top)] = 0;
    for (int i = split; i <= h; i++)
        p[i].arr[++(p[i].top)] = 1;

    shannon(l, split - 1, p);
    shannon(split, h, p);
}

void display(int n, struct Node p[]) {
    printf("\nSymbol\tProbability\tCode\n");
    for (int i = 0; i < n; i++) {
        printf("%c\t%.2f\t\t", p[i].sym, p[i].prob);
        for (int j = 0; j <= p[i].top; j++)
            printf("%d", p[i].arr[j]);
        printf("\n");
    }
}

int main() {
    int n = 5;
    /* Symbols are already sorted by decreasing probability. */
    struct Node p[5] = {
        {'A', 0.48, {0}, -1},
        {'B', 0.16, {0}, -1},
        {'C', 0.12, {0}, -1},
        {'D', 0.12, {0}, -1},
        {'E', 0.08, {0}, -1}
    };

    shannon(0, n - 1, p);
    display(n, p);
    return 0;
}

C++

#include <bits/stdc++.h>
using namespace std;

struct node {
    // storing symbol
    string sym;
    // storing probability or frequency
    float pro;
    int arr[20];
    int top;
} p[20];

// function to find shannon codes
void shannon(int l, int h, node p[])
{
    float pack1 = 0, pack2 = 0, diff1 = 0, diff2 = 0;
    int i, k = l, j;

    // base cases: ranges of zero, one or two symbols
    if ((l + 1) == h || l == h || l > h) {
        if (l == h || l > h)
            return;
        p[h].arr[++(p[h].top)] = 0;
        p[l].arr[++(p[l].top)] = 1;
        return;
    }

    // start with the split {l..h-1 | h} and record its imbalance
    for (i = l; i <= h - 1; i++)
        pack1 = pack1 + p[i].pro;
    pack2 = pack2 + p[h].pro;
    diff1 = pack1 - pack2;
    if (diff1 < 0)
        diff1 = diff1 * -1;

    // move the split point while the two halves become more balanced
    j = 2;
    while (j != h - l + 1) {
        k = h - j;
        pack1 = pack2 = 0;
        for (i = l; i <= k; i++)
            pack1 = pack1 + p[i].pro;
        for (i = h; i > k; i--)
            pack2 = pack2 + p[i].pro;
        diff2 = pack1 - pack2;
        if (diff2 < 0)
            diff2 = diff2 * -1;
        if (diff2 >= diff1)
            break;
        diff1 = diff2;
        j++;
    }
    k++;

    // lower-probability half gets '1', higher-probability half gets '0'
    // (the array is sorted in increasing order of probability)
    for (i = l; i <= k; i++)
        p[i].arr[++(p[i].top)] = 1;
    for (i = k + 1; i <= h; i++)
        p[i].arr[++(p[i].top)] = 0;

    // recurse on both halves
    shannon(l, k, p);
    shannon(k + 1, h, p);
}

// function to sort the symbols in increasing
// order of probability (bubble sort)
void sortByProbability(int n, node p[])
{
    int i, j;
    node temp;
    for (j = 1; j <= n - 1; j++) {
        for (i = 0; i < n - 1; i++) {
            if (p[i].pro > p[i + 1].pro) {
                temp.pro = p[i].pro;
                temp.sym = p[i].sym;
                p[i].pro = p[i + 1].pro;
                p[i].sym = p[i + 1].sym;
                p[i + 1].pro = temp.pro;
                p[i + 1].sym = temp.sym;
            }
        }
    }
}

// function to display the Shannon-Fano codes
// (in decreasing order of probability)
void display(int n, node p[])
{
    int i, j;
    cout << "\n\n\n\tSymbol\tProbability\tCode";
    for (i = n - 1; i >= 0; i--) {
        cout << "\n\t" << p[i].sym << "\t\t" << p[i].pro << "\t";
        for (j = 0; j <= p[i].top; j++)
            cout << p[i].arr[j];
    }
}

// Driver code
int main()
{
    int n, i;
    float total = 0;
    string ch;

    // number of symbols
    cout << "Enter number of symbols\t: ";
    n = 5;
    cout << n << endl;

    // symbols A to E
    for (i = 0; i < n; i++) {
        cout << "Enter symbol " << i + 1 << " : ";
        ch = (char)(65 + i);
        cout << ch << endl;

        // insert the symbol into the node
        p[i].sym += ch;
    }

    // probability of each symbol
    float x[] = { 0.48, 0.16, 0.12, 0.12, 0.08 };
    for (i = 0; i < n; i++) {
        cout << "\nEnter probability of " << p[i].sym << " : ";
        cout << x[i] << endl;

        // insert the value into the node
        p[i].pro = x[i];
        total = total + p[i].pro;

        // the probabilities must not sum to more than 1
        if (total > 1) {
            cout << "Invalid. Enter new values";
            total = total - p[i].pro;
            i--;
        }
    }

    // sort the symbols by probability
    sortByProbability(n, p);

    for (i = 0; i < n; i++)
        p[i].top = -1;

    // find the Shannon-Fano codes
    shannon(0, n - 1, p);

    // display the codes
    display(n, p);
    return 0;
}

Java

import java.util.*;

class Node {
    String symbol;
    double prob;
    String code = "";

    Node(String symbol, double prob) {
        this.symbol = symbol;
        this.prob = prob;
    }
}

public class ShannonFano {
    public static void shannon(List<Node> nodes, int l, int h) {
        if (l >= h) return;

        double total = 0;
        for (int i = l; i <= h; i++) total += nodes.get(i).prob;

        // Grow the first group while that keeps the two halves balanced
        double acc = 0;
        int split = l;
        while (split < h) {
            double withNext = acc + nodes.get(split).prob;
            if (split > l && Math.abs(total - 2 * withNext) > Math.abs(total - 2 * acc))
                break;
            acc = withNext;
            split++;
        }

        // The first group gets '0', the second group gets '1'
        for (int i = l; i < split; i++) nodes.get(i).code += "0";
        for (int i = split; i <= h; i++) nodes.get(i).code += "1";

        shannon(nodes, l, split - 1);
        shannon(nodes, split, h);
    }

    public static void main(String[] args) {
        // Symbols are already sorted by decreasing probability
        List<Node> nodes = Arrays.asList(
            new Node("A", 0.48),
            new Node("B", 0.16),
            new Node("C", 0.12),
            new Node("D", 0.12),
            new Node("E", 0.08)
        );

        shannon(nodes, 0, nodes.size() - 1);
        System.out.println("Symbol\tProbability\tCode");
        for (Node n : nodes) {
            System.out.println(n.symbol + "\t" + n.prob + "\t\t" + n.code);
        }
    }
}

Python

class Node:
    def __init__(self, sym, prob):
        self.sym = sym
        self.prob = prob
        self.code = ""


def shannon(nodes, l, h):
    if l >= h:
        return

    total = sum(node.prob for node in nodes[l:h + 1])

    # Grow the first group while that keeps the two halves balanced
    acc = 0.0
    split = l
    while split < h:
        with_next = acc + nodes[split].prob
        if split > l and abs(total - 2 * with_next) > abs(total - 2 * acc):
            break
        acc = with_next
        split += 1

    # The first group gets '0', the second group gets '1'
    for i in range(l, split):
        nodes[i].code += "0"
    for i in range(split, h + 1):
        nodes[i].code += "1"

    shannon(nodes, l, split - 1)
    shannon(nodes, split, h)


# Symbols are already sorted by decreasing probability
nodes = [Node("A", 0.48), Node("B", 0.16), Node("C", 0.12), Node("D", 0.12), Node("E", 0.08)]
shannon(nodes, 0, len(nodes) - 1)

print("Symbol\tProbability\tCode")
for node in nodes:
    print(f"{node.sym}\t{node.prob}\t\t{node.code}")

JS

class Node {
    constructor(sym, prob) {
        this.sym = sym;
        this.prob = prob;
        this.code = "";
    }
}

function shannon(nodes, l, h) {
    if (l >= h) return;

    const total = nodes.slice(l, h + 1).reduce((sum, node) => sum + node.prob, 0);

    // Grow the first group while that keeps the two halves balanced
    let acc = 0;
    let split = l;
    while (split < h) {
        const withNext = acc + nodes[split].prob;
        if (split > l && Math.abs(total - 2 * withNext) > Math.abs(total - 2 * acc)) break;
        acc = withNext;
        split++;
    }

    // The first group gets '0', the second group gets '1'
    for (let i = l; i < split; i++) nodes[i].code += "0";
    for (let i = split; i <= h; i++) nodes[i].code += "1";

    shannon(nodes, l, split - 1);
    shannon(nodes, split, h);
}

// Symbols are already sorted by decreasing probability
const nodes = [
    new Node("A", 0.48),
    new Node("B", 0.16),
    new Node("C", 0.12),
    new Node("D", 0.12),
    new Node("E", 0.08),
];

shannon(nodes, 0, nodes.length - 1);

console.log("Symbol\tProbability\tCode");
nodes.forEach(node => console.log(`${node.sym}\t${node.prob}\t${node.code}`));

C#

using System;
using System.Collections.Generic;

class Node {
    public string Sym;
    public double Prob;
    public string Code = "";
}

class ShannonFano {
    static void Shannon(List<Node> nodes, int l, int h) {
        if (l >= h) return;

        double total = 0;
        for (int i = l; i <= h; i++) total += nodes[i].Prob;

        // Grow the first group while that keeps the two halves balanced
        double acc = 0;
        int split = l;
        while (split < h) {
            double withNext = acc + nodes[split].Prob;
            if (split > l && Math.Abs(total - 2 * withNext) > Math.Abs(total - 2 * acc))
                break;
            acc = withNext;
            split++;
        }

        // The first group gets '0', the second group gets '1'
        for (int i = l; i < split; i++) nodes[i].Code += "0";
        for (int i = split; i <= h; i++) nodes[i].Code += "1";

        Shannon(nodes, l, split - 1);
        Shannon(nodes, split, h);
    }

    static void Main() {
        // Symbols are already sorted by decreasing probability
        var nodes = new List<Node> {
            new Node { Sym = "A", Prob = 0.48 },
            new Node { Sym = "B", Prob = 0.16 },
            new Node { Sym = "C", Prob = 0.12 },
            new Node { Sym = "D", Prob = 0.12 },
            new Node { Sym = "E", Prob = 0.08 }
        };

        Shannon(nodes, 0, nodes.Count - 1);

        Console.WriteLine("Symbol\tProbability\tCode");
        foreach (var node in nodes) {
            Console.WriteLine($"{node.Sym}\t{node.Prob}\t{node.Code}");
        }
    }
}

Output

Symbol  Probability  Code
A       0.48         0
B       0.16         100
C       0.12         101
D       0.12         110
E       0.08         111

Differences between Huffman and Shannon-Fano Algorithm

Feature | Huffman Algorithm | Shannon-Fano Algorithm
Basis of construction | Builds a binary tree bottom-up from symbol frequencies, minimising the weighted path length. | Divides the sorted symbols top-down into two sets with approximately equal total probability.
Efficiency | Generally more efficient; provides optimal prefix codes. | Slightly less efficient; may not always produce the most optimal prefix codes.
Complexity | Higher complexity, due to sorting and tree construction. | Lower complexity; involves simpler steps of dividing and assigning codes.
Code lengths | Produces variable-length codes that are usually shorter. | Produces variable-length codes that may be longer than Huffman codes.
Implementation | More complex to implement, since a tree must be constructed and traversed. | Easier to implement, with simpler and more straightforward steps.
Optimality | Always produces the most optimal prefix-free code (minimum redundancy). | Does not always guarantee optimal codes and may have slight redundancy.
Use cases | Widely used in file compression formats such as ZIP and JPEG. | Less commonly used; often replaced by Huffman coding in practice.
Historical background | Developed by David A. Huffman in 1952. | Developed by Claude Shannon and Robert Fano in 1949.
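To make the comparison concrete, here is an illustrative sketch (not from the original article) that computes Huffman code lengths for the example distribution used earlier. For this particular distribution, Huffman happens to produce the same code lengths as Shannon-Fano, although in general Huffman is never worse:

```python
import heapq

def huffman_code_lengths(probs):
    # Heap entries: (probability, tie-breaker, indices of symbols in this subtree)
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    tie = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)   # merge the two least probable subtrees
        p2, _, s2 = heapq.heappop(heap)
        for i in s1 + s2:                 # every symbol in the merged subtree gains one bit
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, tie, s1 + s2))
        tie += 1
    return lengths

print(huffman_code_lengths([0.48, 0.16, 0.12, 0.12, 0.08]))  # [1, 3, 3, 3, 3]
```

These lengths (1, 3, 3, 3, 3) match the Shannon-Fano codes 0, 100, 101, 110, 111 from the example above.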

Frequently Asked Questions

What is the Worst Time Complexity of the Shannon-Fano Algorithm?

The partitions carried out at each step may be extremely unbalanced; for instance, with probabilities 0.5, 0.25, 0.12, 0.08, and 0.05, every split peels off a single symbol, and the recurrence relation is:

T(n) = T(n-1) + T(1) + Θ(n)

T(n) in this case turns out to be O(n²).

What is the Best Time Complexity of the Shannon-Fano Algorithm?

If each partition divides the symbols into two parts of nearly equal size, for instance with probabilities 0.25, 0.25, 0.25, and 0.25, the recurrence relation is:

T(n) = 2T(n/2) + Θ(n)

T(n) in this case turns out to be O(n log n).

How to Construct a Shannon Code?

To construct a Shannon-Fano code, sort the symbols in decreasing order of probability, divide them into two sets with nearly equal total probability, assign '0' to one set and '1' to the other, and recurse on each set until every set contains a single symbol.
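For reference, the classical Shannon code (related to, but distinct from, the recursive Shannon-Fano division described above) gives each symbol the first ⌈log2(1/p)⌉ bits of the binary expansion of its cumulative probability. A minimal sketch, assuming the probabilities are already sorted in decreasing order:

```python
from math import ceil, log2

def shannon_codes(probs):
    # probs must be sorted in decreasing order and sum to at most 1
    codes = []
    cumulative = 0.0
    for p in probs:
        length = ceil(log2(1 / p))     # code length for a symbol of probability p
        f, bits = cumulative, ""
        for _ in range(length):        # binary expansion of the cumulative probability
            f *= 2
            if f >= 1:
                bits += "1"
                f -= 1
            else:
                bits += "0"
        codes.append(bits)
        cumulative += p
    return codes

print(shannon_codes([0.48, 0.16, 0.12, 0.12, 0.08]))
```

For the example distribution this yields the prefix-free codes 00, 011, 1010, 1100 and 1110; they are somewhat longer than the Shannon-Fano codes above, but the construction needs no recursive splitting.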

Conclusion

Congratulations on finishing this blog! In this article, we discussed the types of data compression, walked through the working of Shannon-Fano coding on an example, and implemented it in several languages. Finally, we answered some frequently asked questions.

Hope you enjoyed reading this article!