Last Updated: 13 Feb, 2021

Message Encoding

Moderate

Asked in companies

Solve now

Problem statement

You are given an array 'ARR' of Integers having 'N' elements. The array contains an encoded message. For each index 'i', 'ARR[i]' denotes the frequency of the 'i'th' character in the message. The characters are of an alien language having 'N' alphabets. Given the frequency of each of the 'N' alphabets in the message, your task is to find out the Huffman codes for each of the 'N' alphabets in the message.

The Huffman Code for a message is the set of codes such that :

1) All codes are binary strings.
2) Each code should be able to determine its corresponding character uniquely.
3) The total numbers of bits used to represent the message are minimized.

Note:

If there are multiple sets of valid Huffman codes for a message. You can print any of them.

For example:

Consider the array ARR = [ 1, 4, 2 ] having 3 elements. 
The array containing Huffman Codes for the above array will be [ '10', '0', '11' ]. Other Valid Huffman Codes are [ '01', '1', '00' ], [ '00', '1', '01' ] etc. Codes like [ '1', '0', '01' ], [ '1', '10' , '0' ] are some of the invalid Huffman Codes.

Input Format:

The first line of the input contains an integer, 'T,’ denoting the number of test cases.

The first line of each test case contains an integer 'N', denoting the number of elements in the array 'ARR'.

The second line of each test case contains 'N' space-separated integers denoting the array elements.

Output Format:

The checker will print 1 if the returned Huffman codes are correct and follow all the rules, otherwise, it will print 0.

Print the output of each test case in a new line.

Note :

You do not need to print anything. It has already been taken care of. Just implement the given function.

Constraints:

1 <= T <= 10
1 <= N <= 10^4
1 <= ARR[i]  <= 10^4

Where 'T' denotes the number of test cases, 'N' denotes the elements in the array 'ARR', and 'ARR[i]' denotes the 'i'th' element of the array 'ARR'.

Time Limit: 1 sec

Solve now

Approaches

01 Approach

We will be dividing our solution into three parts for better understanding. We will use a Min-Heap to build a binary tree and then traverse the binary tree to assign Huffman codes to each element.

Understanding the need of using a Min-Heap

A basic idea would be to sort the frequency array and repeatedly give the smallest not used code to the character having maximum frequency. This idea looks decent, but it does not always provide the right answer.

Consider the sorted array ARR = [ 3, 4, 4, 5 ]. Using the above idea, we can give the codes [ '111', '110', '10', '0' ]. The number of bits used will be 34 ( 3*3 + 4*3 + 4*2 + 5*1 ). But if we give the codes [ '11', '10', '00', '01' ]. The number of bits will be 32 ( 2*3 + 2*4 +2*4 +2*5) which is lesser than the result obtained by previous approach.

From here, we can conclude that the sorting algorithm fails. But giving the characters having the most frequency the smallest codes makes sense. Where are we wrong?

We need to see that there can be combinations in which the character having the largest frequency is not given the smallest possible code (i.e. '0'). There are other ways possible too that are providing better answers. Though the character having the largest frequency will always have the smallest code among codes of other characters in the message, still it will not always have the smallest code possible (i.e. '0').

Therefore we need to greedily select two nodes having minimum frequency, assign them the largest codes and use the sum of the two nodes as a new node for further process. This can be done efficiently with the help of a Min Heap.

2. Structure of the heap node and how to build Huffman Tree

The node in the Heap will have an index denoting the index of its corresponding character in the message, frequency denoting the frequency of the node in the message, and the two nodes left and right denoting its left and right child, respectively. The basic idea is to take the two elements having the smallest frequency, make a new node having frequency as the sum of the frequencies of the removed elements, and add it back to the Heap. We will make the removed elements as the left child and the right child of the newly built node. In the end, we will stop when only one element remains.

Steps :

Make an empty Heap.
Iterate through i = 0 to N - 1
- Create a new node having frequency = ARR [i], index = i, and mark its left child and right child as NULL.
- Insert the node into the heap.
While the number of elements in the heap are more than 1
- Pop the node having minimum frequency from the heap. Let the node be node1.
- Pop another node having minimum frequency from the heap. Let the node be node2.
- Create a new node having frequency = node1.frequency + node1.frequency, index = -1 ( as it does not point to any single index ) . Make node1 as its left child and node2 as its right child or vice-versa.
- Insert the newly created node into the heap.

3. Finding the Huffman codes from the Huffman Tree

We can write a recursive function that will traverse the Huffman Tree and assign the Huffman codes for every character. Let ans be the array that stores the Huffman codes for each character.

Working of the Recursive function

The recursive function takes the root of the tree root, the built string str, as its parameters.
If the root is a leaf node, then set ans [ root.index ] as str and end the function.
Otherwise, call the recursive function for the left subtree and the right subtree. To call the left subtree use (root.left, str + "0") as the arguments and to call for the right subtree use (root.right,str + "1") as the arguments. We can also use str + "1" for the left subtree and str + "0" for the right subtree as this will also produce a valid set of Huffman codes.

Solve now