Message Encoding

Consider the array ARR = [ 1, 4, 2 ] having 3 elements. One valid array of Huffman codes for it is [ '10', '0', '11' ]. Other valid Huffman codes are [ '01', '1', '00' ], [ '00', '1', '01' ], etc. Codes like [ '1', '0', '01' ] and [ '1', '10', '0' ] are invalid, because in each of them one code is a prefix of another ('0' is a prefix of '01', and '1' is a prefix of '10').
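A quick way to tell the valid and invalid examples above apart is to check the prefix property: no code may be a prefix of another. The sketch below checks only that property, not whether the code lengths are optimal for the given frequencies. The function name is_prefix_free is hypothetical and not part of the problem.

```python
def is_prefix_free(codes):
    """Return True if no code in the list is a prefix of another code."""
    ordered = sorted(codes)
    # After lexicographic sorting, if any code is a prefix of another,
    # it is also a prefix of its immediate successor, so comparing
    # neighbours is enough. Duplicate codes are rejected as well.
    return all(not b.startswith(a) for a, b in zip(ordered, ordered[1:]))


print(is_prefix_free(['10', '0', '11']))  # True
print(is_prefix_free(['1', '0', '01']))   # False
```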

The first line of the input contains an integer 'T', denoting the number of test cases. The first line of each test case contains an integer 'N', denoting the number of elements in the array 'ARR'. The second line of each test case contains 'N' space-separated integers denoting the array elements.

1 <= T <= 10
1 <= N <= 10^4
1 <= ARR[i] <= 10^4

Where 'T' denotes the number of test cases, 'N' denotes the number of elements in the array 'ARR', and 'ARR[i]' denotes the 'i-th' element of the array 'ARR'.

Time Limit: 1 sec

Explanation for Sample Input 1:

For the first test case:
The array representing the Huffman codes will be [ '11', '0', '10' ]. Note that there are multiple other possible answers like [ '00', '1', '01' ], [ '01', '1', '00' ], etc. All of them are valid, so we can return any of them.

For the second test case:
The array representing the Huffman codes will be [ '0', '1' ]. The array [ '1', '0' ] also represents a valid set of Huffman codes.

Explanation for Sample Input 2:

For the first test case:
The array representing the Huffman codes will be [ '11', '10', '0' ].

For the second test case:
The array representing the Huffman codes will be [ '10', '0', '110', '111' ].

Optimized Solution

We will be dividing our solution into three parts for better understanding. We will use a Min-Heap to build a binary tree and then traverse the binary tree to assign Huffman codes to each element.

1. Understanding the need for a Min-Heap

A basic idea would be to sort the frequency array and repeatedly give the shortest unused code to the character having the maximum frequency. This idea looks reasonable, but it does not always give the minimum total number of bits.

Consider the sorted array ARR = [ 3, 4, 4, 5 ]. Using the above idea, we can give the codes [ '111', '110', '10', '0' ]. The number of bits used will be 34 ( 3*3 + 4*3 + 4*2 + 5*1 ). But if we give the codes [ '11', '10', '00', '01' ], the number of bits will be 32 ( 2*3 + 2*4 + 2*4 + 2*5 ), which is less than the result obtained by the previous approach.
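The arithmetic above can be reproduced with a short sketch: the total cost of an assignment is the sum of frequency times code length. The helper name total_bits is hypothetical.

```python
def total_bits(freqs, codes):
    """Total number of bits used: sum of frequency * code length."""
    return sum(f * len(c) for f, c in zip(freqs, codes))


freqs = [3, 4, 4, 5]
skewed = ['111', '110', '10', '0']    # shortest code to the largest frequency
balanced = ['11', '10', '00', '01']   # every code has length 2

print(total_bits(freqs, skewed))    # 34
print(total_bits(freqs, balanced))  # 32
```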

From here, we can conclude that the sorting-based approach fails. But giving the shortest codes to the characters having the highest frequency makes sense. Where did we go wrong?

The mistake is assuming that the character having the largest frequency must get the shortest possible code (i.e. a one-bit code such as '0'). Other assignments can give a better total. The code of the character having the largest frequency is never longer than the code of any other character in the message, but it is not always the shortest code possible.

Therefore, we need to greedily select the two nodes having minimum frequency, assign them the longest codes, and use a new node whose frequency is the sum of the two for the rest of the process. This can be done efficiently with the help of a Min-Heap.

2. Structure of the heap node and how to build Huffman Tree

The node in the Heap will have an index denoting the index of its corresponding character in the message, frequency denoting the frequency of the node in the message, and the two nodes left and right denoting its left and right child, respectively. The basic idea is to take the two elements having the smallest frequency, make a new node having frequency as the sum of the frequencies of the removed elements, and add it back to the Heap. We will make the removed elements as the left child and the right child of the newly built node. In the end, we will stop when only one element remains.

Steps :

Make an empty Heap.
Iterate through i = 0 to N - 1
- Create a new node having frequency = ARR [i], index = i, and mark its left child and right child as NULL.
- Insert the node into the heap.
While the number of elements in the heap is more than 1
- Pop the node having minimum frequency from the heap. Let the node be node1.
- Pop another node having minimum frequency from the heap. Let the node be node2.
- Create a new node having frequency = node1.frequency + node2.frequency, index = -1 ( as it does not point to any single index ). Make node1 its left child and node2 its right child, or vice-versa.
- Insert the newly created node into the heap.
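The steps above can be sketched in Python with the standard heapq module. This is a minimal sketch rather than a reference implementation: the names Node and build_huffman_tree are hypothetical, and the running counter stored in each heap entry is an added assumption, used only so that heapq never has to compare two Node objects when frequencies are equal.

```python
import heapq
from itertools import count


class Node:
    def __init__(self, frequency, index=-1, left=None, right=None):
        self.frequency = frequency
        self.index = index    # -1 for internal nodes
        self.left = left
        self.right = right


def build_huffman_tree(arr):
    """Build the Huffman Tree for the frequencies in arr and return its root."""
    tie = count()  # unique tie-breaker for entries with equal frequency
    heap = []
    for i, freq in enumerate(arr):
        heapq.heappush(heap, (freq, next(tie), Node(freq, i)))

    while len(heap) > 1:
        f1, _, node1 = heapq.heappop(heap)  # smallest frequency
        f2, _, node2 = heapq.heappop(heap)  # second smallest frequency
        merged = Node(f1 + f2, -1, node1, node2)
        heapq.heappush(heap, (merged.frequency, next(tie), merged))

    return heap[0][2]


root = build_huffman_tree([1, 4, 2])
print(root.frequency)  # 7
```

The root's frequency always equals the sum of all input frequencies, which makes for a cheap sanity check.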

3. Finding the Huffman codes from the Huffman Tree

We can write a recursive function that will traverse the Huffman Tree and assign the Huffman codes for every character. Let ans be the array that stores the Huffman codes for each character.

Working of the Recursive function

The recursive function takes two parameters: root, the root of the current subtree, and str, the code string built so far.
If the root is a leaf node, then set ans [ root.index ] as str and end the function.
Otherwise, call the recursive function for the left subtree and the right subtree. To call the left subtree use (root.left, str + "0") as the arguments and to call for the right subtree use (root.right,str + "1") as the arguments. We can also use str + "1" for the left subtree and str + "0" for the right subtree as this will also produce a valid set of Huffman codes.
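Putting the tree construction and the recursive traversal together gives the sketch below. All names (Node, build_huffman_tree, assign_codes, huffman_codes) are hypothetical, and the tie-breaking counter in the heap entries is an added assumption. The sketch follows the description literally, so an array with a single element gets an empty code; the text does not say what is expected in that case.

```python
import heapq
from itertools import count


class Node:
    def __init__(self, frequency, index=-1, left=None, right=None):
        self.frequency = frequency
        self.index = index    # -1 for internal nodes
        self.left = left
        self.right = right


def build_huffman_tree(arr):
    tie = count()  # tie-breaker so Node objects are never compared
    heap = []
    for i, freq in enumerate(arr):
        heapq.heappush(heap, (freq, next(tie), Node(freq, i)))
    while len(heap) > 1:
        f1, _, node1 = heapq.heappop(heap)
        f2, _, node2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), Node(f1 + f2, -1, node1, node2)))
    return heap[0][2]


def assign_codes(root, prefix, ans):
    """Store the code of every leaf below root in ans[leaf.index]."""
    if root.left is None and root.right is None:
        ans[root.index] = prefix
        return
    assign_codes(root.left, prefix + "0", ans)
    assign_codes(root.right, prefix + "1", ans)


def huffman_codes(arr):
    ans = [""] * len(arr)
    assign_codes(build_huffman_tree(arr), "", ans)
    return ans


print(huffman_codes([1, 4, 2]))  # ['00', '1', '01']
```

Swapping "0" and "1" in the two recursive calls, or swapping the children when merging, yields a different but equally valid set of codes.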

Time Complexity

O(N*log(N)), where N is the number of elements in the array.

The heap never holds more than N elements, so inserting or removing an element takes O(log(N)) time. Building the heap takes N insertions, and each of the N - 1 merge steps performs two removals and one insertion. Hence, the overall Time Complexity is O(N*log(N)).

Space Complexity

O(N), where N is the number of elements in the array.

When all elements of the input array are inserted in the heap, the number of elements in the heap is N, and the finished Huffman Tree has 2*N - 1 nodes. Hence, the overall Space Complexity is O(N).
