You are given an array 'ARR' of Integers having 'N' elements. The array contains an encoded message. For each index 'i', 'ARR[i]' denotes the frequency of the 'i'th' character in the message. The characters are of an alien language having 'N' alphabets. Given the frequency of each of the 'N' alphabets in the message, your task is to find out the Huffman codes for each of the 'N' alphabets in the message.
The Huffman Code for a message is the set of codes such that :
1) All codes are binary strings.
2) Each code should be able to determine its corresponding character uniquely.
3) The total numbers of bits used to represent the message are minimized.
Note:
If there are multiple sets of valid Huffman codes for a message. You can print any of them.
For example:
Consider the array ARR = [ 1, 4, 2 ] having 3 elements.
The array containing Huffman Codes for the above array will be [ '10', '0', '11' ]. Other Valid Huffman Codes are [ '01', '1', '00' ], [ '00', '1', '01' ] etc. Codes like [ '1', '0', '01' ], [ '1', '10' , '0' ] are some of the invalid Huffman Codes.
The first line of the input contains an integer, 'T,’ denoting the number of test cases.
The first line of each test case contains an integer 'N', denoting the number of elements in the array 'ARR'.
The second line of each test case contains 'N' space-separated integers denoting the array elements.
Output Format:
The checker will print 1 if the returned Huffman codes are correct and follow all the rules, otherwise, it will print 0.
Print the output of each test case in a new line.
Note :
You do not need to print anything. It has already been taken care of. Just implement the given function.
1 <= T <= 10
1 <= N <= 10^4
1 <= ARR[i] <= 10^4
Where 'T' denotes the number of test cases, 'N' denotes the elements in the array 'ARR', and 'ARR[i]' denotes the 'i'th' element of the array 'ARR'.
Time Limit: 1 sec
2
3
5 7 6
2
4 3
1
1
For the first test case :
The array representing the Huffman codes will be [ '11', '0', '10' ] . Note that there are multiple other possible answers like [ '00', '1' ,'01' ], [ '01', '1', '00' ] etc. All of them are valid, so we can return any of them.
For the second test case :
The array representing the Huffman codes will be [ '0', '1' ]. The array [ '1', '0' ] also represents a valid set of Huffman Code.
2
3
1 2 5
4
5 6 3 1
1
1
For the first test case :
The array representing the Huffman codes will be [ '11', '10', '0' ] .
For the second test case :
The array representing the Huffman codes will be [ '10', '0', '110', '111' ].
Try to think of using a data structure that can efficiently extract the minimum value in an array so that we can build the Huffman Tree.
We will be dividing our solution into three parts for better understanding. We will use a Min-Heap to build a binary tree and then traverse the binary tree to assign Huffman codes to each element.
A basic idea would be to sort the frequency array and repeatedly give the smallest not used code to the character having maximum frequency. This idea looks decent, but it does not always provide the right answer.
Consider the sorted array ARR = [ 3, 4, 4, 5 ]. Using the above idea, we can give the codes [ '111', '110', '10', '0' ]. The number of bits used will be 34 ( 3*3 + 4*3 + 4*2 + 5*1 ). But if we give the codes [ '11', '10', '00', '01' ]. The number of bits will be 32 ( 2*3 + 2*4 +2*4 +2*5) which is lesser than the result obtained by previous approach.
From here, we can conclude that the sorting algorithm fails. But giving the characters having the most frequency the smallest codes makes sense. Where are we wrong?
We need to see that there can be combinations in which the character having the largest frequency is not given the smallest possible code (i.e. '0'). There are other ways possible too that are providing better answers. Though the character having the largest frequency will always have the smallest code among codes of other characters in the message, still it will not always have the smallest code possible (i.e. '0').
Therefore we need to greedily select two nodes having minimum frequency, assign them the largest codes and use the sum of the two nodes as a new node for further process. This can be done efficiently with the help of a Min Heap.
2. Structure of the heap node and how to build Huffman Tree
The node in the Heap will have an index denoting the index of its corresponding character in the message, frequency denoting the frequency of the node in the message, and the two nodes left and right denoting its left and right child, respectively. The basic idea is to take the two elements having the smallest frequency, make a new node having frequency as the sum of the frequencies of the removed elements, and add it back to the Heap. We will make the removed elements as the left child and the right child of the newly built node. In the end, we will stop when only one element remains.
Steps :
3. Finding the Huffman codes from the Huffman Tree
We can write a recursive function that will traverse the Huffman Tree and assign the Huffman codes for every character. Let ans be the array that stores the Huffman codes for each character.
Working of the Recursive function
When all elements of the input array are inserted in the heap, the number of elements in the heap are N, and it takes O(log(N)) time to remove or insert an element from the heap. Hence, the overall Time Complexity is O(N*log(N)).
When all elements of the input array are inserted in the heap, the number of elements in the heap are N. Hence, the overall Space Complexity is O(N).