


1) All codes are binary strings.
2) Each code should be able to determine its corresponding character uniquely.
3) The total numbers of bits used to represent the message are minimized.
If there are multiple sets of valid Huffman codes for a message. You can print any of them.
Consider the array ARR = [ 1, 4, 2 ] having 3 elements.
The array containing Huffman Codes for the above array will be [ '10', '0', '11' ]. Other Valid Huffman Codes are [ '01', '1', '00' ], [ '00', '1', '01' ] etc. Codes like [ '1', '0', '01' ], [ '1', '10' , '0' ] are some of the invalid Huffman Codes.
The first line of the input contains an integer, 'T,’ denoting the number of test cases.
The first line of each test case contains an integer 'N', denoting the number of elements in the array 'ARR'.
The second line of each test case contains 'N' space-separated integers denoting the array elements.
The checker will print 1 if the returned Huffman codes are correct and follow all the rules, otherwise, it will print 0.
Print the output of each test case in a new line.
You do not need to print anything. It has already been taken care of. Just implement the given function.
1 <= T <= 10
1 <= N <= 10^4
1 <= ARR[i] <= 10^4
Where 'T' denotes the number of test cases, 'N' denotes the elements in the array 'ARR', and 'ARR[i]' denotes the 'i'th' element of the array 'ARR'.
Time Limit: 1 sec
We will be dividing our solution into three parts for better understanding. We will use a Min-Heap to build a binary tree and then traverse the binary tree to assign Huffman codes to each element.
A basic idea would be to sort the frequency array and repeatedly give the smallest not used code to the character having maximum frequency. This idea looks decent, but it does not always provide the right answer.
Consider the sorted array ARR = [ 3, 4, 4, 5 ]. Using the above idea, we can give the codes [ '111', '110', '10', '0' ]. The number of bits used will be 34 ( 3*3 + 4*3 + 4*2 + 5*1 ). But if we give the codes [ '11', '10', '00', '01' ]. The number of bits will be 32 ( 2*3 + 2*4 +2*4 +2*5) which is lesser than the result obtained by previous approach.
From here, we can conclude that the sorting algorithm fails. But giving the characters having the most frequency the smallest codes makes sense. Where are we wrong?
We need to see that there can be combinations in which the character having the largest frequency is not given the smallest possible code (i.e. '0'). There are other ways possible too that are providing better answers. Though the character having the largest frequency will always have the smallest code among codes of other characters in the message, still it will not always have the smallest code possible (i.e. '0').
Therefore we need to greedily select two nodes having minimum frequency, assign them the largest codes and use the sum of the two nodes as a new node for further process. This can be done efficiently with the help of a Min Heap.
The node in the Heap will have an index denoting the index of its corresponding character in the message, frequency denoting the frequency of the node in the message, and the two nodes left and right denoting its left and right child, respectively. The basic idea is to take the two elements having the smallest frequency, make a new node having frequency as the sum of the frequencies of the removed elements, and add it back to the Heap. We will make the removed elements as the left child and the right child of the newly built node. In the end, we will stop when only one element remains.
We can write a recursive function that will traverse the Huffman Tree and assign the Huffman codes for every character. Let ans be the array that stores the Huffman codes for each character.