Last Updated: 9 Sep, 2025

Huffman Coding with Unequal Letter Costs

Ninja
Asked in company
PhonePe

Problem statement

You are given a set of 'n' symbols, each with a given probability of occurrence. Your task is to devise a prefix-free binary code for these symbols using an alphabet {‘A’, ‘B’}.

However, the transmission time for the letters in the alphabet is unequal:


  • Transmitting 'A' takes 1 unit of time.

  • Transmitting 'B' takes 2 units of time.

  • Your goal is to find an optimal set of codes that minimizes the total expected transmission time. The expected time is calculated as the sum over all symbols of (symbolprobability * timeofitscode).

    A prefix-free code is one where no symbol's code is a prefix of another symbol's code.


    Input Format:
    The first line of input contains an integer 'n', the number of symbols.
    
    The second line contains 'n' space-separated floating-point numbers, representing the probabilities of the symbols. The sum of these probabilities will be 1.
    


    Output Format:
    Print a single floating-point number representing the minimum possible expected transmission time, rounded to a precision of 5 decimal places.
    


    Note:
    The standard Huffman coding algorithm greedily combines the two nodes with the lowest probabilities to form a new node. This works for uniform letter costs (e.g., '0' and '1' both costing 1 unit).
    
    For unequal letter costs, the greedy Huffman approach fails. The optimal substructure is more complex. A simple pairing might not be optimal because combining low-probability nodes might force the use of the expensive letter 'B' too often.
    
    The Larmore-Hirschberg algorithm provides a more sophisticated solution, but for the given constraints, a dynamic programming approach is more suitable.