Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Huffman coding rules and optimization

All sources i’ve read quote the below process to get Huffman codes:

  • Arrange the elements in ascending order of their frequencies.
  • Create a binary tree by repeatedly combining the two least frequent elements into a single node, with the sum of their frequencies as the frequency of the new node.
  • Assign ‘0’ to the left branch and ‘1’ to the right branch at each step.
  • The Huffman code for each element is obtained by traversing from the root of the tree to the respective leaf node, recording the path taken (0 for left, 1 for right).

Let’s try this for the following Frequency Map:

A -> 1 B -> 2 C -> 3 D -> 4

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Making the Tree

        (ABCD, 10)
       /          \
   (ABC, 6)         D:4
   /     \
 (AB, 3)  C:3
 /    \
A:1   B:2

Following this tree to get the codes:

A -> 000 B -> 001 C -> 01 D -> 1

Obviously, the Max depth and the max code length does not follow the logical rule of being log2(nElement), which in this case should be two.

Optimally the codes would be:

A -> 00 B -> 01 C -> 10 D -> 11

So what’s goin on here ? what am i doing wrong ?

>Solution :

Imagine encoding String "ABBCCCDDDD" – sample string where each symbol occurs with the given frequency.

Huffman encoding you came up with encodes it as 0000010010101011111 – 19 characters.

Your encoding where you optimized for max code length gives 00010110101011111111 – 20 characters.

Your mistake is thinking that max depth is what’s being optimized (or what needs to be optimized) – but Huffman encoding optimizes for encoded text length.

And as a side note, as I mentioned in a comment, Huffman produces "an" optimal solution, not "the" optimal solution, it’s possible I think for a lower-depth solution to be just as good, but it won’t be better.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading