FACTOID # 80: America puts many more of its citizens in prison than any other nation.
 
 Home   Encyclopedia   Statistics   Countries A-Z   Flags   Maps   Education   Forum   FAQ   About 
 
WHAT'S NEW
RECENT ARTICLES
More Recent Articles »
 

SEARCH ALL

FACTS & STATISTICS    Advanced view

Search encyclopedia, statistics and forums:

 

 

(* = Graphable)

 

 


Encyclopedia > Canonical Huffman code

Introduction

A canonical Huffman code is a particular type of Huffman code which has the property that it can be very compactly described. In computer science, Huffman coding is an entropy encoding algorithm used for data compression that finds the optimal system of encoding strings based on the relative frequency of each character. ...


Data compressors generally work in one of two ways. Either the decompressor can infer what codebook the compressor has used from previous context, or the compressor must tell the decompressor what the codebook is. Since a canonical Huffman codebook can be stored especially efficiently, most compressors start by generating a "normal" Huffman codebook, and the convert it to canonical Huffman before using it.


Algorithm

The normal Huffman coding algorithm assigns a particular codeword to every symbol in the alphabet according to the weights assigned to each such symbol. To make the code a canonical Huffman code we need only renumber the codewords. For example, suppose we have the following non-canonical codebook:

 A = 11 B = 0 C = 101 D = 100 

Here the letter A has been assigned 2 bits, B has 1 bit, and C and D both have 3 bits. We now sort the list first by codeword length, and then by letter:

 C = 101 D = 100 A = 11 B = 0 

We now replace each of the codewords with a new one of the same length using the following algorithm:

  • The first symbol in the list gets assigned the codeword 000...000 (same length as the original code).
  • Each subsequent symbol is assigned the next binary number in sequence.
  • When you reach a shorter codeword, snip digits from the right of the codeword before incrimenting to generate the next code.

By these rules, the canonical version of the above codebook is

 C = 000 D = 001 A = 01 B = 1 

Encoding the Codebook

The whole advantage of canonical Huffman is that one can write the codebook in fewer bits.


Let us take our origina Huffman codebook again:

 A = 11 B = 0 C = 101 D = 100 

There are several ways we could encode this and write it into a compressed data file. For example, we could write each symbol followed by the codeword:

 "A",2,11,"B",1,0,"C",3,101,"D",3,100 

Since we are listing the symbols in order, we can probably omit them, which gives

 2,11,1,0,3,101,3,100 

With our canonical version,

 C = 000 D = 001 A = 01 B = 1 

we need only say how many bits each symbol has. The canonical Huffman algorithm allows us to recreate the entire table using only that. So we may now write

 2,1,3,3 

And that is all.



 

COMMENTARY     


Share your thoughts, questions and commentary here
Your name
Your comments
Please enter the 5-letter protection code

Want to know more?
Search encyclopedia, statistics and forums:

 


Lesson Plans | Student Area | Student FAQ | Reviews | Press Releases |  Feeds | Contact
The Wikipedia article included on this page is licensed under the GFDL.
Images may be subject to relevant owners' copyright.
All other elements are (c) copyright NationMaster.com 2003-5. All Rights Reserved.
Usage implies agreement with terms.