Huffman Tutorial Part 5
Tags: c++ Huffman decoder decompressionIf you have arrived at this point, it is assumed that you have slaved through
the parts previous to this one in the series. You may have spent a lot of time -
ranging from a week to half a month - on this (rather large) endeavor. And if
you have made it through, CONGRATULATIONS
!
Good work. Now, here is a way to test that your encoder works: by decoding it!
Quick Run-Down of How it Works
Should be rather straightforward:
- Read the binary
- Take up the header information
- Using the header information, (re)construct the Huffman tree
- Using the Huffman tree, construct a map mapping the binary digits to a character
- Using said map, iterate through the leftover binary data, and write the characters to stream (whether it be output or file)
- Close the streams
Because of DRY principles, you
should reuse the code you wrote in your Huffman compression program (in
C/C++
’s case, use #include
or something - try to avoid copying and
pasting). This goes for (most likely) steps #3-4.
Testing
Should also be quite self-explanatory: you want to see if your Huffman compressor/decompresser works, you run a plain-text file through the compressor, producing a binary file. You run said binary file through the decompresser, producing a plain-text file. You then proceed to check if the plain-text file you produced is the same with the original file.
valgrind
is your friend.
Assignment for This Part
-
create Huffman decompresser with code reuse
-
finish Huffman compressor/decompresser
( denotes a challenging task.
denotes an even more challenging
task.)
Final Words
Wasn’t that fun? This project teaches you about the applications of binary trees, the recursion that comes with it, and a little bit of bit manipulation.
For some reason, I like it. In fact, I like doing things that involve (low-level) bit manipulations (like Huffman and Chip-8 emulators). Ha, the only thing I seem to dislike about it is when I found out about endianness - the differences between big and little endian. Some systems (like mine) used little endian, meaning that some data-types (such as words) are stored with the least significant byte in the smallest address:
Original Data: 0xABCDEF
LITTLE ENDIAN
-------------
Address: 0000 | 0001 | 0002
Data: EF | CD | AB
BIG ENDIAN
----------
Address: 0000 | 0001 | 0002
Data: AB | CD | EF
Hmmm… this sounds interesting. It is something that I know about, that I have a little experience in. Looks like an idea for the next tutorial/walkthrough….
Anyways, I hope you had as much fun making this project as I did! Hopefully, your teacher would award you properly, or, if not applicable, you would reward yourself with a nice healthy burst of dopamine. ~Niiiiiicce.