I used a bit of freedom: * It reads from STDIN instead of ARGV. * Output is binary instead of 0-and-1-as-text. * Both encoding and decoding, for the extra points... * In order to avoid juggling with padding characters, the least frequent character is 1110 instead of 111 (in the example in the quiz). Every "character" in the encoded bit stream ends with a 0. Yes, it costs one bit, but since it's not used frequently, it doesn't really hurt. * 0 and 1 become 1 and 0, respectively, because we get the padding zeroes for free when we do "001".pack("B*"). So, rewriting the previous section: In order to avoid juggling with padding characters, the least frequent character is 0001 instead of 000. Every "character" in the encoded bit stream ends with a 1. Here's the code: http://tinyurl.com/3x5nsy Example: $ echo -n Scooby-Dooby-Doo | ruby huffman.rb | ruby huffman.rb -d Compression: 14/16 (87%) Decompression: 14/16 (87%) Scooby-Dooby-Doo How does it work? First of all, we count each unique byte in the original byte stream (e.g. "Scooby-Dooby-Doo"): {"-"=>2, "b"=>2, "y"=>2, "c"=>1, "D"=>2, "o"=>6, "S"=>1} ... which is sorted: [["o", 6], ["-", 2], ["D", 2], ["b", 2], ["y", 2], ["S", 1], ["c", 1]] ... and cleaned: ["o", "-", "D", "b", "y", "S", "c"] The leftmost character is used most often and the rightmost character is used least often. We call this ordered list the "alphabet". Secondly, we build the translation table, where each character in the "alphabet" is mapped to a bit sequence: {"b"=>"0001", "-"=>"01", "c"=>"0000001", "y"=>"00001", "D"=>"001", "o"=>"1", "S"=>"000001"} We can translated each byte of the original byte stream: ["S", "c", "o", "o", "b", "y", "-", "D", "o", "o", "b", "y", "-", "D", "o", "o"] ... to the corresponding bit stream: ["000001", "0000001", "1", "1", "0001", "00001", "01", "001", "1", "1", "0001", "00001", "01", "001", "1", "1"] ... which is joined into a string: "00000100000011100010000101001110001000010100111" ... which is packed (with pack("B*")) into another string: "\004\016!N!N" ... which we'll call the "encoded bit stream". In order to be able to decode the message, we have to serialize the "alphabet" and concatenate it with the "encoded bit stream": alphabet + bit stream -> message "o-DbySc" + "\004\016!N!N" -> "o-DbySc\004\016!N!N" But where's the boundary between "alphabet" and "encoded bit stream"? Well, let's add the length of the "alphabet" ([7]..pack("C")): len + alphabet + bit stream -> message "\a" + "o-DbySc" + "\004\016!N!N" -> "\ao-DbySc\004\016!N!N" Conclusion? "Scooby-Dooby-Doo" becomes "\ao-DbySc\004\016!N!N". Well, we've saved 2 bytes, but it doesn't sound that well... gegroet, Erik V. - http://www.erikveen.dds.nl/