The better solution... Here's the code: http://tinyurl.com/32tjfo Example: $ echo -n Scooby-Dooby-Doo | ruby huffman.rb | ruby huffman.rb -d Compression: 27/17 (158%) Decompression: 27/17 (158%) Scooby-Dooby-Doo How does it work? Count each unique byte in the original byte stream (e.g. "Scooby-Dooby-Doo"): [["S", 1], ["c", 1], ["o", 6], ["b", 2], ["y", 2], ["-", 2], ["D", 2]] This list is then converted to a tree, using this code: while tree.length > 1 a, b, *tree = tree.sort_by{|_, count| count} # Not deterministic. tree << [[a[0], b[0]], a[1]+b[1]] end In English: The two least frequently used leaves/nodes are replaced by one node (a combination of both leaves), recursively, until the list is only one leaf/node long. Step by step, this results in the following: [["S", 1], ["c", 1], ["b", 2], ["D", 2], ["y", 2], ["-", 2], ["o", 6]] [["b", 2], ["D", 2], ["y", 2], [["S", "c"], 2], ["-", 2], ["o", 6]] [["y", 2], [["S", "c"], 2], ["-", 2], [["b", "D"], 4], ["o", 6]] [["-", 2], [["b", "D"], 4], [["y", ["S", "c"]], 4], ["o", 6]] [[["y", ["S", "c"]], 4], ["o", 6], [["-", ["b", "D"]], 6]] [[["-", ["b", "D"]], 6], [[["y", ["S", "c"]], "o"], 10]] [[[["-", ["b", "D"]], [["y", ["S", "c"]], "o"]], 16]] The tree is actually still in the list (the only element in the list), so we get it and strip the weight: [["-", ["b", "D"]], [["y", ["S", "c"]], "o"]] Secondly, we build the translation table, where each leaf in the tree is mapped to a bit sequence, by using the path to each leaf: {"-"=>"00", "b"=>"010", "y"=>"100", "c"=>"1011", "D"=>"011", "o"=>"11", "S"=>"1010"} We can translated each byte of the original byte stream: ["S", "c", "o", "o", "b", "y", "-", "D", "o", "o", "b", "y", "-", "D", "o", "o"] ... to the corresponding bit stream: ["1010", "1011", "11", "11", "010", "100", "00", "011", "11", "11", "010", "100", "00", "011", "11", "11"] ... which is joined into a string: "101010111111010100000111111010100000111111" The rest, serializing the translation table and adding and registering padding bits is not that complicated and won't be explained here. The decoding (decompressing?) is, well, just the reversed order of the steps... I want to emphasize one line of the decode code. After having deserialized the translation table, we can decode the whole message with one line: bitstring.scan(/#{table.keys.join("|")}/).collect{|bits| table[bits]}.join("") gegroet, Erik V. - http://www.erikveen.dds.nl/