------art_199447_29909261.1179164862011
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

Here's mine. It's my first submission; be gentle :)

I've compared several ways of chopping the string into tokens before
building the tree and encoding. As expected, bigger chunks make the
encoded data smaller but the tree bigger. In the run below the sweet
spot is 2-byte tokens (best total size, compressed by 46%); 3-byte
tokens lose again because the tree grows to 121651 bytes.

$ ./rq123_huffman_rafc.rb
Encoded byte tokens:
  Size of encoded data:       285903
  Tree size:                    1140
  ----------------------------------
  Total size:                 287043
  Original size:              500000
  Compressed by:                 42%
####################################
Encoded word tokens:
  Size of encoded data:       145703
  Tree size:                  136634
  ----------------------------------
  Total size:                 282337
  Original size:              500000
  Compressed by:                 43%
####################################
Encoded 2byte tokens:
  Size of encoded data:       246807
  Tree size:                   20761
  ----------------------------------
  Total size:                 267568
  Original size:              500000
  Compressed by:                 46%
####################################
Encoded 3byte tokens:
  Size of encoded data:       218899
  Tree size:                  121651
  ----------------------------------
  Total size:                 340550
  Original size:              500000
  Compressed by:                 31%
####################################
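
The chunk-size tradeoff can be tried on a small scale with the sketch
below. It is not the attached script: the helper names (chunk,
huffman_code_lengths, encoded_bits) and the sample text are made up for
illustration, it ignores the end-of-message token, and it uses the
number of distinct tokens as a rough stand-in for tree size instead of
the size of the Marshal-dumped tree file.

```ruby
# Split text into chunks of `size` characters (the last chunk may be shorter).
def chunk(text, size)
  text.chars.each_slice(size).map(&:join)
end

# Return a hash mapping each distinct token to its Huffman code length in bits.
def huffman_code_lengths(tokens)
  freqs = tokens.each_with_object(Hash.new(0)) { |token, h| h[token] += 1 }
  # A lone distinct token still needs one bit per occurrence.
  return freqs.transform_values { 1 } if freqs.size == 1

  # Each queue entry is [combined_frequency, {token => depth_so_far}].
  queue = freqs.map { |token, count| [count, { token => 0 }] }
  while queue.size > 1
    # Take the two least frequent subtrees and merge them; every token
    # inside the merged subtree ends up one level deeper.
    queue.sort_by! { |count, _depths| count }
    (count_a, depths_a), (count_b, depths_b) = queue.shift(2)
    merged = depths_a.merge(depths_b).transform_values { |depth| depth + 1 }
    queue << [count_a + count_b, merged]
  end
  queue.first.last
end

# Total number of bits needed to encode the token sequence.
def encoded_bits(tokens)
  lengths = huffman_code_lengths(tokens)
  tokens.sum { |token| lengths[token] }
end

# Hypothetical sample input, only there to have something to measure.
sample = "the quick brown fox jumps over the lazy dog " * 20
(1..3).each do |size|
  tokens = chunk(sample, size)
  puts "#{size}-char tokens: #{encoded_bits(tokens)} bits, " \
       "#{tokens.uniq.size} distinct tokens"
end
```

Re-sorting the queue on every pass is slow but keeps the sketch short;
a priority queue would be the usual choice for real input.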


Regards,
Raf

------art_199447_29909261.1179164862011
Content-Type: application/x-ruby; name="rq123_huffman_rafc.rb"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="rq123_huffman_rafc.rb"
X-Attachment-Id: f_f1p8az3w

IyEvdXNyL2Jpbi9ydWJ5IC13CiMKI1RoZXJlJ3Mgc29tZSBxdWljayduJ2RpcnR5IHRoaW5ncyBp
biBoZXJlIHRoYXQgY291bGQgYmUgaXJvbmVkIG91dC4uLgoKbW9kdWxlIEh1ZmZtYW4KICAjVGhl
IHRva2VuIHRoYXQgaW5kaWNhdGVzIHRoZSBlbmQgb2YgdGhlIG1lc3NhZ2UKICBURVJNSU5BVE9S
ID0gInsjJWVOZE9mTWVTc0FnRX0jJSIgI1dlaXJkIHN0cmluZyB0byBtaW5pbWl6ZSBwcm9iYWJp
bGl0eSBvZiBjb2xsaXNpb24gd2l0aCBleGlzdGluZyB0b2tlbgoKICAjT25lIG5vZGUgb2YgYSBI
dWZmbWFuIFRyZWUuIEhhcyBhICIwIiBicmFuY2ggYW5kIGEgIjEiIGJyYW5jaC4KICAjRWFjaCBi
cmFuY2ggZWl0aGVyIHBvaW50cyB0byBhIHRva2VuIG9yIHRvIGFub3RoZXIgTm9kZQogIGNsYXNz
IE5vZGUKICAgIGF0dHJfcmVhZGVyIDpicmFuY2hlcwogICAgZGVmIGluaXRpYWxpemUoIGJyYW5j
aDAsIGJyYW5jaDEpCiAgICAgIEBicmFuY2hlcyA9IFticmFuY2gwLCBicmFuY2gxXQogICAgZW5k
CgogICAgZGVmIHdhbGsoIHBhdGhfc29fZmFyLCAmYmxvY2spCiAgICAgIEBicmFuY2hlcy5lYWNo
X3dpdGhfaW5kZXggZG8gfGJyYW5jaCwgaXwKICAgICAgICBuZXdfcGF0aCA9IHBhdGhfc29fZmFy
LmR1cCA8PCBpCiAgICAgICAgaWYgYnJhbmNoLmlzX2E/KCBOb2RlKQogICAgICAgICAgYnJhbmNo
LndhbGsoIG5ld19wYXRoLCAmYmxvY2spCiAgICAgICAgZWxzZQogICAgICAgICAgeWllbGQgbmV3
X3BhdGgsIGJyYW5jaAogICAgICAgIGVuZAogICAgICBlbmQKICAgIGVuZAoKICAgIGRlZiBpbnNw
ZWN0CiAgICAgICI8I3tzZWxmLmNsYXNzfTogMD0+I3tAYnJhbmNoZXNbMF19LCAxPT4je0BicmFu
Y2hlc1sxXX0+IgogICAgZW5kCiAgZW5kICNjbGFzcyBOb2RlCgogICNBIEh1ZmZtYW4gdHJlZQog
ICNMZWFmcyBhcmUgdG9rZW5zOyBwYXRoIHRvIHRva2VuIGlzIEh1ZmZtYW4gY29kZSBvZiB0b2tl
bgogIGNsYXNzIFRyZWUKICAgIGluY2x1ZGUgRW51bWVyYWJsZQoKICAgIGRlZiBpbml0aWFsaXpl
KCB0b2tlbnMpCiAgICAgIHJhaXNlIEFyZ3VtZW50RXJyb3IubmV3KCAiTm8gdG9rZW5zIGdpdmVu
LiIpIHVubGVzcyB0b2tlbnMKCiAgICAgIGlmIHRva2Vucy5pbmNsdWRlPyggVEVSTUlOQVRPUikK
ICAgICAgICB3YXJuICJJbnB1dCBjb250YWlucyB0aGUgZW5kLXRva2VuLiBSZXN1bHRzIHdpbGwg
YmUgaW5jb3JyZWN0ISIKICAgICAgZW5kCiAgICAgIGZyZXF1ZW5jaWVzID0gKHRva2Vucy5kdXAg
PDwgVEVSTUlOQVRPUikuaW5qZWN0KCBIYXNoLm5ldyggMCkpeyB8aCwgdG9rZW58IGhbdG9rZW5d
ICs9IDE7IGh9CgogICAgICAjQW5kIGhlcmUgd2UgYnVpbGQgdGhlIGFjdHVhbCB0cmVlCiAgICAg
IHdoaWxlIGZyZXF1ZW5jaWVzLnNpemUgPiAxICNBcyBsb25nIGFzIHdlIGhhdmVuJ3QgYnJvdWdo
dCBldmVyeXRoaW5nIHRvZ2V0aGVyIGludG8gb25lIHRyZWUKICAgICAgICAjRmluZCBsb3dlc3Qg
dHdvIGZyZXF1ZW5jaWVzLCByZW1vdmUgdGhlbS4uLgogICAgICAgIGxvd3MgPSBbXQogICAgICAg
IDIudGltZXMgZG8gfGl8CiAgICAgICAgICBsb3cgPSBmcmVxdWVuY2llcy5pbmplY3QoKXsgfG1p
biwgZnJlcXwgbWluID0gZnJlcSBpZiBmcmVxWzFdIDwgbWluWzFdOyBtaW4gfQogICAgICAgICAg
ZnJlcXVlbmNpZXMuZGVsZXRlKCBsb3dbMF0pCiAgICAgICAgICBsb3dzIDw8IGxvdwogICAgICAg
IGVuZAogICAgICAgICMuLi5hbmQgY29tYmluZSB0aGVtIGludG8gb25lIE5vZGUKICAgICAgICBu
b2RlID0gTm9kZS5uZXcoIGxvd3NbMF1bMF0sIGxvd3NbMV1bMF0pCiAgICAgICAgI1B1c2ggbm9k
ZSBpbnRvIHRoZSBoYXNoLCB3aXRoIHRoZSBjb21iaW5lZCBmcmVxdWVuY3kgYmVpbmcgdGhlCiAg
ICAgICAgI3N1bSBvZiB0aGUgdHdvIGZyZXF1ZW5jaWVzCiAgICAgICAgZnJlcXVlbmNpZXNbbm9k
ZV0gPSBsb3dzWzBdWzFdICsgbG93c1sxXVsxXQogICAgICBlbmQKICAgICAgI05vdyB0aGUgaGFz
aCBjb250YWlucyB0aGUgcm9vdCBub2RlIG9mIHRoZSB0cmVlCiAgICAgIEByb290X25vZGUgPSBm
cmVxdWVuY2llcy5rZXlzWzBdCiAgICBlbmQgI21ldGhvZCBpbml0aWFsaXplCgogICAgZGVmIGVh
Y2goICZibG9jaykKICAgICAgQHJvb3Rfbm9kZS53YWxrKCBbXSwgJmJsb2NrKQogICAgZW5kCgog
ICAgZGVmIGluc3BlY3QKICAgICAgIjwje3NlbGYuY2xhc3N9OiAje2luamVjdCgge30pe3xoLCBj
b2RlX3Rva2VufCBoW2NvZGVfdG9rZW5bMV1dID0gY29kZV90b2tlblswXTsgaH0uaW5zcGVjdH0+
IgogICAgZW5kCiAgICAKICAgICNFbmNvZGU6IGVuY29kZXMgYW4gYXJyYXkgb2YgdG9rZW5zIGFu
ZCB3cml0ZXMKICAgICN0aGUgcmVzdWx0IHRvIG91dHB1dHN0cmVhbQogICAgZGVmIGVuY29kZSgg
dG9rZW5zLCBvdXRwdXRzdHJlYW0pCiAgICAgICNIZWxwZXIgbWV0aG9kIG9uIG91dHB1dHN0cmVh
bSB0byB3cml0ZSBzaW5nbGUgYml0cwogICAgICBjbGFzcyA8PCBvdXRwdXRzdHJlYW0KICAgICAg
ICBkZWYgaW5pdAogICAgICAgICAgQGJ5dGUgPSAwCiAgICAgICAgICBAYml0X2NvdW50ID0gMAog
ICAgICAgIGVuZAogICAgICAgIGRlZiB3cml0ZV9iaXQoIGJpdCkKICAgICAgICAgIEBieXRlICs9
IGJpdAogICAgICAgICAgQGJpdF9jb3VudCArPSAxCiAgICAgICAgICBpZiA4ID09IEBiaXRfY291
bnQKICAgICAgICAgICAgd3JpdGUoIEBieXRlLmNocikKICAgICAgICAgICAgaW5pdAogICAgICAg
ICAgZWxzZQogICAgICAgICAgICBAYnl0ZSA8PD0gMQogICAgICAgICAgZW5kCiAgICAgICAgZW5k
CiAgICAgICAgZGVmIGZpbGxfdXAKICAgICAgICAgIGlmIDAgPCBAYml0X2NvdW50CiAgICAgICAg
ICAgICg4IC0gQGJpdF9jb3VudCkudGltZXMgeyB3cml0ZV9iaXQoIDApIH0KICAgICAgICAgIGVu
ZAogICAgICAgIGVuZAogICAgICBlbmQKICAKICAgICAgdG9rZW5fdG9fY29kZSA9IHt9CiAgICAg
IGVhY2h7IHxjb2RlLCB0b2tlbnwgdG9rZW5fdG9fY29kZVt0b2tlbl0gPSBjb2RlfQogIAogICAg
ICBpZiB0b2tlbnMuaW5jbHVkZT8oIFRFUk1JTkFUT1IpCiAgICAgICAgd2FybiAiSW5wdXQgY29u
dGFpbnMgdGhlIGVuZC10b2tlbi4gUmVzdWx0cyB3aWxsIGJlIGluY29ycmVjdCEiCiAgICAgIGVu
ZAogIAogICAgICBvdXRwdXRzdHJlYW0uaW5pdAogICAgICAodG9rZW5zLmR1cCA8PCBURVJNSU5B
VE9SKS5lYWNoIGRvIHx0b2tlbnwKICAgICAgICAjTm90IGdvaW5nIGZvciB0aGUgZXh0cmEgY3Jl
ZGl0OiBJIGRvbid0IGVuY29kZSBieSB3YWxraW5nIHRoZSB0cmVlCiAgICAgICAgY29kZSA9IHRv
a2VuX3RvX2NvZGVbdG9rZW5dCiAgICAgICAgcmFpc2UgQXJndW1lbnRFcnJvci5uZXcoICJUb2tl
biAje3Rva2VuLmluc3BlY3R9IG5vdCBmb3VuZCBpbiB0cmVlIikgdW5sZXNzIGNvZGUKICAgICAg
ICBjb2RlLmVhY2h7IHxiaXR8IG91dHB1dHN0cmVhbS53cml0ZV9iaXQoIGJpdCkgfQogICAgICBl
bmQKICAgICAgb3V0cHV0c3RyZWFtLmZpbGxfdXAKICAgIGVuZCAjbWV0aG9kIGVuY29kZQoKICAg
ICNEZWNvZGU6IGRlY29kZXMgYSBzdHJlYW0gb2YgYml0cyB0byBhbiBhcnJheSBvZiB0b2tlbnMK
ICAgIGRlZiBkZWNvZGUoIGlucHV0c3RyZWFtKQogICAgICAjSGVscGVyIG1ldGhvZCBvbiBpbnB1
dHN0cmVhbSB0byByZWFkIHNpbmdsZSBiaXRzCiAgICAgIGNsYXNzIDw8IGlucHV0c3RyZWFtCiAg
ICAgICAgZGVmIGluaXQKICAgICAgICAgIEBieXRlID0gMAogICAgICAgICAgQGJpdF9jb3VudCA9
IDAKICAgICAgICBlbmQKICAgICAgICBkZWYgcmVhZF9iaXQKICAgICAgICAgIGlmIDAgPT0gQGJp
dF9jb3VudAogICAgICAgICAgICBAYnl0ZSA9IHJlYWQoIDEpWzBdCiAgICAgICAgICAgIEBiaXRf
Y291bnQgPSA4CiAgICAgICAgICBlbmQKICAgICAgICAgIGJpdCA9IEBieXRlICYgMGIxMDAwMDAw
MCA9PSAwID8gMCA6IDEKICAgICAgICAgIEBiaXRfY291bnQgLT0gMQogICAgICAgICAgQGJ5dGUg
PDw9IDEKICAgICAgICAgIHJldHVybiBiaXQKICAgICAgICBlbmQKICAgICAgZW5kCiAgCiAgICAg
IGlucHV0c3RyZWFtLmluaXQKICAgICAgbm9kZSA9IEByb290X25vZGUKICAgICAgdG9rZW5zID0g
W10KICAgICAgbG9vcCBkbwogICAgICAgIGJpdCA9IGlucHV0c3RyZWFtLnJlYWRfYml0CiAgICAg
ICAgYnJhbmNoID0gbm9kZS5icmFuY2hlc1tiaXRdCiAgICAgICAgaWYgYnJhbmNoLmlzX2E/KCBO
b2RlKQogICAgICAgICAgbm9kZSA9IGJyYW5jaAogICAgICAgIGVsc2UKICAgICAgICAgIHRva2Vu
ID0gYnJhbmNoCiAgICAgICAgICBicmVhayBpZiBURVJNSU5BVE9SID09IHRva2VuCiAgICAgICAg
ICB0b2tlbnMgPDwgdG9rZW4KICAgICAgICAgIG5vZGUgPSBAcm9vdF9ub2RlCiAgICAgICAgZW5k
CiAgICAgIGVuZAogICAgICB0b2tlbnMKICAgIGVuZCAjbWV0aG9kIGRlY29kZQogIGVuZCAjY2xh
c3MgVHJlZQoKICAjQWJzdHJhY3QgVG9rZW5pemVyOiBzcGxpdHMgaW5wdXQgaW50byB0b2tlbnMK
ICBjbGFzcyBUb2tlbml6ZXIKICAgIGRlZiBzZWxmLnRva2VuaXplKCAqYXJncykKICAgICAgcmFp
c2UgTm90SW1wbGVtZW50ZWRFcnJvci5uZXcoICJOZWVkIGEgKmNvbmNyZXRlKiBUb2tlbml6ZXIi
KQogICAgZW5kCiAgICBkZWYgc2VsZi51bnRva2VuaXplKCB0b2tlbnMpCiAgICAgIHRva2Vucy5q
b2luKCAnJykKICAgIGVuZAogIGVuZCAjY2xhc3MgVG9rZW5pemVyCgogICNIZXJlJ3MgYW4gZXhh
bXBsZSBvZiBhIGNvbmNyZXRlIFRva2VuaXplcgogIGNsYXNzIFN0cmluZ1RvQnl0ZVRva2VuaXpl
ciA8IFRva2VuaXplcgogICAgZGVmIHNlbGYudG9rZW5pemUoICphcmdzKQogICAgICB0b19iZV90
b2tlbml6ZWQgPSBhcmdzWzBdCiAgICAgIHRvX2JlX3Rva2VuaXplZC50b19zLnNwbGl0KCAvLykK
ICAgIGVuZAogIGVuZCAjY2xhc3MgU3RyaW5nVG9CeXRlVG9rZW5pemVyCgogICNBbmQgc29tZSBt
b3JlLi4uCiAgY2xhc3MgU3RyaW5nVG9Xb3JkVG9rZW5pemVyIDwgVG9rZW5pemVyCiAgICBkZWYg
c2VsZi50b2tlbml6ZSggKmFyZ3MpCiAgICAgIHRvX2JlX3Rva2VuaXplZCA9IGFyZ3NbMF0KICAg
ICAgdG9rZW5zID0gW10KICAgICAgdCA9IHRvX2JlX3Rva2VuaXplZC5kdXAKICAgICAgdC5nc3Vi
ISggLyhefFxiKShbXGRcRF0rPylcYi9tKXsgfG18IHRva2VucyA8PCBtOyAnJ30KICAgICAgdG9r
ZW5zIDw8IHQKICAgIGVuZAogIGVuZCAjY2xhc3MgU3RyaW5nVG9Xb3JkVG9rZW5pemVyCgogIGNs
YXNzIFN0cmluZ1RvTXVsdGlCeXRlVG9rZW5pemVyIDwgVG9rZW5pemVyCiAgICByZXF1aXJlICdl
bnVtZXJhdG9yJwogICAgZGVmIHNlbGYudG9rZW5pemUoICphcmdzKQogICAgICB0b19iZV90b2tl
bml6ZWQgPSBhcmdzWzBdCiAgICAgIG11bHRpcGxlID0gYXJnc1sxXQogICAgICB0b2tlbnMgPSBb
XQogICAgICB0b19iZV90b2tlbml6ZWQuc3BsaXQoIC8vKS5lYWNoX3NsaWNlKCBtdWx0aXBsZSl7
IHxzfCB0b2tlbnMgPDwgc30KICAgICAgdG9rZW5zCiAgICBlbmQKICBlbmQgI2NsYXNzIFN0cmlu
Z1RvTXVsdGlCeXRlVG9rZW5pemVyCgplbmQgI21vZHVsZSBIdWZmbWFuCgojIyMjIyMjIyMjIyMj
IyMjIyMjIyMjIyMKI01haW46CiMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIwoKZGF0YV9maWxlID0g
J2JpZy50eHQnICNodHRwOi8vbm9ydmlnLmNvbS9iaWcudHh0CgpNQVhfU0laRSA9IDUwMF8wMDAg
I0xpbWl0IHNpemUgdG8ga2VlcCBydW50aW1lIHJlYXNvbmFibGUuLi4KZGF0YSA9IG9wZW4oIGRh
dGFfZmlsZSkucmVhZCggTUFYX1NJWkUpCmRhdGFfc2l6ZSA9IGRhdGEuc2l6ZQoKI1RyeSBzZXZl
cmFsIHRva2VuaXplcnMKWwogIFsnYnl0ZScsICBIdWZmbWFuOjpTdHJpbmdUb0J5dGVUb2tlbml6
ZXIsICAgICAgbmlsXSwKICBbJ3dvcmQnLCAgSHVmZm1hbjo6U3RyaW5nVG9Xb3JkVG9rZW5pemVy
LCAgICAgIG5pbF0sCiAgWycyYnl0ZScsIEh1ZmZtYW46OlN0cmluZ1RvTXVsdGlCeXRlVG9rZW5p
emVyLCAyICBdLAogIFsnM2J5dGUnLCBIdWZmbWFuOjpTdHJpbmdUb011bHRpQnl0ZVRva2VuaXpl
ciwgMyAgXQpdLmVhY2ggZG8gfGxhYmVsLCB0b2tlbml6ZXIsIGV4dHJhX2FyZ3wKICB0b2tlbnMg
PSB0b2tlbml6ZXIudG9rZW5pemUoIGRhdGEsIGV4dHJhX2FyZykKICB0cmVlID0gSHVmZm1hbjo6
VHJlZS5uZXcoIHRva2VucykKICAjcCB0cmVlLmluc3BlY3QKICAjcCB0cmVlLgoKICAjUGVyc2lz
dCB0aGUgdHJlZQogIHRyZWVfZmlsZSA9IGRhdGFfZmlsZSArICcuJyArIGxhYmVsICsgJy50cmVl
JwogIG9wZW4oIHRyZWVfZmlsZSwgJ3cnKSB7IHxmfCBmLndyaXRlKCBNYXJzaGFsLmR1bXAoIHRy
ZWUpKSB9CgogICNFbmNvZGUKICBlbmNfZmlsZSA9IGRhdGFfZmlsZSArICcuJyArIGxhYmVsICsg
Jy5lbmNvZGVkJwogIG9wZW4oIGVuY19maWxlLCAndycpIGRvIHxmfAogICAgdHJlZS5lbmNvZGUo
IHRva2VucywgZikKICBlbmQKCiAgI0RlY29kZSBhbmQgdmVyaWZ5IGNvcnJlY3RuZXNzCiAgZGF0
YV9lbmNfZGVjID0gbmlsCiAgb3BlbiggdHJlZV9maWxlLCAncicpIHsgfGZ8IHRyZWUgPSBNYXJz
aGFsLmxvYWQoIGYucmVhZCkgfQogIG9wZW4oIGVuY19maWxlLCAncicpIGRvIHxmfAogICAgZGVj
b2RlZF90b2tlbnMgPSB0cmVlLmRlY29kZSggZikKICAgIGRhdGFfZW5jX2RlYyA9IHRva2VuaXpl
ci51bnRva2VuaXplKCBkZWNvZGVkX3Rva2VucykKICBlbmQKICByYWlzZSAiI3tsYWJlbH06IGRh
dGEgd2FzIGNoYW5nZWQgYnkgZW5jb2RpbmcgLSBkZWNvZGluZyBjeWNsZSEiIGlmIGRhdGEgIT0g
ZGF0YV9lbmNfZGVjCgogICNTdGF0aXN0aWNzCiAgdHJlZV9zaXplID0gRmlsZS5zaXplKCB0cmVl
X2ZpbGUpCiAgZW5jX3NpemUgPSBGaWxlLnNpemUoIGVuY19maWxlKQogIHRvdGFsX3NpemUgPSB0
cmVlX3NpemUgKyBlbmNfc2l6ZQoKICBwdXRzICJFbmNvZGVkICN7bGFiZWx9IHRva2VuczoiCiAg
cHV0cyAiICBTaXplIG9mIGVuY29kZWQgZGF0YTogI3tlbmNfc2l6ZS50b19zLnJqdXN0KDEyKX0i
CiAgcHV0cyAiICBUcmVlIHNpemU6ICAgICAgICAgICAgI3t0cmVlX3NpemUudG9fcy5yanVzdCgx
Mil9IgogIHB1dHMgIiAgLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLSIKICBwdXRz
ICIgIFRvdGFsIHNpemU6ICAgICAgICAgICAje3RvdGFsX3NpemUudG9fcy5yanVzdCgxMil9Igog
IHB1dHMgIiAgT3JpZ2luYWwgc2l6ZTogICAgICAgICN7ZGF0YV9zaXplLnRvX3Mucmp1c3QoMTIp
fSIKICBjb21wcmVzc2lvbiA9IEludGVnZXIoIChkYXRhX3NpemUgLSB0b3RhbF9zaXplKSAqIDEw
MC4wIC8gZGF0YV9zaXplKQogIHB1dHMgIiAgQ29tcHJlc3NlZCBieTogICAgICAgICN7Y29tcHJl
c3Npb24udG9fcy5yanVzdCgxMSl9JSIKICBwdXRzICIjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMj
IyMjIyMjIyMjIyMiCgplbmQK
------art_199447_29909261.1179164862011--