Here's mine. It's my first submission; be gentle :)

I've compared several ways of chopping up the string before building the
tree and encoding. As expected, the bigger the chunks, the smaller the
encoded string, but also the bigger the tree. There must be a sweet spot
somewhere in the middle.

$ ./rq123_huffman_rafc.rb
Encoded byte tokens:
  Size of encoded data:       285903
  Tree size:                    1140
  ----------------------------------
  Total size:                 287043
  Original size:              500000
  Compressed by:                  42%
####################################
Encoded word tokens:
  Size of encoded data:       145703
  Tree size:                  136634
  ----------------------------------
  Total size:                 282337
  Original size:              500000
  Compressed by:                  43%
####################################
Encoded 2byte tokens:
  Size of encoded data:       246807
  Tree size:                   20761
  ----------------------------------
  Total size:                 267568
  Original size:              500000
  Compressed by:                  46%
####################################
Encoded 3byte tokens:
  Size of encoded data:       218899
  Tree size:                  121651
  ----------------------------------
  Total size:                 340550
  Original size:              500000
  Compressed by:                  31%
####################################

Regards,
Raf

Attachment: rq123_huffman_rafc.rb

#!/usr/bin/ruby -w
#
#There's some quick'n'dirty things in here that could be ironed out...

module Huffman
  #The token that indicates the end of the message
  TERMINATOR = "{#%eNdOfMeSsAgE}#%" #Weird string to minimize probability of collision with existing token

  #One node of a Huffman Tree. Has a "0" branch and a "1" branch.
  #Each branch either points to a token or to another Node
  class Node
    attr_reader :branches
    def initialize( branch0, branch1)
      @branches = [branch0, branch1]
    end

    def walk( path_so_far, &block)
      @branches.each_with_index do |branch, i|
        new_path = path_so_far.dup << i
        if branch.is_a?( Node)
          branch.walk( new_path, &block)
        else
          yield new_path, branch
        end
      end
    end

    def inspect
      "<#{self.class}: 0=>#{@branches[0]}, 1=>#{@branches[1]}>"
    end
  end #class Node

  #A Huffman tree
  #Leaves are tokens; path to token is Huffman code of token
  class Tree
    include Enumerable

    def initialize( tokens)
      raise ArgumentError.new( "No tokens given.") unless tokens

      if tokens.include?( TERMINATOR)
        warn "Input contains the end-token. Results will be incorrect!"
      end
      frequencies = (tokens.dup << TERMINATOR).inject( Hash.new( 0)){ |h, token| h[token] += 1; h}

      #And here we build the actual tree
      while frequencies.size > 1 #As long as we haven't brought everything together into one tree
        #Find lowest two frequencies, remove them...
        lows = []
        2.times do |i|
          low = frequencies.inject(){ |min, freq| min = freq if freq[1] < min[1]; min }
          frequencies.delete( low[0])
          lows << low
        end
        #...and combine them into one Node
        node = Node.new( lows[0][0], lows[1][0])
        #Push node into the hash, with the combined frequency being the
        #sum of the two frequencies
        frequencies[node] = lows[0][1] + lows[1][1]
      end
      #Now the hash contains the root node of the tree
      @root_node = frequencies.keys[0]
    end #method initialize

    def each( &block)
      @root_node.walk( [], &block)
    end

    def inspect
      "<#{self.class}: #{inject( {}){|h, code_token| h[code_token[1]] = code_token[0]; h}.inspect}>"
    end

    #Encode: encodes an array of tokens and writes
    #the result to outputstream
    def encode( tokens, outputstream)
      #Helper method on outputstream to write single bits
      class << outputstream
        def init
          @byte = 0
          @bit_count = 0
        end
        def write_bit( bit)
          @byte += bit
          @bit_count += 1
          if 8 == @bit_count
            write( @byte.chr)
            init
          else
            @byte <<= 1
          end
        end
        def fill_up
          if 0 < @bit_count
            (8 - @bit_count).times { write_bit( 0) }
          end
        end
      end

      token_to_code = {}
      each{ |code, token| token_to_code[token] = code}

      if tokens.include?( TERMINATOR)
        warn "Input contains the end-token. Results will be incorrect!"
      end

      outputstream.init
      (tokens.dup << TERMINATOR).each do |token|
        #Not going for the extra credit: I don't encode by walking the tree
        code = token_to_code[token]
        raise ArgumentError.new( "Token #{token.inspect} not found in tree") unless code
        code.each{ |bit| outputstream.write_bit( bit) }
      end
      outputstream.fill_up
    end #method encode

    #Decode: decodes a stream of bits to an array of tokens
    def decode( inputstream)
      #Helper method on inputstream to read single bits
      class << inputstream
        def init
          @byte = 0
          @bit_count = 0
        end
        def read_bit
          if 0 == @bit_count
            @byte = read( 1)[0]
            @bit_count = 8
          end
          bit = @byte & 0b10000000 == 0 ? 0 : 1
          @bit_count -= 1
          @byte <<= 1
          return bit
        end
      end

      inputstream.init
      node = @root_node
      tokens = []
      loop do
        bit = inputstream.read_bit
        branch = node.branches[bit]
        if branch.is_a?( Node)
          node = branch
        else
          token = branch
          break if TERMINATOR == token
          tokens << token
          node = @root_node
        end
      end
      tokens
    end #method decode
  end #class Tree

  #Abstract Tokenizer: splits input into tokens
  class Tokenizer
    def self.tokenize( *args)
      raise NotImplementedError.new( "Need a *concrete* Tokenizer")
    end
    def self.untokenize( tokens)
      tokens.join( '')
    end
  end #class Tokenizer

  #Here's an example of a concrete Tokenizer
  class StringToByteTokenizer < Tokenizer
    def self.tokenize( *args)
      to_be_tokenized = args[0]
      to_be_tokenized.to_s.split( //)
    end
  end #class StringToByteTokenizer

  #And some more...
  class StringToWordTokenizer < Tokenizer
    def self.tokenize( *args)
      to_be_tokenized = args[0]
      tokens = []
      t = to_be_tokenized.dup
      t.gsub!( /(^|\b)([\d\D]+?)\b/m){ |m| tokens << m; ''}
      tokens << t
    end
  end #class StringToWordTokenizer

  class StringToMultiByteTokenizer < Tokenizer
    require 'enumerator'
    def self.tokenize( *args)
      to_be_tokenized = args[0]
      multiple = args[1]
      tokens = []
      to_be_tokenized.split( //).each_slice( multiple){ |s| tokens << s}
      tokens
    end
  end #class StringToMultiByteTokenizer

end #module Huffman

########################
#Main:
########################

data_file = 'big.txt' #http://norvig.com/big.txt

MAX_SIZE = 500_000 #Limit size to keep runtime reasonable...
data = open( data_file).read( MAX_SIZE)
data_size = data.size

#Try several tokenizers
[
  ['byte',  Huffman::StringToByteTokenizer,      nil],
  ['word',  Huffman::StringToWordTokenizer,      nil],
  ['2byte', Huffman::StringToMultiByteTokenizer, 2  ],
  ['3byte', Huffman::StringToMultiByteTokenizer, 3  ]
].each do |label, tokenizer, extra_arg|
  tokens = tokenizer.tokenize( data, extra_arg)
  tree = Huffman::Tree.new( tokens)
  #p tree.inspect
  #p tree.

  #Persist the tree
  tree_file = data_file + '.' + label + '.tree'
  open( tree_file, 'w') { |f| f.write( Marshal.dump( tree)) }

  #Encode
  enc_file = data_file + '.' + label + '.encoded'
  open( enc_file, 'w') do |f|
    tree.encode( tokens, f)
  end

  #Decode and verify correctness
  data_enc_dec = nil
  open( tree_file, 'r') { |f| tree = Marshal.load( f.read) }
  open( enc_file, 'r') do |f|
    decoded_tokens = tree.decode( f)
    data_enc_dec = tokenizer.untokenize( decoded_tokens)
  end
  raise "#{label}: data was changed by encoding - decoding cycle!" if data != data_enc_dec

  #Statistics
  tree_size = File.size( tree_file)
  enc_size = File.size( enc_file)
  total_size = tree_size + enc_size

  puts "Encoded #{label} tokens:"
  puts "  Size of encoded data: #{enc_size.to_s.rjust(12)}"
  puts "  Tree size:            #{tree_size.to_s.rjust(12)}"
  puts "  ----------------------------------"
  puts "  Total size:           #{total_size.to_s.rjust(12)}"
  puts "  Original size:        #{data_size.to_s.rjust(12)}"
  compression = Integer( (data_size - total_size) * 100.0 / data_size)
  puts "  Compressed by:        #{compression.to_s.rjust(11)}%"
  puts "####################################"

end
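
The trade-off described in the message (bigger chunks shrink the encoded data but grow the tree) can be explored on a small scale without writing any files. What follows is only a rough sketch under some assumptions: it uses fixed-size character chunks, it leaves out the terminator token that the submitted script adds, and it takes the number of distinct tokens as a stand-in for tree size rather than measuring a marshalled tree. The names `chunk_tokens`, `huffman_bits` and `summarize` are made up for this illustration and do not appear in the attachment.

```ruby
# Hypothetical helpers for illustration; not part of rq123_huffman_rafc.rb.

# Split a string into chunks of `size` characters (the last one may be shorter).
def chunk_tokens(text, size)
  text.chars.each_slice(size).map(&:join)
end

# Number of bits a Huffman code needs for the given tokens.
# The encoded length equals the sum of the weights of all internal nodes
# created while repeatedly merging the two lightest subtrees.
def huffman_bits(tokens)
  counts = tokens.each_with_object(Hash.new(0)) { |t, h| h[t] += 1 }
  weights = counts.values.sort
  return tokens.size if weights.size == 1 # a lone symbol still costs one bit each

  total = 0
  while weights.size > 1
    merged = weights.shift + weights.shift
    total += merged
    # Re-insert the merged weight so the array stays sorted.
    index = weights.bsearch_index { |w| w >= merged } || weights.size
    weights.insert(index, merged)
  end
  total
end

# Distinct token count (a proxy for tree size) next to the encoded bit count.
def summarize(text, size)
  tokens = chunk_tokens(text, size)
  { chunk: size, distinct: tokens.uniq.size, bits: huffman_bits(tokens) }
end

sample = "the quick brown fox jumps over the lazy dog " * 20
[1, 2, 3].each { |n| puts summarize(sample, n).inspect }
```

Running this on a real text such as big.txt, and weighing `distinct` against `bits`, should give a feel for where the combined size stops improving, though the actual numbers depend on how the tree is serialized.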