Well, a couple of days ago I was taking some
time off. Work had gotten so rough I didn't have
any time for Ruby...

Yes, a real man would have watched TV or something.
But I wrote some code.

Remember the Huffman algorithm, which assigns
variable-length, instantaneously decodable bit codes
to characters based on the frequency of their occurrence?
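For anyone who's forgotten, here's a minimal sketch of the idea (my own toy version, not the code described below): merge the two least-frequent nodes until one tree remains, then read off "0"/"1" paths as the codes.

```ruby
# Toy Huffman code builder: repeatedly merge the two least-frequent
# nodes into a tree, then walk it to assign prefix-free bit strings.
def huffman_codes(freqs)
  # Each node is [frequency, symbol-or-nil, left, right].
  nodes = freqs.map { |sym, f| [f, sym, nil, nil] }
  while nodes.size > 1
    nodes.sort_by! { |n| n[0] }        # crude: re-sort every pass
    a, b = nodes.shift, nodes.shift
    nodes << [a[0] + b[0], nil, a, b]  # merged internal node
  end
  codes = {}
  walk = lambda do |node, prefix|
    _f, sym, left, right = node
    if sym
      codes[sym] = prefix.empty? ? "0" : prefix
    else
      walk.call(left,  prefix + "0")
      walk.call(right, prefix + "1")
    end
  end
  walk.call(nodes.first, "")
  codes
end

codes = huffman_codes("a" => 45, "b" => 13, "c" => 12,
                      "d" => 16, "e" => 9,  "f" => 5)
```

The most frequent symbol gets the shortest code, and no code is a prefix of another, so the bit stream decodes unambiguously.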

I implemented a crude version of it. This is essentially the
same problem I was given in coding theory in 1984; back
then, I wrote and debugged the Pascal version in a few
days, and it was 400 lines or so.

I wrote the Ruby version (encoding only) in two hours. This
included writing a little BitString class to handle the packing
of arbitrary-length bitfields into (essentially) a Bignum. I had
a primitive functioning version in 104 lines of code.
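The bit-packing part might look roughly like this. The class name matches, but the interface here is my guess, not the actual 104-line version:

```ruby
# A toy BitString along the lines described: pack variable-length
# bit fields into one big Integer (Ruby promotes to Bignum as needed).
class BitString
  attr_reader :length

  def initialize
    @bits   = 0   # the accumulated value
    @length = 0   # total number of bits stored
  end

  # Append the low-order +nbits+ bits of +value+.
  def append(value, nbits)
    @bits = (@bits << nbits) | (value & ((1 << nbits) - 1))
    @length += nbits
    self
  end

  # Render as a string of "0"s and "1"s, preserving leading zeroes.
  def to_s
    @length.zero? ? "" : @bits.to_s(2).rjust(@length, "0")
  end
end

bs = BitString.new
bs.append(0b101, 3).append(0b1, 1).append(0b0010, 4)
```

The nice part is that shifting and OR-ing an Integer costs nothing to write; Ruby handles the arbitrary precision for you.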

Believe me, this development speed is a tribute to Ruby,
not to my skills.

It's not especially efficient, of course. That's putting it mildly.

I ran it on a moderately fast PC with a 1.1 meg input file;
it took two minutes to run.

One big source of inefficiency is that I call sort inside a loop,
re-sorting the whole array on each iteration (even though the "right"
way is to insert each new item at its proper position and never call
sort at all).
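For the record, the "right" way is simple enough; a sketch (using `bsearch_index`, Ruby 2.3+):

```ruby
# Keep the array sorted by inserting each new item at its position,
# instead of appending and re-sorting the whole array every pass.
def sorted_insert(arr, item)
  # Find the first element >= item; nil means it goes at the end.
  idx = arr.bsearch_index { |x| x >= item } || arr.length
  arr.insert(idx, item)
end

queue = [5, 9, 12]
sorted_insert(queue, 10)
sorted_insert(queue, 3)
```

That turns each iteration from O(n log n) into O(log n) for the search plus the cost of the insertion, which adds up over a 1.1 meg file.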

Now, why was I thinking about data compression in the first place?

Actually, not for purposes of compressing, but for obscuring. I was
trying to think of some way to encode a Ruby program, then decode
it and run it on the fly. (For purposes of hiding one's source code.)

Of course, real encryption would be better for that purpose.
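The decode-and-run-on-the-fly part is the easy half. Here's a sketch using zlib compression rather than Huffman or real encryption (the blob format and names are mine, purely illustrative):

```ruby
require "zlib"
require "base64"

# Obscure a program by shipping it as an opaque blob...
source = "HIDDEN_ANSWER = 6 * 7"
blob   = Base64.strict_encode64(Zlib::Deflate.deflate(source))

# ...then, in the stub that actually ships, decode and run it.
eval(Zlib::Inflate.inflate(Base64.strict_decode64(blob)))
```

Of course, anyone who can read the stub can decode the blob the same way, which is why real encryption (or at least a key the user supplies) would be better.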

I also considered the idea of compressing a bunch of Ruby files into
BitString objects and then marshaling them. Then I could pre-empt the
"require" method to use my in-memory code rather than search for a
file. It's sort of a jar file idea. But I encountered problems with
that line of thought and have abandoned it in favor of watching TV.
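Before abandoning it, the require pre-emption itself would have looked something like this (the `IN_MEMORY` table and alias name are my invention, and this skips the marshaling/compression layer entirely):

```ruby
# Pre-empt "require": consult an in-memory table of source strings
# first, falling back to the normal file search for everything else.
IN_MEMORY = {
  "greeting" => 'def greeting; "hello"; end'
}

module Kernel
  alias_method :original_require, :require

  def require(name)
    if (src = IN_MEMORY.delete(name))   # load each unit only once
      eval(src, TOPLEVEL_BINDING)
      true
    else
      original_require(name)
    end
  end
end

require "greeting"
```

A real version would also have to play nicely with RubyGems (which overrides require itself), handle load paths and ".rb" suffixes, and so on, which is roughly where the TV started looking attractive.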

Enough of my rambling. Back to your regularly scheduled program...

Hal