Trans wrote:
> I think one needs to be careful in understanding this though. The
> sorting only provides a benefit in contrast to less efficient
> algorithms. No amount of reordering will ever make a difference to the
> lower bound of compressibility. That's the crux of my point. And
> hence, for a better algorithm, this kind of sorting should be a moot
> point. (Of course, one could argue that this sorting Is the better
> algorithm -- fair enough.)

Your analysis is well-meaning, but flawed. Compression relies on
prediction. If you know I'm going to send you a message telling
you how many bananas to buy, and you already know that's the subject
of the message, then I never need to say "bananas", I can just say
"12".

If you know you're receiving a message that contains large amounts
of English text, or C language text, or i386 binary opcodes, or you
can recognise and adapt when you find that you are, you'll be able
to compress better than any algorithm that only has a single trick,
repetition, as deflate does. Otherwise we'd all be using deflate for
download pirated video streams and no-one would need H.264 :-)

BWT improves deflate's predictor for the kinds of text we tend to
use. Period. Nothing really interesting there, except how it does it.

Clifford Heath.