On Mar 22, 6:00 am, Clifford Heath <n... / spam.please.net> wrote: > Trans wrote: > > I think one needs to be careful in understanding this though. The > > sorting only provides a benefit in contrast to less efficient > > algorithms. No amount of reordering will ever make a difference to the > > lower bound of compressibility. That's the crux of my point. And > > hence, for a better algorithm, this kind of sorting should be a moot > > point. (Of course, one could argue that this sorting Is the better > > algorithm -- fair enough.) > > Your analysis is well-meaning, but flawed. Compression relies on > prediction. If you know I'm going to send you a message telling > you how many bananas to buy, and you already know that's the subject > of the message, then I never need to say "bananas", I can just say > "12". That only works for limited data sets. The larger the set, the less helpful it is. And for general compression it doesn't help at all. General compression is all about removing repetition. Try assign a number to every word in the dictionary and "compress" a piece of text with it. It won't help. > If you know you're receiving a message that contains large amounts > of English text, or C language text, or i386 binary opcodes, or you > can recognise and adapt when you find that you are, you'll be able > to compress better than any algorithm that only has a single trick, > repetition, as deflate does. Otherwise we'd all be using deflate for > download pirated video streams and no-one would need H.264 :-) Those things are not limited data sets. They have some limiting rules (if the files are valid) but those are pretty much useless to boost compression. > BWT improves deflate's predictor for the kinds of text we tend to > use. Period. Nothing really interesting there, except how it does it. "How it does it" is the interesting part. How does BWT improve deflate? Ie. What's that tell us about deflate? What does it tell us about compression in general? T.