On Mon, May 9, 2011 at 8:01 PM, James <oscartheduck / gmail.com> wrote:
> Regular Expressions are pretty much the standard way of parsing text files,
> aren't they? Certainly they're what I've been using for years now.

PDFs aren't "just" text files. A randomly chosen excerpt from a random
PDF I have lying about:

11 0 obj
<< /Title(1. The Quest for Quantum Gravity) /Dest/section.1 /Parent 10 0 R /Next 12 0 R >>
endobj

Source: <http://arxiv.org/abs/1010.3420v1>

I could have excerpted parts of the binary blob this PDF includes at
the start, but I'd rather not break anyone's email client without
intending to. ;)

-- 
Phillip Gawlowski

Though the folk I have met,
(Ah, how soon!) they forget
When I've moved on to some other place,
There may be one or two,
When I've played and passed through,
Who'll remember my song or my face.