On Mon, May 9, 2011 at 8:01 PM, James <oscartheduck / gmail.com> wrote:
> Regular Expressions are pretty much the standard way of parsing text files,
> aren't they? Certainly they're what I've been using for years now.

PDFs aren't "just" text files.

A randomly-chosen excerpt from a random PDF I have lying about:

11 0 obj
<< /Title(1. The Quest for Quantum Gravity)
/Dest/section.1
/Parent 10 0 R
/Next 12 0 R
>>
endobj

Source: <http://arxiv.org/abs/1010.3420v1>

I could have excerpted parts of the binary blob this PDF includes at
the start, but I'd rather not break anyone's email client without
intending to. ;)
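
To make the point concrete, here is a minimal Python sketch. It is an
illustration under assumptions, not a PDF parser: the page content string
is hypothetical, zlib stands in for a /FlateDecode stream (a common way
for PDFs to store page content), and the names TITLE_RE, find_title,
page_text and contains_phrase are made up for this example. A regex copes
with the uncompressed object quoted above, but finds nothing in a
compressed stream until the stream has been inflated.

```python
import re
import zlib

# The outline object quoted above, as raw bytes (it is stored uncompressed).
outline_obj = (
    b"11 0 obj\n"
    b"<< /Title(1. The Quest for Quantum Gravity)\n"
    b"/Dest/section.1\n"
    b"/Parent 10 0 R\n"
    b"/Next 12 0 R\n"
    b">>\n"
    b"endobj\n"
)

# Naive pattern: assumes the title has no nested or escaped parentheses,
# which real PDF strings are allowed to contain.
TITLE_RE = re.compile(rb"/Title\s*\(([^)]*)\)")


def find_title(data):
    """Return the first /Title string found in data, or None."""
    match = TITLE_RE.search(data)
    return match.group(1).decode("latin-1") if match else None


def contains_phrase(data, phrase=b"Quantum Gravity"):
    """Return True if the literal phrase occurs in the raw bytes."""
    return re.search(re.escape(phrase), data) is not None


# Hypothetical page content, compressed the way a /FlateDecode stream
# would be. This is NOT taken from the arXiv PDF.
page_text = b"BT /F1 12 Tf (The Quest for Quantum Gravity) Tj ET\n" * 3
stream = zlib.compress(page_text)

print(find_title(outline_obj))                    # regex works on plain text
print(contains_phrase(stream))                    # compressed bytes: no match
print(contains_phrase(zlib.decompress(stream)))   # match only after inflating
```

So a regex is fine for the plain-text parts, but something has to
understand the file structure and decode the streams before the regex
ever sees the text.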

-- 
Phillip Gawlowski

Though the folk I have met,
(Ah, how soon!) they forget
When I've moved on to some other place,
There may be one or two,
When I've played and passed through,
Who'll remember my song or my face.