Arup:

I did install the PDF to HTML gem and have to say it's pretty
impressive! It's all based on the pdf2htmlEX project:

https://github.com/coolwanglu/pdf2htmlEX/tree/master/src

(The gem is basically just a nice Ruby wrapper, so you have to have
pdf2htmlEX installed.) But this gem actually opens up a whole new world
of possibilities.

In combination with something like Nokogiri, you should be able to parse
almost all the data you want. However, this means you'll need to brush
up on your CSS and/or XPath selectors to query the generated HTML with
Nokogiri.

On Mac OS X, it was pretty easy to install the pdf2htmlEX toolset. For
Windows, somebody has already done the compiling for you here:
http://soft.rubypdf.com/software/pdf2htmlex-windows-version

Good luck!

FYI, there is a Google Group for the pdf2htmlEX toolset. If you choose
to use it, you're going to be better off asking any further questions
about it there rather than on this list, since this list is strictly
for Ruby-related things.

Wayne