Thanks for your responses. I also found that the POI Java project has
been extended to support Ruby:
http://jakarta.apache.org/poi/poi-ruby.html
That said, I think the win32ole solution is the best option for simply
reading the content of the docs.

M


On 4/22/06, Keith Fahlgren <keith / oreilly.com> wrote:
>
> On Sun, 23 Apr 2006, Mateo Barraza wrote:
> > I'm fairly new to the Ruby scene.
> > Is there any library that can read MS Word (.doc) files and extract
> > the pure text...what about libs for PDF files?
>
> Hi,
>
> There isn't an MS Word library that I know of that will easily allow
> you to extract the pure text, but the OLE suggestion is a good idea.
> Another method would be to save the document as WordprocessingML (XML)
> (if you have Word 2003) and use either REXML or libxml-ruby (two Ruby
> XML libraries) to parse it (or XSLT). Once you've got XML, the
> interesting nodes (if you really only want text) are the 'w:t' ones.
>
>
> HTH,
> Keith
>
>
>
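
For the WordprocessingML route Keith describes, here is a minimal sketch using REXML. It assumes the file was saved from Word 2003 as XML and uses the Word 2003 namespace (http://schemas.microsoft.com/office/word/2003/wordml). The method name extract_paragraphs and the SAMPLE document are made up for illustration; a real file will carry far more markup around the 'w:t' nodes.

```ruby
require 'rexml/document'

# Namespace of Word 2003 WordprocessingML (assumed; check the root
# element of your own file).
WORDML_NS = 'http://schemas.microsoft.com/office/word/2003/wordml'

# Hypothetical helper: returns one string per w:p paragraph, built by
# joining the text of every w:t node inside it.
def extract_paragraphs(xml)
  doc = REXML::Document.new(xml)
  ns  = { 'w' => WORDML_NS }
  REXML::XPath.match(doc, '//w:p', ns).map do |para|
    REXML::XPath.match(para, './/w:t', ns).map { |t| t.text.to_s }.join
  end
end

# Tiny hand-written stand-in for a saved document.
SAMPLE = <<~XML
  <?xml version="1.0"?>
  <w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
    <w:body>
      <w:p><w:r><w:t>Hello</w:t></w:r><w:r><w:t>, world</w:t></w:r></w:p>
      <w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>
    </w:body>
  </w:wordDocument>
XML

puts extract_paragraphs(SAMPLE) if __FILE__ == $0
```

For a real document, pass File.read('mydoc.xml') instead of SAMPLE (the file name is only an example). Grouping by 'w:p' keeps paragraph breaks; if you only want one long string, join the result.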