Thanks for your responses; I also found that the POI Java project was
extended to support Ruby:
http://jakarta.apache.org/poi/poi-ruby.html

Still, I think the win32ole solution is the best for simply reading the
content of the docs...

M

On 4/22/06, Keith Fahlgren <keith / oreilly.com> wrote:
>
> On Sun, 23 Apr 2006, Mateo Barraza wrote:
> > I'm fairly new to the Ruby scene.
> > Is there any library that can read MS Word (.doc) files and extract the
> > pure text...what about libs for PDF files?
>
> Hi,
>
> There's not a MS Word library that I know of that will easily allow you
> to extract the pure text, but the OLE suggestion is a good idea. Another
> method would be to save as WordprocessingML (XML) (if you have Word 2003)
> and use either REXML or libxml-ruby (two Ruby XML libraries) to parse it
> (or XSLT). If you've got XML, the interesting nodes (if you really only
> want text) are the 'w:t' ones.
>
>
> HTH,
> Keith
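
For reference, the REXML route Keith describes could look roughly like the sketch below. It assumes the document was saved from Word 2003 as WordprocessingML and that the `w:` prefix is bound to the Word 2003 namespace URI shown in the code. The helper name `wordml_text` and the inline `SAMPLE` document are made up for illustration; a real file will carry far more markup around the `w:t` nodes.

```ruby
require 'rexml/document'

# Namespace URI for Word 2003 WordprocessingML (assumption: the file was
# saved from Word 2003 using "Save as XML").
WORDML_NS = 'http://schemas.microsoft.com/office/word/2003/wordml'

# Hypothetical helper: returns the plain text of a WordprocessingML string,
# one output line per w:p (paragraph) element.
def wordml_text(xml)
  doc = REXML::Document.new(xml)
  paragraphs = REXML::XPath.match(doc, '//w:p', 'w' => WORDML_NS)
  paragraphs.map do |para|
    # Text lives in w:t nodes; a paragraph may split it across several runs.
    REXML::XPath.match(para, './/w:t', 'w' => WORDML_NS).map(&:text).join
  end.join("\n")
end

# Tiny hand-written sample standing in for a real saved document.
SAMPLE = <<~XML
  <?xml version="1.0"?>
  <w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
    <w:body>
      <w:p><w:r><w:t>Hello, </w:t></w:r><w:r><w:t>world.</w:t></w:r></w:p>
      <w:p><w:r><w:t>Second paragraph.</w:t></w:r></w:p>
    </w:body>
  </w:wordDocument>
XML

puts wordml_text(SAMPLE)
```

To run it against a real file, something like `wordml_text(File.read('report.xml'))` should work, where `report.xml` is a placeholder name. Grouping by `w:p` keeps paragraph breaks; if only a flat stream of text is wanted, matching `//w:t` directly is enough.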