On Sun, 23 Apr 2006, Mateo Barraza wrote:
> I'm fairly new to the Ruby scene.
> Is there any library that can read MS Word (.doc) files and extract the pure
> text...what about libs for PDF files?

Hi,

There isn't an MS Word library that I know of that will easily let you extract the pure text, but the OLE suggestion is a good idea.

Another method would be to save the document as WordprocessingML, Word 2003's XML format (if you have Word 2003), and then parse it with either REXML or libxml-ruby (two Ruby XML libraries), or transform it with XSLT. If you've got XML and really only want the text, the interesting nodes are the 'w:t' ones.

HTH,
Keith
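For the WordprocessingML route, a minimal REXML sketch might look like the following. It assumes the file was saved from Word 2003 as XML, so the "w:" prefix is bound to the namespace URI http://schemas.microsoft.com/office/word/2003/wordml. The helper name wordml_text is hypothetical. It collects the text of the 'w:t' nodes inside each 'w:p' (paragraph) element and emits one line per paragraph, ignoring tabs, line breaks, tables and formatting.

```ruby
require 'rexml/document'

# Assumed namespace for the "w:" prefix in Word 2003 WordprocessingML.
WORDML_NS = 'http://schemas.microsoft.com/office/word/2003/wordml'

# Hypothetical helper: returns the plain text of a WordprocessingML string,
# one line per w:p (paragraph), joined from the w:t (text) nodes inside it.
def wordml_text(xml)
  doc = REXML::Document.new(xml)
  ns  = { 'w' => WORDML_NS }
  paragraphs = REXML::XPath.match(doc, '//w:p', ns).map do |para|
    # Element#text is nil for an empty w:t, so drop those before joining.
    REXML::XPath.match(para, './/w:t', ns).map(&:text).compact.join
  end
  paragraphs.join("\n")
end

# Small hand-written sample standing in for a document saved from Word.
sample = <<~XML
  <?xml version="1.0"?>
  <w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
    <w:body>
      <w:p><w:r><w:t>Hello, </w:t></w:r><w:r><w:t>Ruby</w:t></w:r></w:p>
      <w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>
    </w:body>
  </w:wordDocument>
XML

puts wordml_text(sample)
```

For a real file you would pass File.read('yourdoc.xml') instead of the sample string. Grouping by 'w:p' is only one choice. A single REXML::XPath.match(doc, '//w:t', ns) would give you all the text runs in order if you don't care about paragraph breaks.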