On 6/6/07, coelho coelho <conta001 / mailinator.com> wrote:
> I am trying to use the rpdf2text library, does anyone have any example
> on how to use properly?

http://raa.ruby-lang.org/project/rpdf2txt/

This is taken from bin/rpdf2txt (comments added for your benefit ;)

<snip>
# create a parser-instance, with the pdf-content as its first argument.
# The second argument is the encoding you want the resulting String to
# have. Note: if you need utf8, I recommend the character-encoding
# library by Nikolai Weibull
parser = Rpdf2txt::Parser.new(File.read(ARGV[0]), 'utf8')

# write to STDOUT unless a second argument names an output file
outstream = STDOUT
if(ARGV.size == 2)
  outstream = File.open(ARGV[1], 'w')
end

# create a callback handler (If you roll your own, be sure to include
# Rpdf2txt::DefaultHandler). outstream needs to respond to :<<
# (note: padding is not defined in this excerpt)
handler = Rpdf2txt::ColumnHandler.new(outstream, padding)
parser.extract_text(handler)
</snip>

There have recently been a couple of major improvements in how rpdf2txt
positions characters. However, there's no official release for that yet.
Since you're just starting out, I would recommend using a daily build from
http://download.ywesee.com/rpdf2txt/rpdf2txt-daily.tar.bz2
or downloading rpdf2txt via git/cogito:

  cg-clone http://scm.ywesee.com/rpdf2txt

Changelog: http://scm.ywesee.com/?p=rpdf2txt;a=summary

hth, let me know if it works for you

Hannes

--
pub  1024D/60312B5F 2003-10-09 Hannes Wyss <hwyss / ywesee.com>
Key fingerprint = 82D1 90C7 3F3D 93DC F715  4F8B 987A 628E 6031 2B5F
www.ywesee.com > intellectual capital connected > www.oddb.org
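
The comment in the snippet says that outstream needs to respond to :<<. As a rough illustration of what that allows, here is a minimal sketch in plain Ruby. It does not call rpdf2txt at all: write_chunks is a hypothetical helper made up for this example, and it only assumes that a handler appends text to its output with <<. Under that assumption, a StringIO, a String or an Array could stand in for STDOUT or a File, which may be handy if you want the extracted text in memory.

```ruby
require 'stringio'

# Hypothetical helper, NOT part of rpdf2txt: appends each chunk to any
# object that responds to :<<, mirroring the requirement on outstream.
def write_chunks(outstream, chunks)
  unless outstream.respond_to?(:<<)
    raise ArgumentError, 'outstream must respond to :<<'
  end
  chunks.each { |chunk| outstream << chunk }
  outstream
end

chunks = ["first line\n", "second line\n"]

# In-memory IO object: read the result back with #string
buffer = StringIO.new
write_chunks(buffer, chunks)

# A plain (unfrozen) String is appended to in place
string_target = String.new
write_chunks(string_target, chunks)

# An Array collects the chunks as separate elements
array_target = write_chunks([], chunks)
```

Whether a given target works with ColumnHandler depends on which other methods the handler calls on it, so it is worth trying with a small PDF first.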