On Tue, Jul 07, 2009 at 10:18:50PM +0900, Patrick Lajeunesse wrote:
> Hi,
> I'm trying to scrape links using Mechanize. Sometimes accented characters
> (on French pages) are corrupt once Ruby gets them. To see what I mean, check
> this:
>
> require 'mechanize'
> a = WWW::Mechanize.new
> page = a.get('http://www.agr.gc.ca/cb/index_f.php?s1=n&s2=index&page=2009_07')
> page.links.each do |a_link|
>   puts a_link
> end
>
> Of course, it's only the accents that are entered in plain text (i.e.,
> without entities) that have this problem. But in an imperfect world, I can't
> always count on accents being entered properly.
>
> Is there anything I can do about this? I've tried using Iconv to convert the
> strings to UTF-8, but that just resulted in a different (but still wrong)
> character in place of the broken ones.

What version of nokogiri / mechanize do you have installed? I ran your
code and was able to see the accents:

  http://skitch.com/aaron.patterson/bs4qt/terminal-bash-80x24

Most of the time, these encoding issues are due to the server
incorrectly identifying the encoding of the content. Is this content
supposed to be ISO-8859-1?

-- 
Aaron Patterson
http://tenderlovemaking.com/
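A minimal sketch (not part of the original thread) of how a misidentified encoding corrupts accented text, and how declaring the right source encoding fixes it. It assumes Ruby 2.0+ (`String#b`, plus the 1.9+ methods `force_encoding`, `encode`, `valid_encoding?`); the byte strings are hypothetical stand-ins for page content, and nothing here is Mechanize's or Nokogiri's own API:

```ruby
# Hypothetical page content: "café" as raw ISO-8859-1 bytes (é is the single byte 0xE9).
latin1_bytes = "caf\xE9".b  # binary (ASCII-8BIT) copy of the bytes

# Mistake 1: treating Latin-1 bytes as UTF-8 yields an invalid string,
# which usually shows up as a replacement character or garbage.
mislabeled = latin1_bytes.dup.force_encoding(Encoding::UTF_8)
puts mislabeled.valid_encoding?  # prints false

# Fix: declare the real source encoding, then transcode to UTF-8.
fixed = latin1_bytes.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
puts fixed.valid_encoding?       # prints true
p fixed.bytes                    # UTF-8 bytes for "café": 0xC3 0xA9 encodes é

# Mistake 2: text that is already UTF-8 is treated as Latin-1 and converted again.
utf8_bytes = "caf\u00e9".b       # bytes 63 61 66 C3 A9
double = utf8_bytes.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
puts double                      # each UTF-8 byte became its own character ("cafÃ©")
```

If the page really is ISO-8859-1 but is being read as UTF-8, the `fixed` case is the usual remedy. The `double` case, where an extra conversion turns one accented letter into two wrong characters, may be what the question's "different (but still wrong)" Iconv result looks like.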