On 6/28/05, why the lucky stiff <ruby-talk / whytheluckystiff.net> wrote:
> Bucco wrote:
>
> >Sorry for the newbie question. I am trying to find the best method for
> >parsing an HTML file and changing one tag/item. Unfortunately, REXML
> >chokes on the file because of the incomplete tags. Completing the tag
> >is not an option either. What is the best way to find a specific tag
> >in an HTML file and change its text and attribute settings?
> >
> Hi. I know I'm a bit late to the discussion, so 'sokay if you have an
> answer already.
>
> A really fantastic HTML parser library is HTree by Tanaka Akira.
>
>   <http://cvs.m17n.org/~akr/htree/>
>
> It's completely forgiving of bad HTML and you can import the document
> into REXML through the HTree parser.
>
>   require 'htree'
>   HTree.parse( "<b>Bad markup" ).to_rexml
>
> The only downside is that you'll need to install the iconv library,
> which can be a bit of a pain to track down on Windows. Other than that,
> it's a great package.

There's a page on the Rails site that covers the iconv installation on
Windows:
http://wiki.rubyonrails.com/rails/show/iconv

Once I had iconv.so in a library path and iconv.dll in
windows\system32, I ran test-all.rb. I got an error due to the lack
of /dev/null, but that was fixed by creating a dev directory and
adding an empty 'null' file to it.

The tests should really point at a temp dir instead, but with that
setup, all of the htree tests passed.

> _why
>

--
Bill Guindon (aka aGorilla)
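
A minimal sketch of the temp-dir idea mentioned above, for anyone who would rather not create a dev directory by hand. The helper name `null_path` and the scratch file name are hypothetical and not part of htree; the sketch assumes the tests only need a path to an empty file they can open.

```ruby
require 'tmpdir'

# Hypothetical helper (not part of htree): returns a path that can stand in
# for a hard-coded /dev/null.
def null_path
  # Use the real null device when the platform has one.
  return '/dev/null' if File.exist?('/dev/null')

  # Otherwise fall back to an empty scratch file in the system temp dir.
  path = File.join(Dir.tmpdir, 'htree-test-null')
  File.open(path, 'w') {}  # create the file, or truncate it to zero bytes
  path
end

puts null_path
```

A test script could then open `null_path` wherever it currently opens /dev/null. Note that the fallback is an ordinary file, so anything written to it is kept until the next call truncates it.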