On 6/28/05, why the lucky stiff <ruby-talk / whytheluckystiff.net> wrote:
> Bucco wrote:
> 
> >Sorry for the newbie question.  I am trying to find the best method for
> >parsing an HTML file and changing one tag/item.  Unfortunately, REXML
> >chokes on the file because of the incomplete tags.  Completing the tag
> >is not an option either.  What is the best way to find a specific tag
> >in an HTML file and change its text and attribute settings?
> >
> Hi.  I know I'm a bit late to the discussion, so 'sokay if you have an
> answer already.
> 
> A really fantastic HTML parser library is HTree by Tanaka Akira.
> 
>   <http://cvs.m17n.org/~akr/htree/>
> 
> It's completely forgiving of bad HTML and you can import the document
> into REXML through the HTree parser.
> 
>   require 'htree'
>   HTree.parse( "<b>Bad markup" ).to_rexml
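
To tie this back to the original question (changing one tag's text and
attributes), here is a rough sketch of the edit step once you have a REXML
document.  The markup, the '//a' XPath and the retarget_first_link helper
are made up for illustration.  The document is built with REXML directly
so the snippet runs without htree installed; in practice it would come
from HTree.parse(...).to_rexml, and the XPath may need adjusting if the
converted elements carry a namespace.

```ruby
require 'rexml/document'

# Hypothetical helper: change the text and href of the first <a> element.
# `doc` is any REXML::Document, e.g. the result of HTree.parse(html).to_rexml.
def retarget_first_link(doc, new_text, new_href)
  link = REXML::XPath.first(doc, '//a')   # assumed element name, un-namespaced
  return nil unless link

  link.text = new_text                    # replace the element's text
  link.attributes['href'] = new_href      # set (or overwrite) the attribute
  link
end

# Stand-in document (assumption: well-formed, so REXML can parse it itself).
doc = REXML::Document.new(
  '<html><body><a href="old.html">Old text</a></body></html>'
)

link = retarget_first_link(doc, 'New text', 'new.html')
puts doc.to_s
```

Keeping the edit in a small method means the same code works whether the
document came from HTree or straight from REXML.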
> 
> The only downside is that you'll need to install the iconv library,
> which can be a bit of a pain to track down on Windows.  Other than that,
> it's a great package.

There's a page on the Rails site that covers the iconv installation on Windows:
http://wiki.rubyonrails.com/rails/show/iconv

Once I had iconv.so in a library path and iconv.dll in
windows\system32, I ran test-all.rb.  I got an error because there is no
/dev/null on Windows, but creating a 'dev' directory and adding an empty
'null' file to it fixed that.

The tests should really point to a file in a temp dir instead, but with
that setup, all of the htree tests passed.
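
The temp-dir idea could look something like the sketch below.  This is not
taken from the htree test suite: scratch_null_path and the
'htree-test-null' file name are hypothetical, and I'm assuming the tests
only need a writable path to throw output at.

```ruby
require 'tmpdir'

# Hypothetical helper: return the path of an empty scratch file in the
# system temp directory, to use where /dev/null is not available.
def scratch_null_path
  path = File.join(Dir.tmpdir, 'htree-test-null')  # made-up file name
  File.open(path, 'w') {}                          # create or truncate it
  path
end

sink = scratch_null_path

# Anything written here is simply discarded by never reading it back.
File.open(sink, 'w') { |f| f.puts 'discarded output' }
```

Dir.tmpdir comes from the standard 'tmpdir' library and picks a
platform-appropriate directory, so nothing has to be created by hand.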
 
> _why
> 
> 

-- 
Bill Guindon (aka aGorilla)