Just Another Victim of the Ambient Morality wrote:

>     Incidentally, what inspired all this is that I'm trying to use 
> mechanize and it's totally not working.  Specifically, some sites like 
> slashdot.org work fine but rubyforge.org fails spectacularly.  What happens 
> is that all links at rubyforge.org are nil.  Some debugging suggests that 
> the problem is in htmltools, which mechanize uses to parse the HTML.  So, 
> someone else's code uses someone else's code to do the work.  Now you can 
> see why the term "node" comes up a lot in my example problems.
>     Except for the copyright and license agreement, there's almost no 
> documentation...
>     I'm tempted to make another post asking other people who use mechanize 
> (and/or htmltools) to go to rubyforge.org and see if they have the same 
> problems I do...


I've used Mechanize for a few things.  When I've had undesired behavior, 
I've added debugging print statements at the points where htmltools or 
mechanize takes in raw text and emits "fixed" text or a REXML DOM.

If the source HTML is screwy, the parser may end up making really bad 
guesses as to what the correct markup should be.  (One tell-tale sign is 
a series of identical closing tags.)
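
A rough sketch of how one might look for that tell-tale sign in the 
"fixed" text, using plain Ruby.  The method name closing_tag_runs and the 
default threshold of three are made up for illustration; neither is part 
of mechanize or htmltools.

```ruby
# Hypothetical helper (not part of mechanize or htmltools).
# Returns [tag_name, count] pairs for each run of at least min_run
# identical closing tags, e.g. "</div></div></div>".
def closing_tag_runs(markup, min_run = 3)
  # One closing tag, followed by min_run - 1 or more copies of the same tag,
  # allowing whitespace between them.
  pattern = /<\/(\w+)>(?:\s*<\/\1>){#{min_run - 1},}/
  runs = []
  markup.scan(pattern) do
    m = Regexp.last_match
    runs << [m[1], m[0].scan("</").size]
  end
  runs
end

fixed = "<p>a</p></div></div></div><b>x</b>"
closing_tag_runs(fixed).each do |name, count|
  puts "#{count} consecutive </#{name}> tags"
end
```

Deeply nested but valid markup can also end with a run of identical 
closing tags, so a hit is only a hint about where to look, not proof of 
a bad guess by the parser.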

So you may want to visually inspect some sample source HTML to see if it 
is proper HTML, and if it is not, see what might be choking the parser.

I just grabbed a random project page and found an unclosed link 
element and two ul end tags that had no matching start tags.  Odd.  
Certainly not the XHTML the doctype claims it to be, and perhaps enough 
to screw up Mechanize.
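
Problems like these can be hunted down with something cruder than a real 
parser.  Here is a minimal sketch of a tag-balance check in plain Ruby.  
The names tag_problems and VOID_TAGS are invented for this example, the 
list of tags to skip is my own assumption, and the regex will be fooled 
by markup inside scripts or by unquoted attribute values ending in a 
slash, so treat it as a first-pass aid and not a validator.

```ruby
# Hypothetical helper (not part of mechanize or htmltools).
# Tags that normally have no end tag; this list is an assumption.
VOID_TAGS = %w[area base br col hr img input link meta].freeze

# Returns a list of human-readable problems: unclosed elements and
# end tags that have no matching start tag.
def tag_problems(html)
  stack = []
  problems = []
  html.scan(/<(\/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*?(\/?)>/) do |closing, name, self_closing|
    name = name.downcase
    next if VOID_TAGS.include?(name) || self_closing == "/"
    if closing.empty?
      stack.push(name)                 # start tag: remember it
    elsif stack.last == name
      stack.pop                        # properly nested end tag
    elsif stack.include?(name)
      # End tag closes an outer element: everything above it was left open.
      problems << "unclosed <#{stack.pop}>" until stack.last == name
      stack.pop
    else
      problems << "stray </#{name}>"   # end tag with no start tag
    end
  end
  stack.reverse_each { |open| problems << "unclosed <#{open}>" }
  problems
end

sample = "<ul><li><a href='x'>one</li></ul></ul>"
puts tag_problems(sample)
```

Running it over the raw page source before mechanize sees it gives a 
short list of places to start looking.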

(Offhand, I don't see how static or explicit typing would help track 
these sorts of issues.  Unit tests might.)


-- 
James Britt

"Programs must be written for people to read, and only incidentally
  for machines to execute."
   - H. Abelson and G. Sussman
   (in "Structure and Interpretation of Computer Programs")