Just Another Victim of the Ambient Morality wrote:
> Incidentally, what inspired all this is that I'm trying to use
> mechanize and it's totally not working. Specifically, some sites like
> slashdot.org work fine but rubyforge.org fails spectacularly. What happens
> is that all links at rubyforge.org are nil. Some debugging suggests that
> the problem is in htmltools, which mechanize uses to parse the HTML. So,
> someone else's code uses someone else's code to do the work. Now you can
> see why the term "node" comes up a lot in my example problems.
>
> Except for the copyright and license agreement, there's almost no
> documentation...
>
> I'm tempted to make another post asking other people who use mechanize
> (and/or htmltools) to go to rubyforge.org and see if they have the same
> problems I do...

I've used Mechanize for a few things. When I've hit unexpected behavior,
I've added debugging print statements at the points where htmltools or
Mechanize takes in raw text and emits "fixed" text or a REXML DOM.

If the source HTML is screwy, the parser may end up making really bad
guesses about what the correct markup should be. (One telltale sign is a
run of identical closing tags.)

So you may want to visually inspect some sample source HTML to see whether
it is proper HTML, and if it is not, see what might be choking the parser.

I just grabbed a random project page and found an unclosed link element
and two ul end tags with no matching start tags. Odd. Certainly not the
XHTML the doctype claims it is, and perhaps enough to throw Mechanize off.

(Offhand, I don't see how static or explicit typing would help track down
these sorts of issues. Unit tests might.)

-- 
James Britt

"Programs must be written for people to read, and only incidentally
for machines to execute."
  - H. Abelson and G. Sussman
    (in "Structure and Interpretation of Computer Programs")
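For anyone who would rather automate the visual inspection described above, here is a minimal sketch in plain Ruby (no Mechanize or htmltools involved). It assumes the page is meant to be XHTML, so every non-void element needs an explicit end tag. The `tag_problems` method, the `VOID_ELEMENTS` list, and the sample markup are hypothetical, made up for illustration. It is a regex scan, not a real parser, so a `>` inside an attribute value or markup inside a `<script>` block will fool it.

```ruby
# Rough tag-balance check for (supposedly) XHTML source.
# Assumption: every non-void element should have an explicit end tag,
# as XHTML requires. Regex-based sketch, not a real HTML parser.

VOID_ELEMENTS = %w[area base br col embed hr img input link meta param source].freeze

# Returns a list of human-readable problems found in +html+.
def tag_problems(html)
  problems = []
  stack = [] # names of currently open elements, innermost last

  # Strip comments and the doctype so they are not mistaken for tags.
  text = html.gsub(/<!--.*?-->/m, "").gsub(/<!DOCTYPE[^>]*>/i, "")

  text.scan(%r{<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*?(/?)>}) do |closing, name, self_closing|
    name = name.downcase
    if closing == "/"
      if (idx = stack.rindex(name))
        # Anything opened after the matching start tag was never closed.
        stack.pop(stack.size - idx - 1).each { |n| problems << "unclosed <#{n}>" }
        stack.pop # drop the matching start tag itself
      else
        problems << "end tag </#{name}> has no start tag"
      end
    elsif self_closing != "/" && !VOID_ELEMENTS.include?(name)
      stack.push(name)
    end
  end

  # Whatever is still open at the end of the document was never closed.
  stack.each { |n| problems << "unclosed <#{n}>" }
  problems
end

# Hypothetical sample with the same kinds of faults described above.
sample = <<~HTML
  <!DOCTYPE html>
  <html><body>
    <ul><li><a href="/projects/foo/">Foo</li></ul>
    </ul></ul>
    <br/><img src="x.png" />
  </body></html>
HTML

# Reports the unclosed <a> and the two stray </ul> end tags.
puts tag_problems(sample)
```

A stack fits here because an end tag only makes sense relative to the most recently opened element: anything still sitting above the matching start tag was never closed, and an end tag with no match anywhere on the stack is a stray.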