Bucco wrote:
> 1.  I tried the example for the htmltokenizer and got an error around
> assert.  Where/what is the assert method?

An error around "assert" is likely an internal error of some kind. 
Assertions are pieces of code placed in software to detect invalid 
arguments to methods, internal data structure inconsistencies, and so on.

For example, consider the Ruby URI library. It doesn't support all kinds 
of URI. So, it would be a good idea if it were to assert that the URI it 
is being passed is one of the kinds it actually knows how to parse. That 
way, someone innocently using the library with the wrong kind of URI 
will discover the problem immediately, rather than being passed back bad 
data, or having some bizarre error occur in the middle of the library code.
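
To make that concrete, here is a minimal sketch of that sort of check. The helper name parse_supported_uri and the list of schemes are made up for illustration; they are not part of the URI library, which does not necessarily behave this way.

```ruby
require 'uri'

# Hypothetical list of schemes that our own code knows how to handle.
SUPPORTED_SCHEMES = %w[http https ftp].freeze

# Hypothetical helper: parse the string, then fail fast if the scheme
# is not one we support, instead of returning data we can't handle.
def parse_supported_uri(string)
  uri = URI.parse(string)
  unless SUPPORTED_SCHEMES.include?(uri.scheme)
    raise ArgumentError, "unsupported URI scheme: #{uri.scheme.inspect}"
  end
  uri
end

begin
  parse_supported_uri("gopher://example.com/")
rescue ArgumentError => e
  puts e.message
end
```

The caller finds out about the problem at the point of the call, with a message that says what was wrong.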

So it could be that you're passing an invalid argument to a method of 
htmltokenizer. It's also possible that you're triggering a bug in the 
library.

> 2.  What do you mean by "slurp" in the rest of the text?

"Slurp" means to pull in the entire content of the file, from the current 
file pointer onwards, without performing any processing on it.

As in:

     file = File.new("something.gif", "rb") # binary mode, since it's an image
     data = file.read # slurp!
     file.close
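
For contrast, here is a rough sketch of slurping next to reading a file a piece at a time. The method names slurp and count_lines are just for illustration.

```ruby
require 'tempfile'

# Slurp: return the whole file as one string, unprocessed.
def slurp(path)
  File.binread(path)
end

# Not slurping: walk the file one line at a time, never holding
# the whole thing in memory at once.
def count_lines(path)
  count = 0
  File.foreach(path) { count += 1 }
  count
end

# Throwaway file with made-up contents so the sketch runs on its own.
Tempfile.create('slurp-demo') do |f|
  f.write("line one\nline two\n")
  f.flush
  puts slurp(f.path).length
  puts count_lines(f.path)
end
```

Slurping is fine for small files; for very large ones the line-at-a-time approach uses far less memory.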

<URL:http://www.retrologic.com/jargon/S/slurp.html>

> 3.  Any better examples how to use htmltokenizer?

require 'html/htmltokenizer'

#[...]

     # Parse all the images and links out of the web page
     tokenizer = HTMLTokenizer.new(@body)
     @images = Array.new
     @links = Array.new
     lastlink = ''
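     while tag = tokenizer.getTag('img', 'a')
       if tag.tag_name == 'img'
         url = tag.attr_hash['src']
         next unless url # skip <img> tags with no src
         uri = @uri.merge(url)
         # Record the image along with the most recent link seen before it
         @images.push([uri.to_s, lastlink])
       else
         url = tag.attr_hash['href']
         next unless url # skip anchors with no href, e.g. <a name="...">
         uri = @uri.merge(url)
         @links.push(uri.to_s)
         lastlink = uri.to_s
       end
     end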

That's the only time I've used it, I'm afraid. Still, it might give you 
some ideas.


mathew