Bucco wrote:
> 1. I tried the example for the htmltokenizer and got an error around
> assert. Where/what is the assert method?

An error around "assert" is likely an internal error of some kind.
Assertions are pieces of code placed in software to detect invalid
arguments to methods, internal data structure inconsistencies, and so on.

For example, consider the Ruby URI library. It doesn't support all kinds
of URIs, so it would be a good idea for it to assert that the URI it is
being passed is one of the kinds it actually knows how to parse. That way,
someone innocently using the library with the wrong kind of URI will
discover the problem immediately, rather than getting back bad data or
having some bizarre error occur in the middle of the library code.

So it could be that you're passing an invalid argument to one of
htmltokenizer's methods. It's also possible that you're triggering a bug
in the library.

> 2. What do you mean by "slurp" in the rest of the text?

"Slurp" means "pull in the entire content of the file, from the current
file position onwards, without performing any processing on it". As in:

  file = File.new("something.gif", "rb")  # binary mode, since a GIF isn't text
  data = file.read                        # slurp!
  file.close

<URL:http://www.retrologic.com/jargon/S/slurp.html>

> 3. Any better examples how to use htmltokenizer?

  require 'uri'
  require 'html/htmltokenizer'
  #[...]
  # Parse all the images and links out of the web page
  tokenizer = HTMLTokenizer.new(@body)
  @images = []
  @links = []
  lastlink = ''
  while tag = tokenizer.getTag('img', 'a')
    if tag.tag_name == 'img'
      url = tag.attr_hash['src']
      next if url.nil?   # skip <img> tags without a src attribute
      uri = @uri.merge(url)
      # Record each image together with the most recent link seen before it
      @images.push([uri.to_s, lastlink])
    else
      url = tag.attr_hash['href']
      next if url.nil?   # skip <a> tags without an href (e.g. named anchors)
      uri = @uri.merge(url)
      @links.push(uri.to_s)
      lastlink = uri.to_s
    end
  end

That's the only time I've used it, I'm afraid. Still, it might give you
some ideas.

mathew
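P.S. Plain Ruby has no built-in assert method outside test frameworks such
as Test::Unit or Minitest, so here is a rough sketch of the kind of check
described in the URI example above. The assert helper, AssertionError
class, SUPPORTED_SCHEMES list and fetch_host method are all hypothetical,
made up for illustration; they are not part of the URI library, which
doesn't work this way.

```ruby
require 'uri'

# Hypothetical error class raised when an assertion fails.
class AssertionError < StandardError; end

# Hypothetical helper: stop immediately if a condition doesn't hold.
def assert(condition, message = 'assertion failed')
  raise AssertionError, message unless condition
end

# Hypothetical list of the URI schemes our code knows how to handle.
SUPPORTED_SCHEMES = %w[http https].freeze

# Hypothetical method: returns the host of an http/https URI, and
# asserts up front that the caller passed a supported kind of URI.
def fetch_host(uri_string)
  uri = URI.parse(uri_string)
  assert(SUPPORTED_SCHEMES.include?(uri.scheme),
         "unsupported URI scheme: #{uri.scheme.inspect}")
  uri.host
end

puts fetch_host('http://www.example.com/index.html')

begin
  fetch_host('mailto:someone@example.com')
rescue AssertionError => e
  # The bad argument is reported right away, with a clear message,
  # instead of causing odd behaviour deeper inside the code.
  puts "Caught: #{e.message}"
end
```

The point is that the check sits at the entrance to the method, so a
caller passing the wrong kind of URI finds out at once.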