On Jan 8, 2005, at 7:21 PM, Bill Atkins wrote:

> Can you post the code?

Sure. The blogs variable is an array of blog URLs. I intend to 
eventually store these URLs in MySQL, but for now an array works. 
I emptied the array so that the sites in it don't get hit by 
everyone trying to help out. The threading is derived from a sample 
in "Programming Ruby." I'd love any additional feedback beyond the 
timeout issue.
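
For anyone who wants to try the threading pattern without touching real
sites, here is a minimal offline sketch of a queue drained by worker
threads, one :END_OF_WORK sentinel per worker, and a per-fetch limit via
Timeout.timeout from the standard library. The names fake_fetch and
crawl, the example URLs, and the 0.5 second limit are all made up for
illustration, and this may or may not match the timeout problem being
discussed.

```ruby
require 'thread'
require 'timeout'

# Hypothetical stand-in for the open-uri call so the sketch runs offline;
# URLs containing "slow" sleep long enough to trip the timeout.
def fake_fetch(url)
  sleep(url.include?("slow") ? 2 : 0.01)
  "<html>#{url}</html>"
end

# Hypothetical driver: fetches every URL using `workers` threads and
# gives up on any single fetch after `seconds`.
def crawl(urls, workers, seconds)
  queue   = Queue.new
  results = Queue.new   # Queue is thread-safe, so workers can share it
  urls.each { |url| queue.enq(url) }
  workers.times { queue.enq(:END_OF_WORK) }   # one sentinel per worker

  threads = (1..workers).map do
    Thread.new do
      # compare the dequeued item, not the queue, against the sentinel
      while (url = queue.deq) != :END_OF_WORK
        begin
          body = Timeout.timeout(seconds) { fake_fetch(url) }
          results.enq([url, :ok, body.size])
        rescue Timeout::Error
          results.enq([url, :timed_out, nil])
        end
      end
    end
  end
  threads.each { |th| th.join }

  collected = []
  collected << results.deq until results.empty?
  collected
end

urls = %w[http://a.example http://slow.example http://b.example]
crawl(urls, 3, 0.5).each do |url, status, _size|
  puts "#{url}\t#{status}"
end
```

Each worker exits as soon as it dequeues a sentinel, so the join calls
return once the queue is drained. The actual script follows.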


#! /usr/local/bin/ruby -w

require 'open-uri'
require 'thread'

blogs = [ ]

buffer=Queue.new

# load the blogs into the queue
blogs.each do |blog|
   buffer.enq( blog )
end

consumers = (1..150).map do |i|
   Thread.new("consumer #{i}") do |name|
     # keep pulling URLs until this thread dequeues the :END_OF_WORK sentinel
     while (blog = buffer.deq) != :END_OF_WORK
       begin
         open( blog ) do |page|
           # scan returns one-element arrays, so flatten them into strings
           metas = page.read.scan( /<meta([^>]*)>/m ).flatten.uniq
           metas.each do |current_meta|
             if current_meta =~ /\s+name\s*=\s*[\"']([^\"']+)[\"']/
               meta_name = $1
               if current_meta =~ /\s+content\s*=\s*[\"']([^\"']+)[\"']/
                 meta_content = $1

                 case meta_name
                 when "geo.position", "ICBM"
                   print "#{blog} \t #{meta_content} \n"
                 end
               end
             end
           end
         end
       rescue Exception
         # also catches errors raised by open itself (bad host, timeout)
         print "#{blog}: #{$!} \n"
       end
     end
   end
end

begin
   consumers.size.times{ buffer.enq(:END_OF_WORK) }
   consumers.each{|th| th.join}
rescue Exception
   print $!
end
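
The meta-tag matching can also be exercised on its own, without fetching
anything. A minimal sketch, assuming a hypothetical helper named
geo_metas and a made-up HTML sample (neither is part of the script
above):

```ruby
# Hypothetical helper: returns the content values of any geo.position
# or ICBM meta tags found in an HTML string.
def geo_metas(html)
  # scan with one capture group yields one-element arrays; flatten them
  html.scan(/<meta([^>]*)>/mi).flatten.uniq.map do |attrs|
    name    = attrs[/\sname\s*=\s*["']([^"']+)["']/i, 1]
    content = attrs[/\scontent\s*=\s*["']([^"']+)["']/i, 1]
    content if content && ["geo.position", "ICBM"].include?(name)
  end.compact
end

# Made-up sample page for illustration only
sample = <<HTML
<html><head>
<meta name="ICBM" content="42.36, -71.06">
<meta name="generator" content="hand-rolled">
<meta name='geo.position' content='42.36;-71.06'>
</head></html>
HTML

geo_metas(sample).each { |position| puts position }
```

Regular expressions are fragile against real-world HTML (attribute
order, unquoted values, and so on), so treat this as a quick check
rather than a robust parser.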




--
Jason N Perkins
<http://sneer.org/>