On 10/23/2009 11:28 PM, Rob Doug wrote:
> Well, it seems I found a solution...
> I ran some tests in Python as well. The simple script I posted
> previously eats memory in Python too... and the only way around it I
> found was to use forks. I checked out forkoff, but it produced some
> strange bugs. This is the working code:
> 
> threads = (1..THREADS).map do
>    Thread.new q do |qq|
>      until qq.equal?(myLink = qq.deq)
>        mutex.synchronize do
>          puts ($n +=1).to_s # + " : " + print_class_counts.to_s
>        end
>        fork do  # <----- You need to fork; once the child exits, its memory is released
>          begin
>            agent = WWW::Mechanize.new { |agent|
>              agent.history.max_size = 1
>              agent.open_timeout = 20
>              agent.read_timeout = 40
>              agent.user_agent_alias = 'Windows IE 7'
>              agent.keep_alive = false
>            }
>            page = agent.get(myLink)
>            puts myLink
>            puts page.forms.length
> 
>            page.forms.each do |form|
>            end
>          rescue
>          end
>        end
>      end
>    end
> end

You create threads and fork a process for every single item you process.
This has some consequences:

- your threads will eat all the entries in the queue very quickly
- you will get a large number of processes immediately

In this setup you need neither threads nor a queue.  Basically you
just need to iterate over the input list and fork off a process for
every item.  However, then you have no control over concurrency and
your CPU will suffer.  With the setup you presented you should at least
have each thread wait for its process to return, so that a single
thread never has more than one process running at a time.
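
A minimal sketch of that last suggestion, assuming a fixed number of worker threads that each fork one child and wait for it before taking the next item.  The helper name process_in_children, the THREADS value and the results hash are made up for illustration and are not from the code above; the block passed in stands in for the Mechanize fetch.

```ruby
require 'thread'  # provides Queue on older Rubies; harmless on current ones

THREADS = 4  # assumed cap on concurrent child processes

# Hypothetical helper: runs `work` in a forked child for each item, with at
# most `workers` children alive at once.  Returns a Hash mapping each item
# to true when its child exited with status 0.
def process_in_children(items, workers = THREADS, &work)
  queue = Queue.new
  items.each { |item| queue << item }
  stop = Object.new                      # unique end-of-queue marker
  workers.times { queue << stop }        # one marker per worker thread

  results = {}
  mutex = Mutex.new

  threads = (1..workers).map do
    Thread.new do
      until stop.equal?(item = queue.deq)
        pid = fork { work.call(item) }   # child does the work, then exits
        _, status = Process.wait2(pid)   # thread blocks until its child is done
        mutex.synchronize { results[item] = status.success? }
      end
    end
  end

  threads.each(&:join)
  results
end

# Usage: at most two children run at any time.
process_in_children(%w[a b c], 2) do |item|
  puts "child #{Process.pid} handles #{item}"
end
```

Because each thread blocks in Process.wait2, the number of threads is also the upper bound on the number of live child processes, and every child is reaped, so no zombies are left behind.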

Kind regards

	robert

-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/