On Sat, Apr 25, 2009 at 7:35 AM, s.ross <cwdinfo / gmail.com> wrote:

>
> On Apr 24, 2009, at 5:15 PM, Phlip wrote:
>
>  s.ross wrote:
>>
>>  Assuming we can't improve the request/response rate of the Web service
>>>  calls or the granularity of the return data, is there a way to implement
>>> some parallelism?
>>>
>>
>> Run the entire downloader from a cron task.
>>
>> Your question ass-umes that you must run out of one controller action.
>> Wrong mindset!
>>
>
> I hope that's not what my question ass-umes. I am able to get the master
> records in chunks of 20-100. And they parse just fine. The hope was to make
> the detail retrieval of these records happen in parallel with fetching the
> next batch -- which I have successfully done.
>
>  And BTW 10,000 XML records should be trivial, so you might look for a
>> bottleneck there. I would not read them all as a huge Ruby string and then
>> convert them into a huge DOM model in memory. That would thrash. I would use
>> what I think is called the "SAX" model of reading, where you register a
>> callback for each node type, then let your reader stream them in...
>>
>
> Using SAX callbacks is just fine if you have a poorly bounded rowset
> count. I have a pretty well-bounded count, and parsing the chunked data
> makes it quite manageable without callbacks.
>
> I had considered the cron task but that's one step ahead of where I am
> right now. I'm running them from the console to determine the acceptability
> of how the thing is architected. As I noted in a followup post to the list,
> I discovered that using XMLRPC::Client#call can expose some potential data
> corruption in a multi-threaded implementation. However,
> XMLRPC::Client#call_async does not have that same problem. By shifting
> the detail record fetch process into threads that begin after each chunk of
> master records is read, I increased the effective throughput by around
> 2.5x: while the next master Web service fetch was blocking on the
> response, all the little detail fetches were purring right along in
> their own threads.
>
>
Or port XML-RPC so that it works with an evented architecture such as
EventMachine or Packet (in which case you can use traditional workers for
concurrent downloads).
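
For reference, the threaded approach described in the quoted text might look
roughly like the sketch below. This is only an illustration under stated
assumptions: fetch_master_chunk, fetch_detail and fetch_all are hypothetical
names, and the Web service calls are replaced by sleep-based stubs so the
example runs without a network. In the real thing, fetch_detail would
presumably wrap XMLRPC::Client#call_async.

```ruby
# Hypothetical stand-in for the blocking "master records" Web service call.
# Returns one chunk of record ids after a simulated network delay.
def fetch_master_chunk(page, size)
  sleep 0.05
  start = page * size
  (start...(start + size)).to_a
end

# Hypothetical stand-in for the per-record "detail" Web service call.
def fetch_detail(id)
  sleep 0.01
  { id: id, detail: "detail-#{id}" }
end

# Fetch master chunks one after another; as soon as a chunk arrives, hand its
# ids to a new thread so the detail fetches overlap with the next master fetch.
def fetch_all(pages:, chunk_size:)
  results = Queue.new # thread-safe, so workers can push without extra locking
  workers = []

  pages.times do |page|
    chunk = fetch_master_chunk(page, chunk_size) # main thread blocks here
    workers << Thread.new(chunk) do |ids|
      ids.each { |id| results << fetch_detail(id) }
    end
  end

  workers.each(&:join) # wait for outstanding detail fetches
  Array.new(results.size) { results.pop }
end

records = fetch_all(pages: 3, chunk_size: 4)
puts "fetched #{records.size} detail records"
```

Results arrive in whatever order the threads finish, so sort them by id
afterwards if order matters. One thread per chunk keeps the thread count
bounded by the number of chunks rather than the number of records.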