On Apr 24, 2009, at 5:15 PM, Phlip wrote:

> s.ross wrote:
>
> Assuming we can't improve the request/response rate of the Web
> service calls or the granularity of the return data, is there a
> way to implement some parallelism?
>
> Run the entire downloader from a cron task.
>
> Your question ass-umes that you must run out of one controller
> action. Wrong mindset!

I hope that's not what my question ass-umes. I am able to get the
master records in chunks of 20-100, and they parse just fine. The hope
was to make the detail retrieval of those records happen in parallel
with fetching the next batch -- which I have successfully done.

> And BTW 10,000 XML records should be trivial, so you might look for
> a bottleneck there. I would not read them all as a huge Ruby string
> and then convert them into a huge DOM model in memory. That would
> thrash. I would use what I think is called the "SAX" model of
> reading, where you register a callback for each node type, then let
> your reader stream them in...

Using SAX-style callbacks is just fine when you have a poorly bounded
rowset count. I have a pretty well-bounded count, and parsing the data
in chunks keeps it quite manageable without callbacks.
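
For the record, with chunks that size a plain DOM parse of each
response is about as simple as it gets -- something along these lines
(the document shape below is made up, not our actual schema):

  require 'rexml/document'

  # Hypothetical shape for one chunk -- a few dozen records per response.
  chunk_xml = '<masters><master id="1"><name>Foo</name></master></masters>'

  doc = REXML::Document.new(chunk_xml)
  doc.elements.each('masters/master') do |el|
    id   = el.attributes['id']
    name = el.elements['name'].text
    # ...build or update the local record from the element here
  end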

I had considered the cron task, but that's one step beyond where I am
right now. I'm running these from the console to gauge whether the
architecture is acceptable. As I noted in a followup post to the list,
I discovered that XMLRPC::Client#call can expose potential data
corruption in a multi-threaded implementation. XMLRPC::Client#call_async
does not have that problem. By shifting the detail record fetches into
threads that begin after each chunk of master records is read, I
increased the effective throughput by around 2.5x: while the next
master Web service fetch was blocking on its response, all the little
detail fetches were purring right along in their own threads.
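
In sketch form, the pattern ends up looking roughly like the
following. The endpoint, method names, and record shape are all made
up -- it's just meant to show the shape of the loop, not our actual
code:

  require 'xmlrpc/client'

  client = XMLRPC::Client.new2('http://example.com/api/RPC2')
  detail_threads = []

  offset = 0
  loop do
    # Blocking fetch of the next chunk of master records.
    masters = client.call('masters.list', offset, 100)
    break if masters.empty?

    # Kick off the detail fetches and keep going; they run in their own
    # threads while the next masters.list call is waiting on the server.
    # As far as I can tell, call_async sets up a fresh connection for
    # each request rather than reusing the client's one keep-alive
    # connection, which is why it behaves across threads where plain
    # #call did not.
    masters.each do |m|
      detail_threads << Thread.new(m['id']) do |id|
        client.call_async('details.get', id)
      end
    end

    offset += masters.size
  end

  details = detail_threads.map(&:value)   # join and collect at the end

Thread#value both joins the thread and hands back its result, which
keeps the bookkeeping down.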

Thx,

Steve