On Sun, 28 May 2006, Francis Cianfrocca wrote:

> If your workpile consists of long-running tasks (which it probably doesn't),

in fact, in my particular situation, it really does.  a task may take 1 hour
or 5 days, so you have to put my comments in that context.  still, i've found
rq or tuplespace scales well down to about 30s jobs and, imho, if your jobs
are faster than that it's easier to bunch them in groups of 100 than to
modify your job distribution system...
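
(by 'bunch them' i mean nothing fancier than submitting arrays of jobs
instead of individual ones.  a minimal sketch - 'jobs' and 'submit' here
are just stand-ins for however you actually enqueue work:)

   require 'enumerator'    # each_slice lives here in ruby 1.8

   jobs.each_slice(100) do |batch|
     submit batch          # one queue entry now covers 100 quick jobs
   end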

> then you don't have to work too hard to get long-running processes.
> Otherwise you need an event-driven system to keep them busy (and pinned to
> their respective processors).

a good point.  this is precisely why we find using rq for our cluster to be
so applicable - the cost of a pure ruby solution is nothing compared to the
actual work to be done.  if jobs started taking 0.5ms to run that wouldn't be
the case at all, to be sure.

> Shared memory: no. Don't do that. Use IPC or network communications. No,
> don't do that either. Use a proper event-passing library that wraps all of
> that up for you, so your remote-operation activations look like simple
> function calls. Remember, you'll want to run your multiprocesses on multiple
> machines before you know it. (Avoid distributed objects if possible, because
> for one thing they force you to couple client and server processes, and for
> another you really don't want the management hassles if your network is
> asynchronous.)

i'm unclear on exactly what you're advocating here.  how does remote event
driven programming couple your design any less than, say, a tuplespace of
jobs looked up via rinda/ring?  the same question applies to 'management
hassles' - a tuplespace makes it trivial to handle 'events' in an
asynchronous fashion that's very similar to a traditional event loop.
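
for example, consuming tuplespace 'events' looks an awful lot like an event
loop.  a minimal sketch (the 'event' tuple shape and the handle method are
made up for illustration):

   require 'rinda/ring'
   require 'rinda/rinda'

   DRb.start_service
   ts = Rinda::TupleSpaceProxy.new(Rinda::RingFinger.primary)

   loop do
     _, payload = ts.take(['event', nil])   # blocks until a matching tuple appears
     handle payload                         # dispatch, just like an event callback
   end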

i'm just trying to learn more here about where event driven programming might
fit into my bag of tricks.  the big questions i still have, and what seem
like show stoppers to me for my applications, are:

   - data.  once you've received an event, where's the data?  where is your
     config, where is your input, and where does your output go?  with a
     tuplespace you can use the exact same logic for all three.  with rq this
     is all encoded into the job object.  please don't say use marshal,
     because that's just too crazy to even think about debugging...

   - point to point communication.  with rq or tuplespace the logic is to
     simply put a job 'out there' and trust that some node will 'take' it.
     we don't care which node does, so long as one does.  the lack of
     coupling between tasks and clients builds a very robust system, since no
     client relies on any other.  take the example of 'broadcasting' a job:
     with rq or tuplespace you simply put it in the queue; with event driven
     programming you either hit every client with tcp, or broadcast with udp
     and open yourself up to a flood of responses and the difficult
     programming task of coordinating an atomic handshake to grant access to
     one, and only one, client.  am i missing something obvious here, or is
     this a tough thing to handle with an event driven paradigm?  how would
     you design a system where 30 nodes pulled jobs from a central list as
     fast as they could with event driven programming?  (note that i'm
     spec'ing a pull vs. push model to avoid any scheduling issues - all
     nodes bail water as fast as they can, so scheduling is optimal for
     simple parallel tasks.)  see the sketch just after this list for the
     tuplespace version.
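
for reference, the tuplespace version of that 30-node pull model is just the
following worker loop running on every node.  a minimal sketch - the tuple
shapes and the run method are made up for illustration.  take() is atomic, so
one and only one node ever gets a given job, and the job tuple itself carries
the data:

   require 'rinda/ring'
   require 'rinda/rinda'

   DRb.start_service
   ts = Rinda::TupleSpaceProxy.new(Rinda::RingFinger.primary)

   loop do
     _, jid, input = ts.take(['job', nil, nil])   # atomic take - no handshaking needed
     output = run jid, input                      # the hours/days of actual work
     ts.write(['result', jid, output])            # output goes back the same way
   end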

regards.

-a
-- 
be kind whenever possible... it is always possible.
- h.h. the 14th dalai lama