Francis Cianfrocca wrote:
> Sounds like you've already read all the books you need to read. You have the
> standard lingo down pat!

I'm afraid that must be a freak accident ;-)

> But here goes: you picked the wrong project to demonstrate parallel
> processing. Fast handling of network I/O is best done in an event-driven
> way, and not in parallel. The parallelism that this problem exhibits arises
> from the inherent nondeterminacy of having many independent clients
> operating simultaneously. This pattern does expose capturable intramachine
> latencies, but they're due to timing differentials, not to processing
> inter-dependencies.

Could you elaborate on what you mean by "timing differentials" and
"processing inter-dependencies"? For regular webapps, time spent
querying the database most certainly exposes capturable intramachine
latencies. Event-driven sounds good, but doesn't that require that
*all* I/O be non-blocking? If you have blocking I/O in, say, a
third-party lib, you're toast: one blocking call stalls every other
client until it returns.
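
To make the blocking-I/O worry concrete, here is a minimal sketch of a toy
single-threaded event loop. TinyLoop and measure_delay are made-up names for
illustration, not any real library's API, and the sleep call merely stands in
for a blocking call buried in a third-party lib. The second handler cannot
start until the first one returns, so its delay grows with the blocking time.

```ruby
# Toy single-threaded event loop (hypothetical, for illustration only):
# handlers run one at a time, in the order they were queued.
class TinyLoop
  def initialize
    @queue = []
  end

  # Queue a handler to be run by the loop.
  def defer(&handler)
    @queue << handler
  end

  # Run queued handlers until none are left.
  def run
    @queue.shift.call until @queue.empty?
  end
end

# Returns how long (in seconds) a second handler had to wait because the
# first handler blocked for blocking_seconds.
def measure_delay(blocking_seconds)
  reactor = TinyLoop.new
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  delay = nil

  # Assumption: sleep stands in for blocking I/O inside a third-party lib.
  reactor.defer { sleep(blocking_seconds) }
  # This handler represents another client waiting for its turn.
  reactor.defer { delay = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started }

  reactor.run
  delay
end

if __FILE__ == $PROGRAM_NAME
  puts format("second handler waited %.3f s", measure_delay(0.2))
end
```

A real reactor multiplexes sockets rather than draining a plain queue, but
the effect should be the same: while one handler blocks, nobody else gets
served.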

> It's intuitively attractive to structure a network server as a set of
> parallel processes or threads, but it doesn't add anything in terms of
> performance or scalability. As regards multicore architectures, they add
> little to a network server because the size of the incoming network pipe
> typically dominates processor bandwidth in such applications.

You mean to say it's the network that is usually the bottleneck, not the
CPU? Well, in my experience the database is usually the bottleneck, but
let's not forget that Ruby is particularly demanding on the CPU.

> You may rejoin: "but how about an HTTP server that does a massive amount of
> local processing to fulfill each request?" Now that's more interesting. Just
> get rid of the HTTP part and concentrate on how to parallelize the
> processing. That's a huge and well-studied problem in itself, and the net is
> full of good resources on it.

While optimizing for CPU speed is fine, I'm also interested in process 
isolation. If you have a monster lib that takes 1 minute to initialize 
and requires 1 GB of resident memory but is used only occasionally, do 
you really want to load it in all of your worker processes?
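
As a rough sketch of the isolation I have in mind: a single helper process
loads the expensive library, and the workers send it requests over pipes.
Everything here is hypothetical. load_monster_lib is a stand-in (a short
sleep and a lambda that upcases a string), start_helper and ask are made-up
names, and the one-line-per-request protocol is only an assumption for the
example. It also assumes a platform where Kernel#fork is available, so not
Windows.

```ruby
# Hypothetical stand-in for a library that is slow to load and memory hungry.
def load_monster_lib
  sleep 0.05                  # pretend initialization cost
  ->(input) { input.upcase }  # pretend API: one string in, one string out
end

# Fork one helper process that owns the library and answers line-based
# requests over a pair of pipes. Only the helper pays the load cost.
def start_helper
  req_r, req_w = IO.pipe      # parent writes requests, helper reads them
  res_r, res_w = IO.pipe      # helper writes replies, parent reads them

  pid = fork do
    req_w.close
    res_r.close
    lib = load_monster_lib    # loaded in the helper only
    while (line = req_r.gets) # gets returns nil once the parent closes req_w
      res_w.puts(lib.call(line.chomp))
      res_w.flush
    end
    exit!(0)                  # skip at_exit handlers inherited from the parent
  end

  # The parent keeps only its own ends of the pipes.
  req_r.close
  res_w.close
  [pid, req_w, res_r]
end

# Send one request line to the helper and wait for its one-line reply.
def ask(req_w, res_r, text)
  req_w.puts(text)
  req_w.flush
  res_r.gets.chomp
end

if __FILE__ == $PROGRAM_NAME
  pid, req_w, res_r = start_helper
  puts ask(req_w, res_r, "hello")
  req_w.close                 # end of input tells the helper to exit
  Process.wait(pid)
end
```

The workers stay small and start fast, and if the monster lib crashes or
leaks, only the helper goes down with it. The price is that every call now
goes through a pipe, and the helper serves one request at a time.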