Warning: many strong personal opinions and broad
generalizations to follow.

Someone asked a question about Ruby threads, and the
answers he received depressed me, so I thought I'd add my
own two cents.

I'm always amazed, working in an embedded/real-time
programming field as I do, at the aura of mystery and deep
magic that still seems to surround threads whenever they
are brought up in a scripting-language context.  Personally
I can't imagine working without threads for many types of
program.  Threads are practically required for any modern
non-trivial network programming.  Threads are, IMHO, right
above basic file I/O in terms of "concepts that every
programmer needs to have at least a basic knowledge of". 
They are not deep, arcane mysteries used only by gurus in
specialized fields.  They are basic tools of everyday
programming.

In essence, a multi-threaded program is one in which, at
least conceptually, several paths of execution are being
executed at once.  This doesn't (necessarily) have anything
to do with having multiple physical CPUs.  Modern operating
systems simulate concurrent processing using time slices. 
A well-designed operating system takes advantage of
multiple processors by distributing native threads evenly
across the processors; but this is a detail of
implementation that is completely hidden by the OS.
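
As a rough illustration of "several paths of execution",
here is a minimal Ruby sketch.  The method name and the
worker names "a" and "b" are made up for the example.  Each
worker records three steps in a thread-safe Queue; the
order in which the two workers' steps interleave is up to
the scheduler and may differ from run to run.

```ruby
# Two worker threads each record three steps in a shared, thread-safe Queue.
# The names "a" and "b" and the method name are made up for this sketch.
def run_two_workers
  log = Queue.new
  workers = %w[a b].map do |name|
    Thread.new do
      3.times do |step|
        log << "#{name}#{step}"
        Thread.pass  # invite the scheduler to switch; it is free to ignore this
      end
    end
  end
  workers.each(&:join)  # wait for both paths of execution to finish

  # Drain the queue into a plain array so the caller can inspect it.
  entries = []
  entries << log.pop until log.empty?
  entries
end

p run_two_workers
```

Whatever the interleaving, each worker's own steps stay in
order, and all six entries show up.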

The Unix fork() call does not create a thread.  It creates
a process.  The difference is: a process gets a complete
copy of the environment its parent process was executing
in.  This means that, at least conceptually, a copy is made
of all the memory associated with the parent process
(modern kernels defer the actual copying with
copy-on-write, but the child still ends up with its own
private address space).  A thread, on the other hand,
executes in the *same* environment as its parent and
sibling threads.  They all have access to the same
variables, the same data structures.  There are a couple of
results of this:  first, threads are *much* more efficient,
in both memory and processor usage, than processes.  A
native thread can be created very quickly, and the extra
memory overhead it incurs is minimal, just enough to hold
its stack.  Modern operating systems like Linux and NT are
designed to juggle native threads very efficiently.  The
fact of shared memory also means that threads can work very
intimately with each other; the flip side of this is that
they can very easily get in each other's way and corrupt
shared memory.  These problems have spawned a whole family
of synchronization schemes which, when used properly, can
yield programs where dozens of threads work together in
perfect clockwork harmony.
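
The copy-versus-share distinction can be sketched in a few
lines of Ruby.  This assumes a platform where fork is
available (it is not on Windows), and the two method names
are invented for the example.  The forked child increments
its own copy of the variable and reports it back over a
pipe; the thread increments the very same variable the
caller sees.

```ruby
# A forked child works on its own copy of memory; a thread shares the caller's.
# Both method names are hypothetical, chosen for this sketch.
def counter_after_fork
  counter = 0
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    counter += 1          # changes only the child's private copy
    writer.puts counter   # report the child's value back to the parent
    writer.close
  end
  writer.close
  child_value = reader.gets.to_i
  reader.close
  Process.wait(pid)
  [counter, child_value]  # parent's copy, child's copy
end

def counter_after_thread
  counter = 0
  Thread.new { counter += 1 }.join  # same variable, same memory
  counter
end

p counter_after_fork    # → [0, 1]
p counter_after_thread  # → 1
```

The pipe is only there because the child has no other way
to show the parent what happened to its copy, which is
rather the point.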

POSIX threads, aka Pthreads, is a definition of a standard
threading *API*, not an implementation.  Many operating
systems implement Pthreads.  Having a Pthreads
implementation does not mean that the OS implements threads
in a certain way; rather, it means that any hacker coding
for the OS knows (for example) that they can call
pthread_create() with a pointer to 'func', and it will
start a thread whose entry point is 'func'.  They also know
that they have access to a well-defined set of
synchronization primitives such as mutexes.
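
To show what a mutex buys you, here is a small sketch using
Ruby's own Mutex class rather than the Pthreads C API (the
idea is the same; the method name and the numbers are
assumptions for the example).  Several threads update one
shared total, and the lock makes each read-modify-write
step happen as a unit.

```ruby
# Several threads add to one shared total; the Mutex serializes each update.
# The method name and default counts are made up for this sketch.
def deposit_all(threads: 8, deposits: 1_000)
  balance = 0
  lock = Mutex.new
  workers = Array.new(threads) do
    Thread.new do
      deposits.times do
        # Only one thread at a time may run this block.
        lock.synchronize { balance += 1 }
      end
    end
  end
  workers.each(&:join)
  balance
end

p deposit_all  # → 8000
```

Without the lock, whether updates get lost depends on the
interpreter and on timing, which is exactly what makes such
bugs hard to reproduce.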

The meaning of "native threads" in a language is simple: it
means that the language allows the programmer to create
multiple threads which are then scheduled by the OS's
scheduler.  These native threads can take advantage of any
threading optimizations in the kernel.  But the most
important advantage is the preemptiveness of the kernel's
scheduler: there is no way for a single thread to hang up
the entire program, unless the programmer explicitly
arranges for it.  Perl provides an interface to native
threads, as does Python (although I've heard that Python's
implementation is fairly crude).  Ruby's threads, on the
other hand, are scheduled by the interpreter itself rather
than by the OS, so when certain system calls are made, the
whole program halts until the call returns.  This can wreak
havoc in programs where certain threads are time-sensitive -
for instance, if a network thread needs to send a heartbeat
packet many times a second to keep a connection alive.
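
One way to check whether a particular call starves other
threads is simply to count heartbeats while it runs.  The
following is a diagnostic sketch only: the method name is
invented, the "heartbeat" is just a counter standing in for
a real keep-alive packet, and the sleep in the usage line
is a placeholder for whatever call you actually suspect.

```ruby
# Counts how often a heartbeat thread gets to run while the given block
# executes in another thread.  Method name and defaults are hypothetical.
def heartbeats_during(interval: 0.05)
  beats = 0
  heartbeat = Thread.new do
    loop do
      beats += 1      # stand-in for sending a keep-alive packet
      sleep interval
    end
  end

  Thread.new { yield }.join  # run the suspect call in its own thread

  heartbeat.kill
  heartbeat.join
  beats
end

# Placeholder workload; substitute the call under suspicion.
puts heartbeats_during { sleep 0.3 }
```

If the threads are being scheduled fairly, the count should
come out somewhere near the call's duration divided by the
interval.  A count stuck near one would suggest that the
call is holding up every other thread while it runs.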

It's possible to create programs in Ruby which work around
this limitation; but there's a certain amount of
trial-and-error involved.  To the Ruby beginner who has just
read the Pickaxe, there's no way of knowing that this will
happen until his program mysteriously fails.  Errors caused
by ignorance of Ruby's cooperative threading model fall
into the particularly nasty "unexpected side-effect"
category of bugs.  ("Why are none of my socket threads
communicating?!! All I added was a thread that reads from a
config file! They have nothing to do with each other!")

All of this, plus the fact that informal testing shows Ruby
threads to be unbelievably slow, is why I'm overjoyed to
hear that native threads are in the works for 2.0.  Until
then, despite Ruby's being one of my all-time favorite
languages, I generally can't even consider it for any
project beyond simple utility scripts.

Related reading: I've heard that the book "Programming with
POSIX(R) Threads" by David Butenhof is the best
introduction to threading concepts in general, and the
POSIX API in particular.  I haven't read it myself.

- Avdi Grimm

P.S. Does anyone here have any theories on /why/ threads
are still perceived as such an arcane topic in the UNIX and
scripting language communities?
