On Sat, 21 May 2005, Luke Kanies wrote:

> I'm in the process of writing a kind of distributed application, where one
> or more central servers does some initial processing of a set of files, and
> a bunch of clients then connect and get an appropriate subset of the
> processed information.  In addition, each of the clients needs to be
> queryable, so I can always figure out their status and get metrics and such.
>
> Obviously there are many ways to do this, but given the industry I'm
> targeting with this and the applications with which I expect to need to
> integrate, it seems like some kind of semi-standardized web service makes
> the most sense.
>
> So, using some examples online, I hacked up a quick webrick/soap4r server on
> both my client and server, and I'm successfully passing information around.
>
> Well, kind of.  The problem is that webrick seems to require that my process
> be entirely reactive -- both my client and server want to sit there waiting
> for someone to connect, when obviously that won't work.  I need to get
> separate actions going on each process, but webrick seems to want to require
> that all action is entirely reactive.  So, I'm now in the situation where
> the server works entirely reactively, and the client can contact it fine
> before I start the client's webrick server, but after the server starts I
> lose control of the process.
>
> What I'm really looking for is something like Perl's POE:  Something that
> allows me to set up multiple sub-processes, none of which are blocking, and
> all of which run based on callbacks.  On the server side, I want to respond
> to requests, and periodically reprocess files as necessary (as they change
> or whatever).  On the client side, I want to periodically connect to the
> server and get new data, and the data I have all has a period on which it is
> reassessed -- e.g., every hour verify X is still true.  The client needs to
> also respond to requests for metrics and such when they come in.
>
> I've been considering setting up the server as a Rails server, although that
> is certainly overkill at this point in the game and might be overkill in the
> long term.  I think that's too heavyweight for the client, though, and I'm
> not sure I would get the features I want out of Rails anyway.
>
> Can anyone recommend anything I can use to get this kind of behaviour? Are
> threads the only answer? (Please say they aren't.)

if you are in *nix and have a central nfs filesystem all nodes can see, check
out rq (ruby queue):

   http://raa.ruby-lang.org/project/rq/
   http://www.codeforpeople.com/lib/ruby/rq/
   http://www.linuxjournal.com/article/7922

here's a snapshot of our system:

   jib:~ > cfq status
   ---
   jobs:
     pending: 243
     holding: 0
     running: 36
     finished: 501
     dead: 0
     total: 780
   temporal:
     pending:
       earliest: { jid: 619, metric: submitted, time: 2005-05-12 11:31:42.919905 }
       latest: { jid: 1275, metric: submitted, time: 2005-05-20 14:20:15.163355 }
       shortest:
       longest:
     holding:
       earliest:
       latest:
       shortest:
       longest:
     running:
       earliest: { jid: 613, metric: started, time: 2005-05-19 19:46:09.532144 }
       latest: { jid: 1197, metric: started, time: 2005-05-20 15:26:14.373168 }
       shortest: { jid: 1197, duration: 00:01:1.258993 }
       longest: { jid: 613, duration: 19:41:41.339677 }
     finished:
       earliest: { jid: 781, metric: finished, time: 2005-05-12 13:35:31.757662 }
       latest: { jid: 723, metric: finished, time: 2005-05-20 15:26:13.962584 }
       shortest: { jid: 546, duration: 00:11:11.688514 }
       longest: { jid: 976, duration: 30:18:18.852480 }
     dead:
       earliest:
       latest:
       shortest:
       longest:
   performance:
     avg_time_per_job: 13:02:2.998790
     n_jobs_in_last_1_hrs: 3
     n_jobs_in_last_2_hrs: 6
     n_jobs_in_last_4_hrs: 10
     n_jobs_in_last_8_hrs: 23
     n_jobs_in_last_16_hrs: 44
     n_jobs_in_last_32_hrs: 91
   exit_status:
     successes: 501
     failures: 0

we've run about half a million jobs through our system now with zero failures
or bugs.  if your nfs server/clients are set up right you can install it in
about 5 minutes without root privileges.

basically the concept would be to have each client/server pull jobs from its
own queue, with all the queues located on a central nfs mount.  that way every
node can submit jobs to every other node and all nodes can run jobs.  this is
a servant architecture.

so, for example, working on an nfs mount, on two nodes of mine - jib and carp -
we can set up a queue for each node:


   jib:~/shared > rq `hostname`.q create
   ---
   q: /dmsp/moby-1-1/ahoward/shared/jib.ngdc.noaa.gov.q
   db: /dmsp/moby-1-1/ahoward/shared/jib.ngdc.noaa.gov.q/db
   schema: /dmsp/moby-1-1/ahoward/shared/jib.ngdc.noaa.gov.q/db.schema
   lock: /dmsp/moby-1-1/ahoward/shared/jib.ngdc.noaa.gov.q/lock

   carp:~/shared > rq `hostname`.q create
   ---
   q: /dmsp/moby-1-1/ahoward/shared/carp.ngdc.noaa.gov.q
   db: /dmsp/moby-1-1/ahoward/shared/carp.ngdc.noaa.gov.q/db
   schema: /dmsp/moby-1-1/ahoward/shared/carp.ngdc.noaa.gov.q/db.schema
   lock: /dmsp/moby-1-1/ahoward/shared/carp.ngdc.noaa.gov.q/lock

so now each node has a queue located on a central nfs mount

carp submits a job to jib:

   carp:~/shared > rq jib.ngdc.noaa.gov.q/ submit echo 42
   ---
   -
    jid: 1
    priority: 0
    state: pending
    submitted: 2005-05-20 15:32:54.664324
    started:
    finished:
    elapsed:
    submitter: carp.ngdc.noaa.gov
    runner:
    pid:
    exit_status:
    tag:
    restartable:
    command: echo 42

jib submits a job to carp:

   jib:~/shared > rq carp.ngdc.noaa.gov.q/ submit echo 42
   ---
   -
    jid: 1
    priority: 0
    state: pending
    submitted: 2005-05-20 15:33:31.209160
    started:
    finished:
    elapsed:
    submitter: jib.ngdc.noaa.gov
    runner:
    pid:
    exit_status:
    tag:
    restartable:
    command: echo 42


a 'feeder' (a process that takes jobs from the queue, runs them, and returns
them to the queue) is started on each node.  (normally these are daemons and
are cron'd to be made 'immortal' - they restart if they die - see the crontab
sketch after this walkthrough)

   carp:~/shared > rq carp.ngdc.noaa.gov.q/ feed --log=/dev/null
   42

   jib:~/shared > rq jib.ngdc.noaa.gov.q/ feed --log=/dev/null
   42

so carp ran jib's job and jib ran carp's job.  we can confirm this by querying
each queue:

   carp:~/shared > rq jib.ngdc.noaa.gov.q/ query jid=1
   ---
   -
    jid: 1
    priority: 0
    state: finished
    submitted: 2005-05-20 15:32:54.664324
    started: 2005-05-20 15:39:33.309159
    finished: 2005-05-20 15:39:33.438110
    elapsed: 0.128951
    submitter: carp.ngdc.noaa.gov
    runner: jib.ngdc.noaa.gov
    pid: 26632
    exit_status: 0
    tag:
    restartable:
    command: echo 42

   jib:~/shared > rq carp.ngdc.noaa.gov.q/ query jid=1
   ---
   -
    jid: 1
    priority: 0
    state: finished
    submitted: 2005-05-20 15:33:31.209160
    started: 2005-05-20 15:38:43.503715
    finished: 2005-05-20 15:38:43.779134
    elapsed: 0.275419
    submitter: jib.ngdc.noaa.gov
    runner: carp.ngdc.noaa.gov
    pid: 20500
    exit_status: 0
    tag:
    restartable:
    command: echo 42

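(re: making feeders 'immortal' above - here's roughly what the cron side might
look like.  a sketch only: the flock guard and the paths are my assumptions,
and rq may well provide nicer daemonizing support of its own:)

   # hypothetical crontab entry - try to (re)start the feeder every five
   # minutes; 'flock -n' makes sure at most one feeder instance runs per
   # node, since the command exits immediately if the lock is held
   */5 * * * * flock -n /tmp/rq-feeder.lock rq $HOME/shared/`hostname`.q feed --log=/dev/null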

all the output is available as yaml and much of it can be used as input to
other commands.  in addition the queue is directly accessible via an api, so
it's pretty easy to code decision making based on some other node's queue
contents.
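
for example, a minimal sketch of that kind of decision making, driving the
command line and loading its yaml output (the direct api would look similar).
note that 'query state=pending' is my extrapolation from the 'query jid=1'
usage above, and 'my_job.sh' is just a placeholder:

   require 'yaml'

   # hypothetical: only submit work to jib if its queue isn't too deep.
   # assumes we're run from the shared nfs directory, like the examples
   # above.  'rq ... query' prints a yaml list of job hashes.
   q = 'jib.ngdc.noaa.gov.q'

   pending = YAML.load(`rq #{ q } query state=pending`) || []

   if pending.size < 10
     system "rq #{ q } submit my_job.sh"   # my_job.sh is a placeholder
   else
     puts "#{ q } is busy: #{ pending.size } jobs pending"
   end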

i also have a piece of software called 'dirwatch' (on raa too) that makes it
trivial to set up 'watches' on directories to trigger actions when files are
created, modified, deleted, etc.  it's under revision as we speak and is
undergoing a major internal overhaul - but the basic functionality and user
interface won't change much.
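
(dirwatch's actual interface may differ - but the basic idea is simple enough
to sketch with a polling loop:)

   # not dirwatch's api - just the idea: poll a directory and fire a
   # block whenever files are created, modified, or deleted
   def watch dir, interval = 5
     seen = {}
     loop do
       current = {}
       Dir.glob(File.join(dir, '*')) { |f| current[f] = File.mtime(f) }
       (current.keys - seen.keys).each { |f| yield :created, f }
       (seen.keys - current.keys).each { |f| yield :deleted, f }
       (current.keys & seen.keys).each do |f|
         yield :modified, f if current[f] != seen[f]
       end
       seen = current
       sleep interval
     end
   end

   watch '/tmp/incoming' do |event, path|
     puts "#{ event } #{ path }"
   end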

hth.

-a
-- 
===============================================================================
| email :: ara [dot] t [dot] howard [at] noaa [dot] gov
| phone :: 303.497.6469
| My religion is very simple.  My religion is kindness.
| --Tenzin Gyatso
===============================================================================