On Sat, 21 May 2005, Luke Kanies wrote:

> I'm in the process of writing a kind of distributed application, where one
> or more central servers does some initial processing of a set of files, and
> a bunch of clients then connect and get an appropriate subset of the
> processed information.  In addition, each of the clients needs to be
> queryable, so I can always figure out their status and get metrics and such.
>
> Obviously there are many ways to do this, but given the industry I'm
> targeting with this and the applications with which I expect to need to
> integrate, it seems like some kind of semi-standardized web service makes
> the most sense.
>
> So, using some examples online, I hacked up a quick webrick/soap4r server on
> both my client and server, and I'm successfully passing information around.
>
> Well, kind of.  The problem is that webrick seems to require that my process
> be entirely reactive -- both my client and server want to sit there waiting
> for someone to connect, when obviously that won't work.  I need to get
> separate actions going on each process, but webrick seems to want to require
> that all action is entirely reactive.  So, I'm now in the situation where
> the server works entirely reactively, and the client can contact it fine
> before I start the client's webrick server, but after the server starts I
> lose control of the process.
>
> What I'm really looking for is something like Perl's POE:  Something that
> allows me to set up multiple sub-processes, none of which are blocking, and
> all of which run based on callbacks.  On the server side, I want to respond
> to requests, and periodically reprocess files as necessary (as they change
> or whatever).  On the client side, I want to periodically connect to the
> server and get new data, and the data I have all has a period on which it is
> reassessed -- e.g., every hour verify X is still true.  The client needs to
> also respond to requests for metrics and such when they come in.
>
> I've been considering setting up the server as a Rails server, although that
> is certainly overkill at this point in the game and might be overkill in the
> long term.  I think that's too heavyweight for the client, though, and I'm
> not sure I would get the features I want out of Rails anyway.
>
> Can anyone recommend anything I can use to get this kind of behaviour?  Are
> threads the only answer?  (Please say they aren't.)

if you are on *nix and have a central nfs filesystem that all nodes can see,
check out rq (ruby queue):

  http://raa.ruby-lang.org/project/rq/
  http://www.codeforpeople.com/lib/ruby/rq/
  http://www.linuxjournal.com/article/7922

here's a snapshot of our system:

  jib:~ > cfq status
  ---
  jobs:
    pending: 243
    holding: 0
    running: 36
    finished: 501
    dead: 0
    total: 780
  temporal:
    pending:
      earliest: { jid: 619, metric: submitted, time: 2005-05-12 11:31:42.919905 }
      latest: { jid: 1275, metric: submitted, time: 2005-05-20 14:20:15.163355 }
      shortest:
      longest:
    holding:
      earliest:
      latest:
      shortest:
      longest:
    running:
      earliest: { jid: 613, metric: started, time: 2005-05-19 19:46:09.532144 }
      latest: { jid: 1197, metric: started, time: 2005-05-20 15:26:14.373168 }
      shortest: { jid: 1197, duration: 00:01:1.258993 }
      longest: { jid: 613, duration: 19:41:41.339677 }
    finished:
      earliest: { jid: 781, metric: finished, time: 2005-05-12 13:35:31.757662 }
      latest: { jid: 723, metric: finished, time: 2005-05-20 15:26:13.962584 }
      shortest: { jid: 546, duration: 00:11:11.688514 }
      longest: { jid: 976, duration: 30:18:18.852480 }
    dead:
      earliest:
      latest:
      shortest:
      longest:
  performance:
    avg_time_per_job: 13:02:2.998790
    n_jobs_in_last_1_hrs: 3
    n_jobs_in_last_2_hrs: 6
    n_jobs_in_last_4_hrs: 10
    n_jobs_in_last_8_hrs: 23
    n_jobs_in_last_16_hrs: 44
    n_jobs_in_last_32_hrs: 91
  exit_status:
    successes: 501
    failures: 0

we've run about half a million jobs through our system now with zero failures
or bugs.  if your nfs server/clients are set up right you can install in about
5 minutes without root privileges.

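since that status snapshot is plain yaml, it can be read straight into ruby.
what follows is only a minimal sketch, assuming the snapshot has already been
captured into a string (here a trimmed copy of the `jobs:` and `exit_status:`
sections above).  `summarize_status` is a hypothetical helper, not part of rq:

```ruby
require 'yaml'

# trimmed copy of the status snapshot shown above
STATUS_YAML = <<~YAML
  ---
  jobs:
    pending: 243
    holding: 0
    running: 36
    finished: 501
    dead: 0
    total: 780
  exit_status:
    successes: 501
    failures: 0
YAML

# hypothetical helper (not part of rq): reduce a status document
# to a few numbers a monitoring script might care about
def summarize_status(yaml_text)
  status = YAML.load(yaml_text)
  jobs   = status.fetch('jobs')
  exits  = status.fetch('exit_status')
  {
    backlog:       jobs['pending'] + jobs['holding'],      # jobs not yet started
    done_fraction: jobs['finished'].to_f / jobs['total'],  # share of all jobs finished
    failures:      exits['failures']
  }
end

summary = summarize_status(STATUS_YAML)
puts "backlog=#{summary[:backlog]} failures=#{summary[:failures]}"
```

a client that needs to be queryable could hand back a hash like this as its
metrics, though that is one possible design rather than anything rq prescribes.
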
basically the concept would be for each client/server to have a queue that it
pulls jobs from, with all queues located on a central nfs location.  so every
node can submit jobs to every other node and all nodes can run jobs.  this is
a servant architecture.

so, for example, working on an nfs mount, on two nodes of mine - jib and carp -
we can set up a queue for each node:

  jib:~/shared > rq `hostname`.q create
  ---
  q: /dmsp/moby-1-1/ahoward/shared/jib.ngdc.noaa.gov.q
  db: /dmsp/moby-1-1/ahoward/shared/jib.ngdc.noaa.gov.q/db
  schema: /dmsp/moby-1-1/ahoward/shared/jib.ngdc.noaa.gov.q/db.schema
  lock: /dmsp/moby-1-1/ahoward/shared/jib.ngdc.noaa.gov.q/lock

  carp:~/shared > rq `hostname`.q create
  ---
  q: /dmsp/moby-1-1/ahoward/shared/carp.ngdc.noaa.gov.q
  db: /dmsp/moby-1-1/ahoward/shared/carp.ngdc.noaa.gov.q/db
  schema: /dmsp/moby-1-1/ahoward/shared/carp.ngdc.noaa.gov.q/db.schema
  lock: /dmsp/moby-1-1/ahoward/shared/carp.ngdc.noaa.gov.q/lock

so now each node has a queue located on a central nfs mount.

carp submits a job to jib:

  carp:~/shared > rq jib.ngdc.noaa.gov.q/ submit echo 42
  ---
  - jid: 1
    priority: 0
    state: pending
    submitted: 2005-05-20 15:32:54.664324
    started:
    finished:
    elapsed:
    submitter: carp.ngdc.noaa.gov
    runner:
    pid:
    exit_status:
    tag:
    restartable:
    command: echo 42

jib submits a job to carp:

  jib:~/shared > rq carp.ngdc.noaa.gov.q/ submit echo 42
  ---
  - jid: 1
    priority: 0
    state: pending
    submitted: 2005-05-20 15:33:31.209160
    started:
    finished:
    elapsed:
    submitter: jib.ngdc.noaa.gov
    runner:
    pid:
    exit_status:
    tag:
    restartable:
    command: echo 42

a 'feeder' (a process that takes jobs from the queue, runs them, and returns
them to the queue) is started on each node.  (normally these are daemons and
can be cron'd to make them 'immortal' - they restart if they die)

  carp:~/shared > rq carp.ngdc.noaa.gov.q/ feed --log=/dev/null
  42

  jib:~/shared > rq jib.ngdc.noaa.gov.q/ feed --log=/dev/null
  42

so carp ran jib's job and jib ran carp's job.
we can see this by:

  carp:~/shared > rq jib.ngdc.noaa.gov.q/ query jid=1
  ---
  - jid: 1
    priority: 0
    state: finished
    submitted: 2005-05-20 15:32:54.664324
    started: 2005-05-20 15:39:33.309159
    finished: 2005-05-20 15:39:33.438110
    elapsed: 0.128951
    submitter: carp.ngdc.noaa.gov
    runner: jib.ngdc.noaa.gov
    pid: 26632
    exit_status: 0
    tag:
    restartable:
    command: echo 42

  jib:~/shared > rq carp.ngdc.noaa.gov.q/ query jid=1
  ---
  - jid: 1
    priority: 0
    state: finished
    submitted: 2005-05-20 15:33:31.209160
    started: 2005-05-20 15:38:43.503715
    finished: 2005-05-20 15:38:43.779134
    elapsed: 0.275419
    submitter: jib.ngdc.noaa.gov
    runner: carp.ngdc.noaa.gov
    pid: 20500
    exit_status: 0
    tag:
    restartable:
    command: echo 42

all the output is available as yaml and much of it can be input to other
commands.  in addition the queue is easily available directly via an api, so
it's pretty easy to code decision making based on some other node's queue
contents.

i also have a piece of software called 'dirwatch' (on raa too) that makes it
trivial to set up 'watches' on directories to trigger actions when files are
created, modified, deleted, etc.  it's under revision as we speak and is
undergoing a major internal overhaul - but the basic functionality and user
interface won't change much.

hth.

-a
--
===============================================================================
| email :: ara [dot] t [dot] howard [at] noaa [dot] gov
| phone :: 303.497.6469
| My religion is very simple.  My religion is kindness.
| --Tenzin Gyatso
===============================================================================
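
p.s. a rough illustration of coding decisions off the yaml that `query`
prints.  this sketch only parses yaml text shaped like the job records above;
it does not touch rq's own api.  `needs_attention` is a made-up name and the
second job record is invented for the example:

```ruby
require 'yaml'

# job records in the shape printed by `rq <queue> query` above;
# jid 2 is invented here purely for illustration
QUERY_YAML = <<~YAML
  ---
  - jid: 1
    state: finished
    submitter: carp.ngdc.noaa.gov
    runner: jib.ngdc.noaa.gov
    exit_status: 0
    command: echo 42
  - jid: 2
    state: finished
    submitter: carp.ngdc.noaa.gov
    runner: jib.ngdc.noaa.gov
    exit_status: 1
    command: "false"
YAML

# hypothetical helper: select finished jobs whose exit status is non-zero
def needs_attention(yaml_text)
  YAML.load(yaml_text).select do |job|
    job['state'] == 'finished' && job['exit_status'].to_i != 0
  end
end

needs_attention(QUERY_YAML).each do |job|
  puts "jid #{job['jid']} failed on #{job['runner']}: #{job['command']}"
end
```

a node could run a check like this against another node's queue and decide,
say, whether to resubmit work elsewhere.
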