On Sat, Dec 18, 2010 at 1:00 PM, ara.t.howard <ara.t.howard / gmail.com> wrote:
> of course JRuby is a fantastic tool for many use cases, but i've personally
> found science to be perhaps the worst possible application of it. these
> reasons are quite simple:
>
> - speed. when you need something to be big or fast in science, generally
> even c won't cut it. fortran is still used in maybe 80% of big weather
> systems for a reason: the compilers are generally doing faster floating
> point ops than the equiv c compilers. one can bridge fortran -> c -> ruby
> quite easily (narray does this, gsl does this, etc) and it's a place where
> JRuby actually makes the job much harder. Java, of course, isn't even in
> the ballpark.

If you need C or Fortran, you need C or Fortran. I won't argue that.
Most people, however, don't.

> - OS integration: the general approach to making ruby faster is to use
> parallelism. the best way is to run lots of processes. JRuby's interface
> to the operating system level primitives for this (fork, et al.) makes this
> really really hard, close to impossible, to deal with simply. Mmap is
> another great example of something you want at your fingertips in
> science... Interfaces to hardware boards connected to a research device,
> etc. I think any research based science makes getting close to the metal a
> requirement.

The general approach to making Ruby faster is to use a faster Ruby or
write better Ruby code. JRuby's good for the former.

If you need to parallelize, processes are only one tool, and perhaps
the most blunt tool. In-process concurrency opens up many options that
are difficult or impossible with processes. So JRuby enables one set
of methodologies for concurrency while perhaps not supporting others
well. Trade-offs.
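
A minimal sketch of what in-process concurrency can look like, using only
the standard Thread class. The method name parallel_sum_of_squares and the
worker count are made up for illustration. Each thread gets its own slice
and hands back its own result, so nothing mutable is shared. On JRuby the
threads can run on separate cores; on MRI the global lock keeps CPU-bound
threads from running at the same time, so the speedup there is limited.

```ruby
# Hypothetical example: sum of squares split across worker threads.
# Every thread works on its own slice and returns its own partial result,
# so no mutable state is shared between threads.
def parallel_sum_of_squares(values, workers = 4)
  return 0 if values.empty?

  slice_size = (values.size / workers.to_f).ceil
  threads = values.each_slice(slice_size).map do |slice|
    Thread.new(slice) { |part| part.sum { |v| v * v } }
  end

  # Thread#value waits for the thread and returns the block's result.
  threads.sum(&:value)
end

puts parallel_sum_of_squares((1..1000).to_a)
```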

JRuby doesn't support fork, but it does support memory-mapping via NIO
(and again, you don't have to write or compile a line of C). As for
interfaces to hardware boards...if you need C, you need C.
I won't argue that. Most people don't.
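
As a rough sketch of the NIO route, the method below maps the start of a
file read-only and returns its first bytes. The name mmap_bytes is
hypothetical; the Java classes it calls (java.io.RandomAccessFile,
java.nio.channels.FileChannel) are standard JDK classes reached through
JRuby's Java integration. It only works on JRuby, so it raises on any
other Ruby.

```ruby
# Sketch only: the java.nio calls below need JRuby; other Rubies raise.
def mmap_bytes(path, count)
  unless RUBY_ENGINE == "jruby"
    raise NotImplementedError, "java.nio memory mapping needs JRuby"
  end

  require "java"
  file = java.io.RandomAccessFile.new(path, "r")
  begin
    channel = file.channel
    length = [count, channel.size].min
    mode = java.nio.channels.FileChannel::MapMode::READ_ONLY
    buffer = channel.map(mode, 0, length)
    # Java bytes are signed, so mask each one into the 0..255 range.
    (0...length).map { |i| buffer.get(i) & 0xff }
  ensure
    file.close
  end
end
```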

> - start up time. related to the above is the fact that science tends to
> lead to many small programs running very often. map reduce jobs, cron jobs,
> process pipelines of related algorithms, toolkits made extensible via file
> based processing, tons of processing of stdin/stdout tend to be facts of
> life when algorithm writers produce systems as a side effect. it's not
> pretty, but it is a fact i've seen repeated over and over.

This is how you do parallel processing for your work. It's not the
only way, and being able to pass whole in-memory object graphs over to
another thread is distinctly more elegant than having to marshal them
through a memory-mapped file or IO pipe.
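
To make the contrast concrete, here is a small sketch with a made-up
object graph. Handing it to another thread through a Queue passes the very
same objects. Sending it through a pipe means serializing it with Marshal
and rebuilding a copy on the other end. The pipe here stays inside one
process just to keep the example self-contained.

```ruby
require "thread" # Queue; already loaded on modern Rubies

# Hypothetical object graph: a hash holding nested arrays.
graph = { name: "run-1", samples: [[1.0, 2.0], [3.0, 4.0]] }

# In-process: the worker thread receives the very same objects.
queue = Queue.new
worker = Thread.new { queue.pop }
queue << graph
received = worker.value
puts received.equal?(graph) # identity check: was anything copied?

# Across a pipe: the graph has to be serialized and rebuilt.
reader, writer = IO.pipe
reader.binmode
writer.binmode
writer.write(Marshal.dump(graph))
writer.close
copy = Marshal.load(reader.read)
reader.close
puts copy == graph      # compares contents
puts copy.equal?(graph) # compares identity
```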

> i am definitely aware of some projects which make really heavy use of java
> and there, JRuby sure would be an awesome tool but my personal experience is
> that anything related to the JVM is a total non-starter. YMMV.

Java is not a requirement for someone to want JRuby. All that's
required is wanting to avoid monkeying with native code, wanting a
really solid VM, and wanting to run concurrent threads in a robust
environment. You can do all that without ever touching a line of Java
code. Just because you don't do Java for science doesn't mean Java and
the JVM are bad options for science.

And in any case...it was based on my recommendations, after dealing
with and hearing from dozens of MRI users who have no end of problems
with native C extensions. With JRuby, you write it once, build it
once, and ship it. Perhaps it's not quite as fast as C, perhaps it
doesn't integrate with the OS as well...but it's a hell of a lot less
painful to use. Perhaps you can't fork, but you can use real
concurrent threads, which are almost certainly easier (provided you
don't share mutable data, as with processes). Perhaps it's not as
low-level and bare-metal as MRI, but it's a better experience for
many, many cases. And that's the Ruby way.

- Charlie