James Britt wrote:
> What's the criteria for having Ruby include something written in C?
> 
> For example, I've read complaints concerning Ruby's speed in processing
> large XML files.   REXML is pure Ruby, and the speed just can't match a
> C-based parser.
> 
> So what if REXML, or just parts of it, were re-written in C?  Fair game?
> 
> Or simply include libxml or expat in the core Ruby distribution, and
> include a Ruby binding?
> 
> Not advocating, just using XML parsing as an example.  Partly though
> because, by comparison, Ruby ships with a YAML parser written in C.  So,
> in principle, I would imagine that a C-based XML parser would be at
> least eligible for consideration.  In general, though, what are the
> criteria for such consideration?
> 
> Some issues I can think of for deciding to include C code:
> 
>  * License: Code would need to be compatible with/equivalent to
>    Ruby's license
>  * Flexibility/Access: Pure-Ruby libs are available for metaprogramming;
>    I can, in the REXML example, dynamically munge the workings of the
>    parser, something that might vanish were parts replaced with C
>  * Compilation: Adding more C ups the chance that someone, somewhere,
>    will not be able to build Ruby on some platform.  Or move it to Rite.
>  * Maintenance/Ownership: Does it make sense to ship a library, such as
>    expat, that is maintained outside of the Ruby core?
>    If code is added to the core, does it make Ruby harder/easier to
>    maintain?

These are good criteria. Maybe another would be:

   * Application logic remains in ruby code.

The compilation/porting criterion is a bit easier now that MSVC is 
freely downloadable, but it's still an issue and I am pessimistic enough 
to think that it probably always will be.

Nevertheless, I do think Ruby with a limited amount of C is fair game 
when choosing Ruby examples that are intended to show, realistically, 
whether Ruby is computationally adequate for some task. Ruby's C API is
a feature that should be considered by anyone shopping for languages.

IMHO, we should especially promote examples involving code generation,
which can succeed on the flexibility and maintenance points. But I'm not 
sure there are any problems on the shootout site that are suited to code 
generation. The problems for which code generation is useful tend to be 
more complex (or complex in a different way) than sorting, hashing, 
counting, etc., which are better solved with a fixed library.

Here's an example for which code generation is essential. In my work, 
there are libraries* that allow a ruby program to express, in standard 
ruby syntax, specifications for a network of hybrid automata. Hybrid 
automata are essentially state machines with ordinary differential 
equations in the states and guard expressions on the transitions. The 
library takes these specs and generates/compiles/loads/runs C code for 
solving the ODEs and evaluating the guard predicates. Doing this is
complicated by dynamic reconfiguration of the network: formulas involve 
not just variables like 'x' but indirect references to 'obj.x', where 
obj may change from timestep to timestep depending on discrete 
transitions. This behavior would make it difficult to use a fixed math 
lib efficiently--the C code must, for optimal speed, depend on the 
user's specifications.
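
To make the idea concrete, here is a minimal sketch of turning a
Ruby-level spec into C source text. This is not the Cgen or RedShift
API: FlowSpec, formula_to_c, flow_function_c, and the layout of the
generated C are all hypothetical, and the translator only handles
variables, numbers, and operators. The point is that a reference like
'leader.x' becomes a pointer dereference in the generated code, so the
object it refers to can change between timesteps.

```ruby
# Hypothetical sketch: the names and the generated C layout are
# illustrative assumptions, not the Cgen/RedShift API.
FlowSpec = Struct.new(:var, :formula)

# Translate a formula string into a C expression. A plain variable "x"
# becomes a field of the component's own struct; "obj.x" becomes an
# indirect reference through a pointer field of that struct.
def formula_to_c(formula)
  formula.gsub(/\b([a-z_]\w*)(?:\.([a-z_]\w*))?/) do
    obj_or_var, field = $1, $2
    field ? "self->#{obj_or_var}->#{field}" : "self->#{obj_or_var}"
  end
end

# Emit a C function that evaluates the derivative of spec.var for one
# component type.
def flow_function_c(component, spec)
  <<~C
    static double #{component}_d#{spec.var}(struct #{component} *self) {
      return #{formula_to_c(spec.formula)};
    }
  C
end

spec = FlowSpec.new("x", "-k * x + leader.x")
puts flow_function_c("follower", spec)
```

A real generator would also emit the struct definitions and the
integrator loop, but the string-processing core looks much like this.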

Performance is comparable to that of solvers like Matlab (though a bit
slower because of the indirect references), and Matlab can't (when I
last checked) even handle dynamic reconfiguration. Yet the programmer using these
libraries doesn't even need to know that there is a compiler 
involved--you just run ruby scripts that include the libraries and 
define certain structures.

My point isn't that this example would make sense on the "shootout" 
site, but that writing code generators is a realistic approach with
Ruby, because of Ruby's:

   - C API and mkmf.rb

   - string processing

   - ease of working with complex object models
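
As a rough sketch of how the first point fits in: a generator can write
its C output next to a tiny extconf.rb and let mkmf produce the
Makefile. The helper name write_extension and the "flows" extension are
made up for illustration. The sketch stops before running
`ruby extconf.rb && make`, so it only shows the files a generator would
lay down, not the compile/load step itself.

```ruby
require "tmpdir"

# Hypothetical helper: writes generated C source plus a minimal
# extconf.rb into dir and returns the sorted list of files created.
def write_extension(dir, name, c_source)
  File.write(File.join(dir, "#{name}.c"), c_source)
  File.write(File.join(dir, "extconf.rb"), <<~RUBY)
    require "mkmf"
    create_makefile(#{name.inspect})
  RUBY
  Dir.children(dir).sort
end

# Stand-in for generated code: a stub extension that only defines an
# empty module. Real generated functions would be registered here.
c_source = <<~C
  #include <ruby.h>

  void Init_flows(void) {
    rb_define_module("Flows");
  }
C

Dir.mktmpdir do |dir|
  p write_extension(dir, "flows", c_source)
end
```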

With a little work, you can get the performance of C without losing 
metaprogramming and other dynamic aspects of ruby code (for example, 
debugging the hybrid automata using irb). And you are still expressing 
your application logic entirely in ruby code.

--
* Cgen and RedShift, which are themselves pure ruby. Cgen is on RAA, but 
I haven't released RedShift yet.