On Tue, 13 Feb 2001, Mathieu Bouchard wrote:

> This was a private message?
> 
Oops, I'll cross-post to ruby-talk. Missed that on the previous post...

> On Tue, 13 Feb 2001, Robert Feldt wrote:
> > On Tue, 13 Feb 2001, Mathieu Bouchard wrote:
> > > On Mon, 5 Feb 2001, Robert Feldt wrote:
> > > > I'm wrong) needs matz Ruby implementation to work)), ie. essentially a
> > > > Ruby compiler. I think this is doable in the long run but it'll not be
> > > > easy and will require tight integration with a VM/run-time.
> > > I prefer not to discuss about RubyVM at all for now, because I feel there
> > > are things that need to be done before and which will give us significant
> > > insight on how to do what you propose.
> > Ok, I'm more into the "lets-start-and-learn-as-we-go-along" school of
> > thought. ;-)
> 
> I'm in that school too, but I'm also in the "lets-start-by-the-beginning" 
> school of thought. "RubyVM" as I understood it is possibly the most
> central piece of Ruby. The Array and String classes, for example, are on
> an outer layer, and the interpreter itself in on an intermediate layer. 
> (This ordering is imprecise and debatable, but I hope the point gets
> through) 
> 
Yes, something like that, but I also noticed that matz uses Array and Hash
(well, actually the low-level implementation of Hash, st.c, but anyway) in a
number of places in the interpreter, so the division is not clear-cut.

> > I agree that ArrayMixin is a step in the right direction but its on the
> > level of writing what I've called RubyBaseLib (builtin classes not
> > necessary for interpreter core) in Ruby.
> 
> In a Ruby-in-Ruby perspective, that's precisely why I'm doing those first: 
> they are outer layers, and therefore they are easier to rewrite, less code
> depends on them, and they are the big weight that make it quite effortful
> to, say, write a Ruby interpreter in Java (or in Scheme, Caml, etc).
> 
OK, but IMHO you don't need very many of these outer classes to start
working on the core stuff: basically String, Array and Hash, from my
limited understanding of matz's Ruby interpreter. I don't see the need for,
say, Bignum arithmetic in the core classes, so for me it might be OK
to simply translate arithmetic directly to the corresponding C arithmetic.

> > I don't see how it adresses the
> > interpreter or VM core itself.
> 
> I'm not talking about just the interpreter or just the VM. I'm talking
> about the whole. Those two you talk about are the innermost layers of the
> whole. 
>
> The whole is also called "libruby.a" and is the C code provided in the
> ruby distribution, except ext/*/*.c
>
I really don't think we disagree very much. I'm also interested in the
whole; to me this means having a full Ruby system with everything written
in Ruby (OK, some platform-specific stuff will be needed, probably in
inlined C). However, to get good performance I'm willing to sacrifice some
Ruby constructs/features as long as they don't force us to write in a
totally different language/paradigm.

Instead of arguing over the "right way" we should probably each work from
our preferred end. In the long term I think we'll benefit from each other's
projects. Hopefully we can keep communicating during design and
implementation so that the long-term benefit is as large as possible for
both projects. Fruitful discussion will of course be important!

> > If its of any interest to you I've already started a project leading to a
> > Ruby syntax tree lib; basically its a general parser generator generating
> > parsers that give you a Ruby-friendly (Arrays can be used for nonterminal+
> > stuff etc) AST as a result.
> 
> This seems to be precisely what I wanted to have.
> 
> > I'll try to use it as a base for
> > Ruby--/sRuby to C compilation. If you're also working on a syntax tree lib
> > I think we should try to synch to avoid duplication of too much effort?!
> 
> I have barely started on it. It's just a sketch in my mind and a few
> notes, and I didn't intend to work on it until I'm done with ArrayMixin
> etc. That's why I say "I'd like to have" and not "I'm doing". Of course if
> you want to do it I may help you a little.
> 
I'll tell you what I've done so far on this parser gen:

* Specified a grammar for grammars (it's similar to SableCC's format, but
  even simpler, with Ruby Regexps for the tokens)
* Hand-coded this grammar
* Implemented the LALR(1) generation algorithms from the Dragon book
* Written a really simple lexer gen that simply applies the token regexps in
  order until a match is found (this may not give good enough performance,
  but we can do a more traditional DFA-based gen later...)
* Generated a grammar parser from the hand-coded grammar
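
To show what I mean by the simple lexer, here is a minimal sketch of the
"try the token regexps in order" idea. The names (SimpleLexer, Token, the
rule list and the :skip convention) are made up for this mail and are not
the actual interface of the parser gen:

```ruby
require 'strscan'

# Hypothetical token type; the real parser gen may represent tokens differently.
Token = Struct.new(:type, :text)

class SimpleLexer
  # rules: ordered array of [type, regexp] pairs; earlier rules win.
  def initialize(rules)
    @rules = rules
  end

  def tokenize(input)
    scanner = StringScanner.new(input)
    tokens = []
    until scanner.eos?
      type, text = match_first(scanner)
      raise "no token matches at offset #{scanner.pos}" unless type
      # By (assumed) convention, :skip tokens such as whitespace are dropped.
      tokens << Token.new(type, text) unless type == :skip
    end
    tokens
  end

  private

  # Try each regexp at the current position and stop at the first
  # non-empty match. This is linear in the number of rules per token.
  def match_first(scanner)
    @rules.each do |type, regexp|
      text = scanner.scan(regexp)
      return [type, text] if text && !text.empty?
    end
    nil
  end
end

rules = [
  [:skip,   /\s+/],
  [:number, /\d+/],
  [:ident,  /[a-z_]\w*/],
  [:op,     /[-+*\/=]/]
]
tokens = SimpleLexer.new(rules).tokenize("x = 40 + 2")
```

Since every token costs one pass over the rule list, a DFA-based gen would
avoid that cost, which is why I mention it as a possible later step.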

Next is extensive testing (OK, in the name of pragmatism I wrote
some tests prior to implementation, but sadly they don't cover
everything... ;-). This will give a generally useful parser gen closer to
OO and Ruby than the existing ones (racc and rbison, both based directly or
indirectly on flex and yacc). The next step is getting a Ruby parser by
converting parse.y to the grammar format of the parser gen.

> > I'm not sure I see the benefit of translating to something
> > St/Self-like?
> 
> Actually, a strict subset of Ruby syntax, in such a way that the Ruby
> parser class can inherit from the Ruby-- parser class.
> 
The Grammar class I've used supports this kind of "merging" of grammars
(and thus of parsers).

> I said more like St/Self because Ruby has a lot of control statements that
> can be recreated using already existing constructs, while St/Self don't.
> 
> Basically, Ruby-- would lack at least:
> * for, case..when
> * flip-flop operator
> * reversed form of if, unless, while, until
> * multiple assignment
> * yield (use & declarator and .call() instead)
> 
Why these? I guess yield and the flip-flop may not be straightforward to
translate to C but IMHO the other ones should be possible? I think
excluding yield would be a major hindrance since it is frequently used
in iterators. Not so for flip-flop.
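
To make the trade-off concrete, here is a small sketch of the same iterator
written both ways. Countdown and its method names are invented for
illustration only:

```ruby
# Hypothetical example class; not part of any existing library.
class Countdown
  def initialize(from)
    @from = from
  end

  # Ordinary Ruby: the implicit block is invoked with yield.
  def each_with_yield
    n = @from
    while n > 0
      yield n
      n -= 1
    end
  end

  # The form suggested in the quoted list: an explicit & parameter
  # that is invoked with .call instead of yield.
  def each_with_call(&block)
    n = @from
    while n > 0
      block.call(n)
      n -= 1
    end
  end
end

seen_yield = []
seen_call = []
Countdown.new(3).each_with_yield { |n| seen_yield << n }
Countdown.new(3).each_with_call  { |n| seen_call << n }
```

Both forms visit the same values, so callers would not notice the
difference; the restriction would only affect how iterators themselves are
written.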

My approach has been to write some low-level stuff in Ruby to see what
will be needed in sRuby/Ruby--, so that the subset doesn't limit how we'd
like to write these things in Ruby. So far I've translated gc.c (not all of
it, and as an OO design based on the existing code).

For example, in the GcAlgorithm subclass MriMarkAndSweepAlgorithm the mark
method uses an iterator over a RootSet to mark objects in the object
memory. I think it would not be natural for a Ruby developer to use
something different than an iterator for this => yield is needed in
RootSet implementation => yield is needed in sRuby/Ruby--. This is an
example of the kind of thinking I do to try to nail down what needs to be
in sRuby/Ruby--. Ok, if we really can't find a way to translate yields to
C code then we'll have to restrict sRuby/Ruby--.
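
Roughly, the shape I have in mind looks like the sketch below. Only the
names GcAlgorithm, MriMarkAndSweepAlgorithm and RootSet come from my code;
HeapObject, add_source, sweep and the rest are simplifications invented for
this mail and not the actual translation of gc.c:

```ruby
# Hypothetical stand-in for an object in the object memory.
class HeapObject
  attr_reader :name, :references
  attr_accessor :marked

  def initialize(name)
    @name = name
    @references = []
    @marked = false
  end
end

class RootSet
  def initialize
    @sources = []   # each source is a collection of root objects
  end

  def add_source(objects)
    @sources << objects
    self
  end

  # The natural Ruby way to expose the roots: an iterator using yield.
  def each
    @sources.each do |source|
      source.each { |obj| yield obj }
    end
  end
end

class GcAlgorithm
  def initialize(root_set)
    @root_set = root_set
  end
end

class MriMarkAndSweepAlgorithm < GcAlgorithm
  # Mark everything reachable from the roots.
  def mark
    @root_set.each { |root| mark_object(root) }
  end

  # Return the surviving objects and clear their mark bits.
  def sweep(heap)
    live = heap.select { |obj| obj.marked }
    live.each { |obj| obj.marked = false }
    live
  end

  private

  def mark_object(obj)
    return if obj.marked
    obj.marked = true
    obj.references.each { |child| mark_object(child) }
  end
end

a = HeapObject.new("a")
b = HeapObject.new("b")
c = HeapObject.new("c")          # unreachable from the roots
a.references << b
gc = MriMarkAndSweepAlgorithm.new(RootSet.new.add_source([a]))
gc.mark
survivors = gc.sweep([a, b, c])
```

The point is only that RootSet#each reads naturally with yield; without it
the mark method would have to know how the roots are stored.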

> What I'm not interested in is adding strong typing etc to make a Ruby-like
> language that's not really Ruby at all. At least not for now. I want
> parsers and compilers for both Ruby and Ruby-- before even considering
> creating a variant of Ruby. Please don't misunderstand me.
> 
I'm not interested in adding strong typing etc. either. I first thought
Squeak wasn't doing that but later realized they did (well, not really:
they actually convert everything to unsigned longs, which is not very
different from using VALUE everywhere, i.e. close to how matz writes the
interpreter today) => their approach is not as interesting anymore. The
Scheme48 approach (links are on my RubyVM page) is closer to what I'd like
to achieve. The s in sRuby is not meant to mean that static typing is added
to the language we code in, but rather that it's a subset of Ruby we can
compile into statically typed code (also s as in small, subset etc).

I'm also planning for a type of partial evaluation when translating
Ruby--/sRuby to C; i.e. statements that are independent of run-time input
data can be ordinary/"full" Ruby code and are evaluated in the "ordinary"
Ruby interpreter. The resulting objects are then translated to C directly
in a simpler/faster form (instance vars can be made C globals etc). This
should be a good thing for objects in a Ruby interpreter such as
ObjectMemory/GarbageCollector etc.

For example, in the example with the RootSet above, the different types of
roots (ruby_dyna_vars, ruby_class, frame stack etc) are added during
partial evaluation and thus the yield used in the RootSet each method can
be translated to loops over the different types of root sets.
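
As a very rough sketch of that last step: once the root kinds are known at
translation time, the translator can emit one C loop per kind instead of a
general yield. Everything here (RootKind, emit_mark_roots, and the C
identifiers in the generated text) is hypothetical and does not reflect how
the roots are really laid out in matz's interpreter:

```ruby
# Hypothetical description of one kind of root, fixed at translation time:
# the name of a C array holding the roots and of a C variable with its length.
RootKind = Struct.new(:c_array, :c_count)

# Emit C source text with one loop per root kind. The yield in
# RootSet#each disappears because the set of kinds is already known.
def emit_mark_roots(root_kinds)
  lines = ["void mark_roots(void) {", "  long i;"]
  root_kinds.each do |kind|
    lines << "  for (i = 0; i < #{kind.c_count}; i++)"
    lines << "    mark_object(#{kind.c_array}[i]);"
  end
  lines << "}"
  lines.join("\n")
end

# Invented C names, for illustration only.
kinds = [
  RootKind.new("dyna_var_roots", "dyna_var_root_count"),
  RootKind.new("class_roots", "class_root_count"),
  RootKind.new("frame_stack_roots", "frame_stack_root_count")
]
c_code = emit_mark_roots(kinds)
```

A real translator would of course work from the evaluated RootSet object
rather than from a hand-written list like this.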

If you're interested I can send you a link to the latest version of my
thinking on sRuby/RubyVM...

> Send me a version of your parser when you're done.
> 
I'll do that, but I've been working a bit on coding interpreter stuff
in Ruby for a while, so it might be a couple of weeks. BTW, do you have any
ideas on the "types" of the nodes in the AST? Right now I simply have a
"name"/"type" attribute on the Symbols (Terminal and NonTerminal) and set
it to the name of the production parsed. SableCC and similar approaches
instead generate node classes for each unique type of
production. However, since Ruby is dynamically typed I didn't see any
benefit in that approach. Or do you think we should actually generate node
classes corresponding to the different productions/constructs in Ruby
syntax? I guess it's the "right way" to do it...
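
To make the two options concrete, here is a small sketch. Node, the Ast
module and define_node_classes are invented names for this mail and not the
parser gen's real classes:

```ruby
# Option 1: a single generic node class; the production name is just data.
class Node
  attr_reader :name, :children

  def initialize(name, children = [])
    @name = name
    @children = children
  end
end

# Option 2: generate one class per production, SableCC style. In Ruby the
# classes can be created at run time from the list of production names.
module Ast
  def self.define_node_classes(production_names)
    production_names.each do |prod|
      klass = Class.new(Node) do
        # Each generated class fixes its own production name.
        define_method(:initialize) do |children = []|
          super(prod.to_s, children)
        end
      end
      const_set(prod, klass)
    end
  end
end

generic = Node.new("IfStmt", [Node.new("Cond"), Node.new("Body")])

Ast.define_node_classes([:IfStmt, :WhileStmt])
typed = Ast::IfStmt.new([Node.new("Cond"), Node.new("Body")])
```

With option 1, code walking the tree dispatches on node.name; with option 2
it can dispatch on the class and each construct can carry its own methods.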

> Remember that there are 2-3 implementations (ruby.y, irb, rb2c, I think)
> to consult when you need some precisions on the syntax. They may not be
> written the way we want to (which is why we want yet another parser), but
> they are still useful. :-) 
>
rb2c uses parse.y directly, I think. What's ruby.y, and do you know what
irb uses?

Regards,

Robert