On Tue, May 22, 2007 at 03:20:04PM +0900, Hans Fugal wrote:
> I would like to identify partial matching of a regular expression, for a 
> stream of input, as described in the pcrepartial(3) manpage. Is this 
> possible with ruby Regexp, or would I have to wrap (a piece of) pcre? 
> (or implement my own regular expression engine, hah!)

It looks like someone has wrapped pcre already:
http://raa.ruby-lang.org/project/pcre/
but that's quite old so you might need to fiddle with it a bit.

> As an aside, what I am really trying to do is write a lexer that works 
> on stream input, and can decide whether any of the eligible tokens match 
> before reading EOF (which may be a long, long way off both in bytes and 
> time). If you can think of another approach (that still uses regexes) 
> that'd work too.

Well, you can use regexps to distinguish a complete token from a partial
one, simply by checking if it is followed by a character which is not part
of the token. A little care is needed to handle EOF correctly - at worst you
could just stick a sentinel character onto the end.
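
A minimal sketch of that check, using a lookahead so that a token only counts
as complete when a non-token character follows it. The name next_token and
the NUL sentinel are assumptions made for this illustration; neither comes
from a library.

```ruby
# Assumed never to occur in real input; appended once EOF is reached.
SENTINEL = "\0"

# Hypothetical helper: returns the complete token at the start of buffer,
# or nil if the buffer may hold only part of a token (or no token at all).
def next_token(buffer)
  # \w+ must be followed by a non-word character, \s+ by a non-space one.
  buffer[/\A(?:\w+(?=\W)|\s+(?=\S))/]
end

p next_token("wib")               # nothing follows yet, so still partial
p next_token("wibble  b")         # a space follows, so "wibble" is complete
p next_token("boing" + SENTINEL)  # at EOF the sentinel ends the last token
```

The lookahead does not consume the terminating character, so it stays in the
buffer as the start of the next token.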

A simple example, which matches (\w+) and (\s+) as tokens:

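require 'stringio'
stream = StringIO.new("wibble  bibble  boing")

token = ""                    # buffer of input not yet emitted
chunk = stream.read(1)
token << chunk if chunk
loop do
  break if token.empty? && chunk.nil?   # empty input: nothing to lex

  # Find the token at the start of the buffer
  case token
  when /\A\w+/
    match = $&
  when /\A\s+/
    match = $&
  else
    puts "Syntax error here! " + token.inspect
    break
  end

  if match.size < token.size || chunk.nil?
    # Followed by a non-token character, or at EOF: the token is complete
    puts "Match token: " + token.slice!(0, match.size).inspect
    break if chunk.nil?
  else
    # The match reaches the end of the buffer, so it may still grow
    #puts "Partial match: " + token.inspect
    chunk = stream.read(1)
    token << chunk if chunk
  end
end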

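This should also work if you use, say, read(4096) instead of read(1), so it
ought to be pretty efficient. One caveat for a live stream (pipe or socket):
read(4096) blocks until it has 4096 bytes or hits EOF, whereas
readpartial(4096) returns as soon as any data is available, but raises
EOFError at EOF instead of returning nil.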
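
To make the chunk size easy to vary, the same loop can be wrapped in a
method. This is only a sketch: the name each_token and the chunk_size
parameter are made up for illustration, and it raises on a syntax error
rather than printing a message.

```ruby
require 'stringio'

# Hypothetical wrapper around the loop above; yields each complete token.
def each_token(io, chunk_size = 4096)
  buffer = String.new
  eof = false
  until eof && buffer.empty?
    match = buffer[/\A(?:\w+|\s+)/]
    if match && (match.size < buffer.size || eof)
      # Terminated by a non-token character or by EOF: the token is complete.
      yield buffer.slice!(0, match.size)
    elsif match || (buffer.empty? && !eof)
      # The match reaches the end of the buffer (or the buffer is empty),
      # so more input is needed before deciding.
      chunk = io.read(chunk_size)
      if chunk
        buffer << chunk
      else
        eof = true
      end
    else
      raise "Syntax error at #{buffer.inspect}"
    end
  end
end

# The tokens should not depend on how the input is chunked.
[1, 3, 4096].each do |size|
  tokens = []
  each_token(StringIO.new("wibble  bibble  boing"), size) { |t| tokens << t }
  p tokens
end
```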

Regards,

Brian.