On Tue, May 22, 2007 at 03:20:04PM +0900, Hans Fugal wrote:
> I would like to identify partial matching of a regular expression, for a
> stream of input, as described in the pcrepartial(3) manpage. Is this
> possible with ruby Regexp, or would I have to wrap (a piece of) pcre?
> (or implement my own regular expression engine, hah!)

It looks like someone has wrapped pcre already:
http://raa.ruby-lang.org/project/pcre/
but that's quite old, so you might need to fiddle with it a bit.

> As an aside, what I am really trying to do is write a lexer that works
> on stream input, and can decide whether any of the eligible tokens match
> before reading EOF (which may be a long, long way off both in bytes and
> time). If you can think of another approach (that still uses regexes)
> that'd work too.

Well, you can use regexps to distinguish a complete token from a partial
one, simply by checking whether it is followed by a character which is not
part of the token. A little care is needed to handle EOF correctly - at
worst you could just stick a sentinel character onto the end.

A simple example, which matches (\w+) and (\s+) as tokens:

    require 'stringio'

    stream = StringIO.new("wibble bibble boing")
    token = ""                  # input read so far but not yet consumed
    chunk = stream.read(1)      # read returns nil at EOF
    token << chunk if chunk
    loop do
      case token
      when /\A\w+/
        match = $&
      when /\A\s+/
        match = $&
      else
        puts "Syntax error here! " + token.inspect
        break
      end
      if match.size < token.size || chunk.nil?
        # Followed by a non-token character (or EOF): the token is complete
        puts "Match token: " + token.slice!(0, match.size).inspect
        break if chunk.nil?
      else
        # Could still be a partial token: read more and try again
        #puts "Partial match: " + token.inspect
        chunk = stream.read(1)
        token << chunk if chunk
      end
    end

This should also work if you use, say, read(4096) instead of read(1), so it
ought to be pretty efficient.

Regards,

Brian.
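
P.S. To check the read(4096) claim, the loop above can be wrapped in a method that takes the chunk size as a parameter. This is only a rough sketch: each_token and its chunk_size argument are made-up names for illustration, and it raises ArgumentError where the example above prints "Syntax error here!". It is only tried against StringIO here. On a live pipe or socket, IO#read(n) waits for n bytes or EOF, so IO#readpartial may be the better fit there; that part is not shown.

```ruby
require 'stringio'

# Hypothetical helper (not part of any library): yields each complete
# \w+ or \s+ token from io, reading chunk_size bytes at a time.
def each_token(io, chunk_size = 4096)
  buffer = +""   # unconsumed input (unary plus gives an unfrozen string)
  eof = false
  loop do
    match = buffer[/\A(?:\w+|\s+)/]
    if match.nil? && !buffer.empty?
      # The buffer starts with something that is not a token at all.
      raise ArgumentError, "syntax error at #{buffer.inspect}"
    elsif match && (match.size < buffer.size || eof)
      # A non-token character follows the match, or the input has ended,
      # so the token cannot grow any further: emit it.
      yield buffer.slice!(0, match.size)
    elsif eof
      break      # input exhausted and buffer drained
    else
      # Empty buffer or a possibly partial token: fetch more input.
      chunk = io.read(chunk_size)
      if chunk
        buffer << chunk
      else
        eof = true
      end
    end
  end
end

tokens = []
each_token(StringIO.new("wibble bibble boing"), 1) { |t| tokens << t }
p tokens   # prints ["wibble", " ", "bibble", " ", "boing"]
```

Passing 4096 instead of 1 gives the same list of tokens. The only difference is how often the stream is read.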