On Wed, May 23, 2007 at 01:00:04AM +0900, Hans Fugal wrote:
> Well, that works for \w+ and \s+, but what if you want to match /01+0/?
> You'd get a syntax error on 0111 even though it's a valid partial match.

OK, I see the problem. The issue isn't detecting the end of the expression. It's that this expression *might* match, but only if the right characters were appended to the end of the source.

In the general case, I think you'd have to turn each RE into one which matches all possible prefixes, perhaps something like

/(0(1+(0)?)?)/    # (note *)

However, if you can guarantee that no individual valid token will be longer than a certain size (let's say 200 characters), then it would be simpler to make sure you read ahead at least 200 characters into a buffer and then match against that.

Alternatively, perhaps only a few of your token REs have unlimited variable length. Those you can code in the prefix form shown above. The remainder (of fixed or limited length) can just be matched in the simple way against a large enough read-ahead buffer.

Regards,

Brian.

(*) Hmm, this isn't quite right, since it also partially matches the 01111 at the start of 011112. You could check for a partial match (i.e. $3 is nil) and allow it only if it consumes the whole string. Alternatively, the RE itself needs to say "must be followed by X or end of string". This works, but it's a bit ugly:

/(0(\z|1+(\z|0)))/

I can't think of a better formulation off the top of my head, though.
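The two fixes in the footnote can be compared side by side. This is a rough sketch. Note that I've added \A to both REs so they must match at the current lexer position (the REs in the post are unanchored). prefix_ok? and anchored_ok? are made-up helper names for illustration, and m[3] is the same capture group the footnote calls $3.

```ruby
# Prefix form from the post, with \A added so it must match at the start
# of the input (the \A is an addition; the post's REs are unanchored).
PREFIX_RE = /\A(0(1+(0)?)?)/

# Footnote's alternative: a partial match must run to the end of the string.
ANCHORED_RE = /\A(0(\z|1+(\z|0)))/

# Footnote fix #1: group 3 ($3) is nil for a partial match; accept that
# only when the match consumes the whole input.
def prefix_ok?(s)
  m = PREFIX_RE.match(s)
  return false unless m
  !m[3].nil? || m[0].length == s.length
end

# Footnote fix #2: the RE itself refuses a partial match that stops
# before the end of the string.
def anchored_ok?(s)
  ANCHORED_RE.match?(s)
end

%w[0 011 0110 0110x 011112 02 1].each do |s|
  puts format("%-7s prefix_ok?=%-5s anchored_ok?=%s", s, prefix_ok?(s), anchored_ok?(s))
end
```

Both checks should accept 0, 011, 0110 and 0110x (a complete token followed by something else) and reject 011112, 02 and 1. So they agree on these inputs, and it comes down to taste whether the check lives in code or in the RE.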
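The read-ahead alternative might look something like the sketch below. It's a small lexer that tops up its buffer to more than MAX_TOKEN characters before trying any RE, so a token within the limit can never be cut off at a chunk boundary. It assumes the 200-character limit mentioned above. The token table, each_token and match_token are hypothetical names for illustration, and it just takes the first RE that matches rather than the longest.

```ruby
require 'stringio'

# Assumed limit: no single valid token is longer than this.
MAX_TOKEN = 200

# Hypothetical token table; every RE is anchored at the buffer start.
TOKENS = [
  [:bits,    /\A01+0/],
  [:space,   /\A\s+/],
  [:letters, /\A[a-z]+/]
].freeze

# Returns [type, text] for the first RE that matches at the start of buf,
# or nil if none does.
def match_token(buf)
  TOKENS.each do |type, re|
    m = re.match(buf)
    return [type, m[0]] if m
  end
  nil
end

# Yields [type, text] pairs read from io. Before each match the buffer holds
# at least MAX_TOKEN + 1 characters (or everything left, near EOF), so a token
# within the limit, plus the character after it, is always fully visible.
def each_token(io)
  return enum_for(:each_token, io) unless block_given?

  buf = String.new
  eof = false
  loop do
    # Top up the read-ahead buffer.
    while !eof && buf.length <= MAX_TOKEN
      chunk = io.read(MAX_TOKEN)
      if chunk
        buf << chunk
      else
        eof = true
      end
    end
    break if buf.empty?

    type, text = match_token(buf)
    raise ArgumentError, "syntax error near #{buf[0, 20].inspect}" unless type

    buf.slice!(0, text.length)
    yield type, text
  end
end

p each_token(StringIO.new("0110 abc 0111110")).to_a
```

With this approach /01+0/ needs no prefix form at all. The catch is the one mentioned above: a run of letters longer than MAX_TOKEN would be silently split into two tokens. That's why any token REs with unlimited length would still need the prefix treatment.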