"Weirich, James" <James.Weirich / FMR.COM> wrote in message
news:1C8557C418C561429998C1F8FBB283A728BA94 / MSGDALCLB2WIN.DMN1.FMR.COM...
> > From: Robert Klemme [mailto:bob.news / gmx.net]
> > > Try /./ instead of /.*/.  Unexpected stuff will come at you one
> > > character at a time, which may or may not be ok.
> >
> > Looks like "|.*" is already present above.  Or did you want
> > to point to something else?
>
> Ummm ... I was suggesting "|." *instead* of "|.*"

Ooops, sorry.  Must've got the logic the other way round.

> irb(main):001:0> "foo bar baz".scan(/foo|bar|./)
> ["foo", " ", "bar", " ", "b", "a", "z"]
>
> See how the unexpected stuff comes at you one character at a time.  This
> may or may not be a problem.
>
> > > For example, if all you are interested in are patterns
> > > like "some_var = some_thing_else", then end your token list with
> > > something like /[^a-zA-Z0-9_=]*/.
> >
> > Again I'd suggest considering "+", since the empty
> > sequence is rarely interesting as a token.
>
> Yes, "+" is a better option than "*".
>
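To make the "*" vs. "+" point concrete, here is a small sketch for the "some_var = some_thing_else" case.  The method names and the exact token pattern are made up for illustration; only the catch-all alternative at the end differs between the two:

```ruby
# Hypothetical tokenizer whose catch-all alternative ends in "*".
# It can match the empty string, e.g. at the end of the input.
def tokens_star(str)
  str.scan(/[a-zA-Z0-9_]+|=|[^a-zA-Z0-9_=]*/)
end

# Same pattern with "+": the catch-all must consume at least one character.
def tokens_plus(str)
  str.scan(/[a-zA-Z0-9_]+|=|[^a-zA-Z0-9_=]+/)
end

p tokens_star("x = y")   # includes an empty-string token
p tokens_plus("x = y")   # only non-empty tokens
```

With "+" you never have to filter empty strings out of the token list afterwards.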
> Another annoying feature of scan is this: if you have parentheses in your
> RE, then it starts returning lists of matches.

Yeah, that's true.  It's usually not a problem since you control the regexp
(you can either use (?:...) or extract the appropriate element).  If you
don't control it you have to decide which of the groups you choose.  You'll
probably end up doing:

str.scan( rx ).map {|m| m.kind_of?( String ) ? m : m.find{|x|x} }
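
A rough sketch of the behaviour in question; the method names are hypothetical and only there to keep the cases apart.  Note that scan with a block returns the original string, so the helper collects its results with map:

```ruby
# No groups: scan returns plain strings.
def plain_matches(str)
  str.scan(/foo|bar/)
end

# Capturing groups: scan returns one array of captures per match,
# with nil for groups that did not take part in the match.
def grouped_matches(str)
  str.scan(/(foo)|(bar)/)
end

# Non-capturing groups (?:...) keep the result flat.
def non_capturing_matches(str)
  str.scan(/(?:foo)|(?:bar)/)
end

# Hypothetical helper for a regexp you don't control: use the whole match
# if scan gave a String, otherwise the first group that actually matched.
def first_capture(str, rx)
  str.scan(rx).map { |m| m.kind_of?(String) ? m : m.find { |x| x } }
end

p plain_matches("foo bar")
p grouped_matches("foo bar")
p non_capturing_matches("foo bar")
p first_capture("foo bar", /(foo)|(bar)/)
```

Picking the first matching group is just one possible policy; with nested groups you may well want a different element.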

But I wondered why it's not a MatchData instance like you get from
Regexp#match.  That would provide more information.  Any ideas about the
reasoning behind this?
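
In the meantime, one possible workaround (a sketch, not a statement about why scan was designed this way): inside the block, the last-match data is set for the current match, so it can be picked up via Regexp.last_match (or $~).  The helper name below is made up:

```ruby
# Hypothetical helper: collect one MatchData per match of rx in str,
# relying on scan setting the last-match data before each yield.
def scan_match_data(str, rx)
  result = []
  str.scan(rx) { result << Regexp.last_match }
  result
end

scan_match_data("foo bar baz", /(b)(a)(\w)/).each do |md|
  # whole match, the individual groups, and the start offset
  p [md[0], md.captures, md.begin(0)]
end
```

That gives access to offsets and pre/post match as well, at the cost of a slightly clumsier call.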

Regards

    robert