On 6/10/07, Robert Dober <robert.dober / gmail.com> wrote:
>
> On 6/10/07, Trochalakis Christos <yatiohi / ideopolis.gr> wrote:
> > Hello!
> >
> > I want to parse a tagged string like this: "<i>this is</i><i>my
> > string</i>"
> >
> > i am doing:
> >
> > >> "<i>this is</i><i>my string</i>".scan(/<i>(.*)<\/i>/)
> > [["this is</i><i>my string"]]
> >
> > What i want is a regex that will return the *first* segment that
> > matches.
> > in the above case -> [["this is", "my string"]]
> >
> > Is there any way to do this?
> >
> > Thanks!
> >
> >
> >
> This is a FAQ, and yes I will give the solution ;)
> Regexps are greedy by default: they consume as many chars as
> possible. There are some possibilities (not tested):
>
> (1) use non-greedy matches
> "<i>this is</i><i>my string</i>".scan(/<i>(.*?)<\/i>/)
> (2) use less general expressions
> "<i>this is</i><i>my string</i>".scan(/<i>(.[^<]*)<\/i>/)
> (3) Combine both ;)
> "<i>this is</i><i>my string</i>".scan(/<i>(.[^<]*?)<\/i>/)
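
For what it's worth, here is a quick sketch of how the greedy and non-greedy forms behave on the original string. The variable names are mine, chosen only for illustration. Note that scan with a single group returns one sub-array per match, so a flatten is needed to get a flat list of segments:

```ruby
s = "<i>this is</i><i>my string</i>"

# Greedy: .* runs to the last </i>, so there is a single match.
greedy = s.scan(/<i>(.*)<\/i>/)
p greedy     # => [["this is</i><i>my string"]]

# Non-greedy: .*? stops at the first </i> it can reach.
lazy = s.scan(/<i>(.*?)<\/i>/)
p lazy       # => [["this is"], ["my string"]]

# Negated character class: the capture can never cross a "<".
negated = s.scan(/<i>([^<]*)<\/i>/)
p negated    # => [["this is"], ["my string"]]

# One sub-array per match; flatten gives a plain list of segments.
segments = lazy.flatten
p segments   # => ["this is", "my string"]
```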


Unless you want to match strings like <i><foo</i>, it would be simpler to
just use [^<]* and not .[^<]*. Also, .[^<]* will not match <i></i>. If the
intent was to make the regexp not match that, a better regexp would be [^<]+.
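
To make the difference concrete, a small sketch comparing the three character-class variants on the two edge cases mentioned (an empty tag, and content starting with a stray "<"). The variable names are only illustrative; the regexps are the ones discussed here:

```ruby
dot_class = /<i>(.[^<]*)<\/i>/   # the suggested .[^<]*
star      = /<i>([^<]*)<\/i>/    # plain [^<]*
plus      = /<i>([^<]+)<\/i>/    # [^<]+ rejects empty content

empty_tag = "<i></i>"
stray_lt  = "<i><foo</i>"

# Empty tag: only [^<]* matches, capturing the empty string.
p empty_tag.scan(dot_class)   # => []
p empty_tag.scan(star)        # => [[""]]
p empty_tag.scan(plus)        # => []

# Stray "<": the leading . in .[^<]* is what lets it through.
p stray_lt.scan(dot_class)    # => [["<foo"]]
p stray_lt.scan(star)         # => []
p stray_lt.scan(plus)         # => []
```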

HTH
> Robert
>
> P.S.
> This *really* is a FAQ though
> --
> You see things; and you say Why?
> But I dream things that never were; and I say Why not?
> -- George Bernard Shaw
>
>
