On 10/30/05, James Edward Gray II <james / grayproductions.net> wrote:
>
> We're having a discussion on Ruby Core about how to speed up CSV.
> I'm trying to tune a Regexp that matches CSV fields. However, I'm
> seeing something I don't expect. Can someone explain this to me,
> please?
>
> >> ",".scan(/(?:^|,)(?:"()"|([^",]*))/)
> => [[nil, ""]]
>
> That's a simplified version of what I'm messing with. My question
> is, why does it only match once, when I expect two matches?
>
> The first match should be right at the beginning, and is basically
> (?:^ ... )(?: ... ([^",]*)). The second match should begin at the
> comma, being (?: ... ,)(?: ... ([^",]*)). What am I missing?
>

I'm not pretending to be a regexp guru, but nonetheless: when a match
has length 0, scan moves forward one character before it searches
again. This is to prevent it from going into an infinite loop.

Consider your example: the regexp matches at the start of the string
and consumes 0 characters. If Ruby did not move forward one character
before the next match, the regexp would match at the start of the
string again in exactly the same way, still without consuming any of
the string, and scan would never finish. So the next search starts at
position 1, just past the comma. The comma your second match needed
has been skipped, and at the end of the string neither ^ nor "," can
match, so scan stops after one result.

My suggestion would be to use two regexps: one to strip off the first
field at the beginning of the CSV line, and one to split the remainder
into its comma-separated parts.

Peter
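A minimal Ruby sketch of both points, assuming the simplified field pattern from the question (so quoted fields with real content are not handled). The names FIELD, FIRST, REST and split_fields are hypothetical, chosen only for illustration. The first two calls show scan's one-character step after an empty match; split_fields follows the two-regexp suggestion, with one regexp anchored by \A for the first field and a second one, which always consumes its leading comma, for the rest.

```ruby
# Zero-length matches: scan advances one character after each one,
# so an empty match is found at every position, including the end.
p "abc".scan(/x*/)                          # => ["", "", "", ""]

# The pattern from the question: the empty match at 0 makes scan
# restart at position 1, skipping the comma, so only one result.
p ",".scan(/(?:^|,)(?:"()"|([^",]*))/)      # => [[nil, ""]]

# Hypothetical names below. The field pattern is the simplified one
# from the question: an empty quoted field ("") or an unquoted run.
FIELD = /(?:"()"|([^",]*))/
FIRST = /\A#{FIELD}/   # regexp 1: the first field, anchored at the start
REST  = /,#{FIELD}/    # regexp 2: each later field, with its leading comma

def split_fields(line)
  first = FIRST.match(line)  # always matches: ([^",]*) may be empty
  pairs = [first.captures]
  # Every REST match consumes at least the comma, so scan never takes
  # the one-character step and no separator is skipped.
  pairs.concat(first.post_match.scan(REST))
  # Each pair is [quoted, bare]; exactly one element is non-nil.
  pairs.map { |quoted, bare| quoted || bare }
end

p split_fields(",")       # => ["", ""]
p split_fields("a,,b")    # => ["a", "", "b"]
p split_fields('"",x')    # => ["", "x"]
```

Because the first field is taken separately, the second regexp never needs the ^ alternative, and every one of its matches is at least one character long.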