On Apr 7, 2:23, Raimon Fs <co...@montx.com> wrote:
> Mark Thomas wrote:
>>> With 1.9's Oniguruma (is it available for 1.8?) it's quite easy
>>
>> This shorter one works in 1.8
>>
>> scan(/EMISOR:\s*([\w\s]+?)(?=\s*[A-Z][a-z])/).flatten
>>
>> I'm curious as to what Oniguruma-specific feature you used in yours.
>>
>> --
>> Mark.
>
> Thanks to all. At the moment Ruby 1.8.7 is enough for me, so I'm going
> with this one, and it works perfectly.
>
> Can you explain why this works? :-)
>
> /EMISOR:\s*([\w\s]+?)(?=\s*[A-Z][a-z])/
>
> EMISOR:\s is clear to me, but why doesn't it appear later in the array?
> Because it isn't inside ()?
>
> The * is also clear.
>
> Does ([\w\s]+?) mean "select all uppercase words/letters"?

[\w\s] is a character class that matches "word characters" (letters,
digits, and underscore) or whitespace (spaces, tabs, newlines). Nothing
in it is specific to uppercase. The + makes it match one or more of
them. The ? after the + makes it non-greedy (lazy): it matches as few
characters as possible while the rest of the pattern can still succeed.

> Does (?=\s*[A-Z][a-z]) mean "until you reach a space between an
> uppercase word and a word that starts uppercase and continues lowercase"?

The (?= ) is a lookahead assertion. It checks for a match ahead without
consuming or capturing it. Here it requires optional whitespace followed
by an uppercase letter and then a lowercase letter, i.e. the start of a
capitalized word. Because the group before it is lazy, the match stops
at the first position where that holds, so the captured text ends right
before the whitespace and the next capitalized word.

--
Mark.
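A quick way to see the regex explanation in action is a short script you can paste into irb. It is only a sketch: the sample strings are invented (the thread never shows the real input), and the variable names are illustrative. It also covers the EMISOR: question: when a pattern contains a capture group, String#scan returns only the text of the groups, one array per match, so the literal EMISOR: prefix is matched but never lands in the result. Swapping +? for + shows what the non-greedy modifier buys you.

```ruby
# Hypothetical sample input: invented for illustration, not from the thread.
# Field values are uppercase; field labels are capitalized words.
text = "EMISOR: ACME SOLUCIONES SL Fecha: 07/04/2009 EMISOR: JUAN PEREZ Importe: 100"

# The pattern from the thread:
#   EMISOR:\s*          literal label plus any whitespace, matched but not captured
#   ([\w\s]+?)          group 1: word characters or whitespace, as few as possible
#   (?=\s*[A-Z][a-z])   lookahead: optional whitespace, then a capitalized word;
#                       checked but not consumed
lazy = /EMISOR:\s*([\w\s]+?)(?=\s*[A-Z][a-z])/

# With a capture group, scan returns one array of group texts per match:
# [["ACME SOLUCIONES SL"], ["JUAN PEREZ"]]. flatten turns that into a flat list.
names = text.scan(lazy).flatten
p names  # => ["ACME SOLUCIONES SL", "JUAN PEREZ"]

# MatchData shows the difference between the whole match and the capture.
m = text.match(lazy)
p m[0]  # => "EMISOR: ACME SOLUCIONES SL"  (the lookahead text is not included)
p m[1]  # => "ACME SOLUCIONES SL"          (group 1 only)

# Same pattern with a greedy + instead of +?, on a line without colons.
greedy = /EMISOR:\s*([\w\s]+)(?=\s*[A-Z][a-z])/
line = "EMISOR: ACME SL Fecha 2009 Importe 100"
p line.scan(lazy).flatten    # => ["ACME SL"]
p line.scan(greedy).flatten  # => ["ACME SL Fecha 2009 "]
```

The greedy group first runs to the end of the line, then gives characters back one at a time until the lookahead succeeds, which first happens just before "Importe". So it swallows "Fecha 2009 " too. The lazy group grows one character at a time instead and stops at the earliest capitalized word.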