On Tue, Apr 7, 2009 at 11:50 AM, Raimon Fs <coder / montx.com> wrote:
> hello,
>
> Given a sentence with some words, I want only some words but not all of
> them ...
>
>
> REGISTRO DE LA PROPIEDAD DE ALBACETE X EMISOR: MANUELA ADORACION
> CEBOLLA GARCIA Padre Romano, 12 2005 ALBACETE ALBACETE NIF: 44444444P
>
> In this case, I'm interested in the full name:
>
> MANUELA ADORACION CEBOLLA GARCIA
>
> I know that all names are preceded by the pattern EMISOR:
>
> And after the full name, the address is lowercase except the first char.
>
>
> With this pattern I can find all the uppercase words: \w*[A-Z]{2}\b
>
> But I'm only interested in the full name, so if I use: EMISOR:
> \w*[A-Z]{2}\b
>
> I only get the first name MANUELA
>
> How can I get from there to the end of the name ?
>
> Any help ?


With 1.9's Oniguruma (is it available for 1.8?) it's quite easy

   scan( /EMISOR:\s*((?:[A-Z]+\s*)+)(?=[A-Z][a-z])/ ).flatten
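
Spelled out as a small runnable sketch, assuming the sample line from the
question is held in a variable (the names `text` and `NAME_AFTER_EMISOR` are
just for illustration). The lookahead stops the match right before the first
capitalized lowercase word, so the capture keeps one trailing space, which
`strip` removes:

```ruby
# Sample line from the question, joined into one string.
text = "REGISTRO DE LA PROPIEDAD DE ALBACETE X EMISOR: MANUELA ADORACION " \
       "CEBOLLA GARCIA Padre Romano, 12 2005 ALBACETE ALBACETE NIF: 44444444P"

# EMISOR:, then one or more all-uppercase words (captured), followed by a
# word that starts with an uppercase letter and continues in lowercase.
NAME_AFTER_EMISOR = /EMISOR:\s*((?:[A-Z]+\s*)+)(?=[A-Z][a-z])/

# scan returns one array per match; flatten gives the captured names,
# and strip drops the trailing whitespace the capture group picks up.
names = text.scan(NAME_AFTER_EMISOR).flatten.map { |name| name.strip }

puts names.inspect
```

This relies on the address really starting with a capitalized word such as
"Padre"; if it started with a digit the lookahead would not match.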

You might want to use Unicode strings and POSIX character classes, though,
since [A-Z] will not match accented uppercase letters in a name:

   /EMISOR:\s*((?:[[:upper:]].... [[:upper:]][[:lower:]])/
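
A hedged sketch of what the POSIX-class variant could look like in full on
Ruby 1.9 or later. In Ruby the classes have to sit inside a bracket
expression, i.e. `[[:upper:]]` rather than a bare `[:upper:]`. The sample
line and the constant name below are made up for illustration, they are not
from the original data:

```ruby
# encoding: utf-8

# Hypothetical input with accented uppercase letters in the name.
text = "EMISOR: JOSÉ MARÍA NÚÑEZ PEÑA Calle Mayor, 3 ALBACETE"

# Same structure as the ASCII pattern, with [A-Z] and [a-z] swapped for
# POSIX bracket classes, which are Unicode-aware on UTF-8 strings.
UNICODE_NAME = /EMISOR:\s*((?:[[:upper:]]+\s*)+)(?=[[:upper:]][[:lower:]])/

# strip removes the trailing space left inside the capture group.
names = text.scan(UNICODE_NAME).flatten.map { |name| name.strip }

puts names.inspect
```

With plain [A-Z] the same input would not yield the full name, because the
run of uppercase words would break at the first accented letter.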

HTH
Robert

P.S.
If you need a 1.8 version, tell me; I will switch to 1.8 when I find some time.



>
> thanks ...
>
> r.
> --
> Posted via http://www.ruby-forum.com/.
>
>



-- 
There are some people who begin the Zoo at the beginning, called
WAYIN, and walk as quickly as they can past every cage until they get
to the one called WAYOUT, but the nicest people go straight to the
animal they love the most, and stay there. ~ A.A. Milne (from
Winnie-the-Pooh)