On Mon, 3 May 2004, Sarah Tanembaum wrote:

> Pointers, please...
> 
> I have this text in a comma delimited file with the following
> characteristic:
> 
> ccc-123456, <multiline data>,
> 
> Field number:
> 
> 1a - its always begin with 1 to 3 characters followed by
>       a dash, e.g JKL-, A-, NM-, PQ-
> 
> 1b - after the dash, it follows by numbers starting from
>      1 to 99999
> 
> 2 -  a multiline data with either or both newline chars(\n)
>        and/or carriage-return char(\r), or both(\r\n). This field
>        might include special characters such as a
>        single(') or double(") quote, a space, characters
>        with ascii number > 127 - accented character,
>        umlaut, etc ...
> 
> 3 -  this field contain at least 2 line to at most 5 line of
>        data where each line might be
>        Begin with 2-3 chars, e.g GH@OPRJGPF1234
>        followed by an "@", 1-7chars, and followed by
>        1-4 numbers
> 
> My question is :
> 
> 1a. how to parse the first field(field 1a) so I can manipulate/rename it to
> a new label depending on what label they have currently

what exactly do you mean by this?  if you just want to split each record into
its fields, use the 'csv' module that ships with ruby...
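
a rough sketch of what that might look like.  everything about the data here is
an assumption: the sample rows are made up, field 2 is assumed to be wrapped in
double quotes (otherwise embedded commas and newlines cannot be told apart from
separators), and the RENAMES table is a hypothetical stand-in for whatever
old/new labels you actually have.

```ruby
require 'csv'

# hypothetical label mapping -- the real old/new names are not in the post
RENAMES = { 'JKL' => 'XYZ', 'A' => 'AA' }

# split "JKL-123" into label and number, swap the label if it is in the map
def relabel(field_1)
  label, number = field_1.split('-', 2)
  "#{ RENAMES.fetch(label, label) }-#{ number }"
end

# made-up sample; field 2 is quoted so it can hold commas and newlines
data = %Q(JKL-123,"line one\nline 'two'",GH@OPRJGPF1234\nNM-7,"plain",AB@X1\n)

rows = CSV.parse(data)
rows.each{|row| row[0] = relabel(row[0])}
```

labels missing from the table are left alone, so you can fill it in bit by bit.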

> 1b. in field 1b, instead of just 1 number, I'd like to pad
> them with leading zero so, 1 -> 000001,
> 1494 -> 001494, 560987->560987(no change).

  ~ > ruby -e 'p(sprintf("%06.6d", 42))'
  "000042"

  ~ > man 3 printf
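
  applied to the whole first field it might look something like this.  pad_id is
  just a name i made up, and it assumes the field always has exactly the
  "label-digits" shape you described:

```ruby
# zero-pad the digits after the dash to six places; longer numbers pass through
def pad_id(field_1)
  label, number = field_1.split('-', 2)
  format('%s-%06d', label, Integer(number, 10))
end

p pad_id('JKL-1')
p pad_id('NM-1494')
p pad_id('A-560987')
```

  Integer(number, 10) raises if the part after the dash is not a plain decimal
  number, which is probably what you want for bad input.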

> 2. capture 2nd field and escape the special characters with ascii number

    # esc is a one-character string; c.chr turns each byte back into a string
    esc = '\\'
    munged = ''
    field_2.each_byte{|c| munged << esc if c > 127; munged << c.chr}
    field_2 = munged

  you could also use a regex to do this...

    special = %r/([#{ 128.chr }-#{ 255.chr }])/o
    field_2.gsub!(special){|match| "\\#{ match }"}
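
  on a ruby where strings carry an encoding (this sketch assumes 2.0 or later,
  for String#b) the same idea can be written against raw bytes.
  escape_high_bytes is a made-up name, and "escape" here only means "put a
  backslash in front", since the question does not say what escaping is wanted:

```ruby
# prefix every byte above 127 with a backslash, working on raw bytes
def escape_high_bytes(str)
  # str.b is a binary copy, so the /n (binary) regex sees one byte at a time
  str.b.gsub(/[\x80-\xff]/n){|byte| "\\".b + byte}
end

escaped = escape_high_bytes("caf\u00e9")
p escaped.bytes
```

  the result is a binary string, so a two-byte utf-8 character ends up with a
  backslash in front of each of its bytes.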

> 
> 3. capture 3rd field and parse them as well just as field 1.
> 
> Thanks


can you post some sample data?  we could probably say more then...
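
in the meantime, here is a guess at field 3 based only on your description and
the one example (GH@OPRJGPF1234).  the pattern assumes "chars" means plain
letters and that each line holds exactly one entry; parse_field_3 and LINE are
names i invented, so adjust once we see real data:

```ruby
# guessed pattern: 2-3 letters, "@", 1-7 letters, then 1-4 digits
LINE = /\A([A-Za-z]{2,3})@([A-Za-z]{1,7})(\d{1,4})\z/

# returns [prefix, name, number] for each line, nil for lines that do not fit
def parse_field_3(field_3)
  field_3.split(/\r\n|\r|\n/).map do |line|
    m = LINE.match(line.strip)
    m && m.captures
  end
end

p parse_field_3("GH@OPRJGPF1234\r\nABC@X1")
```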


-a
-- 
===============================================================================
| EMAIL   :: Ara [dot] T [dot] Howard [at] noaa [dot] gov
| PHONE   :: 303.497.6469
| ADDRESS :: E/GC2 325 Broadway, Boulder, CO 80305-3328
| URL     :: http://www.ngdc.noaa.gov/stp/
| TRY     :: for l in ruby perl;do $l -e "print \"\x3a\x2d\x29\x0a\"";done 
===============================================================================