> I have this text in a comma-delimited file with the following
> characteristics:
> 
> ccc-123456, <multiline data>,
> 
> Field number:
> 
> 1a - it always begins with 1 to 3 characters followed by
>       a dash, e.g. JKL-, A-, NM-, PQ-
> 
> 1b - after the dash, it is followed by a number from
>      1 to 99999
> 
> 2 -  multiline data whose line breaks may be newline (\n),
>        carriage return (\r), or both (\r\n). This field
>        might include special characters such as a
>        single (') or double (") quote, a space, or characters
>        with a character code above 127 (accented characters,
>        umlauts, etc.)
> 
> 3 -  this field contains 2 to 5 lines of data, where each
>        line begins with 2-3 chars, followed by an "@",
>        1-7 chars, and then 1-4 digits,
>        e.g. GH@OPRJGPF1234
> 
> My questions are:
> 
> 1a. how to parse the first field (field 1a) so I can rename it to
> a new label depending on its current label
> 
> 1b. in field 1b, instead of the bare number, I'd like to pad
> it with leading zeros, so 1 -> 000001,
> 1494 -> 001494, 560987 -> 560987 (no change).
> 
> 2. capture the 2nd field and escape the special characters with
> their numeric codes
> 
> 3. capture the 3rd field and parse it the same way as field 1.
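For 1a and 1b, here is a minimal Ruby sketch that handles the key on its own. LABEL_MAP and rewrite_key are hypothetical names, and the JKL -> newJKL style mapping is only an illustration, so fill in your real old -> new pairs. It assumes the number should always be padded to six digits, as in your examples.

```ruby
# Hypothetical old -> new label table; JKL/AN are only examples.
LABEL_MAP = { 'JKL' => 'newJKL', 'AN' => 'newAN' }.freeze

# Rename the 1-3 character label and zero-pad the number to six digits.
# Returns nil when the key is not of the form LABEL-NUMBER.
def rewrite_key(key)
  m = /\A(\w{1,3})-(\d+)\z/.match(key)
  return nil unless m

  label = LABEL_MAP.fetch(m[1], m[1]) # unknown labels pass through unchanged
  "#{label}-#{format('%06d', m[2].to_i)}"
end

puts rewrite_key('JKL-1')     # newJKL-000001
puts rewrite_key('PQ-1494')   # PQ-001494
puts rewrite_key('NM-560987') # NM-560987
```

Labels missing from the table are kept as they are (Hash#fetch with a default), so PQ-1494 only gets its number padded. %06d leaves numbers that already have six digits unchanged.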
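For 2, the question doesn't say what the escaped form should look like, so this sketch assumes an HTML-style numeric reference (' becomes &#39;). SPECIAL and escape_field are hypothetical names. The character class covers &, both quotes and every non-ASCII character; & is included so the escaping stays unambiguous, and you would add a space to the class if spaces count as special too.

```ruby
# Assumed escape format: &#<code>; for each special character.
# Covers &, ' and " plus any character outside 0x00-0x7F.
SPECIAL = /[&'"]|[^\x00-\x7F]/

def escape_field(text)
  text.gsub(SPECIAL) { |c| format('&#%d;', c.ord) }
end

puts escape_field("caf\u00e9 \"x\" it's") # caf&#233; &#34;x&#34; it&#39;s
```

String#ord returns the code point in the string's own encoding, so read the file in the encoding it was really written in (e.g. File.read(path, encoding: 'ISO-8859-1') for Latin-1 data) to get the numbers you expect. Line breaks (\r, \n) are ASCII and are left untouched.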
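For 3, one option is to split the field on any line break and match each line separately. This sketch assumes the parts around the "@" are letters ([A-Za-z]): with \w the middle part would also grab digits, so AB@XY12 would come out as XY1 + 2. LINE_RE and parse_field3 are hypothetical names.

```ruby
# Assumed line shape: 2-3 letters, "@", 1-7 letters, 1-4 digits.
LINE_RE = /\A([A-Za-z]{2,3})@([A-Za-z]{1,7})(\d{1,4})\z/

# Split on \r\n, \r or \n and return one [prefix, code, number]
# triple per line; raise ArgumentError if the field doesn't fit.
def parse_field3(text)
  lines = text.split(/\r\n|\r|\n/).reject(&:empty?)
  unless (2..5).cover?(lines.size)
    raise ArgumentError, "expected 2-5 lines, got #{lines.size}"
  end

  lines.map do |line|
    m = LINE_RE.match(line)
    raise ArgumentError, "bad line: #{line.inspect}" unless m

    m.captures
  end
end

p parse_field3("GH@OPRJGPF1234\r\nAB@XY12\n")
# [["GH", "OPRJGPF", "1234"], ["AB", "XY", "12"]]
```

Each returned prefix can then go through the same kind of lookup as the field 1 label, and each number through format if it needs padding.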


Untested code:
=============

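# Assumes each record is LABEL-NUMBER,<field 2>,<field 3>, with
# field 3 starting right after the comma that ends field 2.
# /m lets .*? cross line breaks; the lazy .*? stops at the first
# comma followed by a valid field 3; \R matches \r\n, \r or \n.
rex = %r|^(\w{1,3})-(\d+),(.*?),((?:\w{2,3}@[A-Za-z]{1,7}\d{1,4}\R?){2,5})|m

new_text = old_text.gsub(rex) {
  # copy the captures first: any regexp used later in this block
  # (e.g. a gsub while escaping) overwrites $1..$4
  old_label, number, field2, field3 = $1, $2, $3, $4

  # rename label (labels not listed are kept unchanged)
  label = case old_label
    when 'JKL' then 'newJKL'
    when 'AN'  then 'newAN'
    else old_label
  end

  # number padding: 1 -> 000001, 1494 -> 001494
  num = format('%06d', number.to_i)

  # handle escaping for field 2
  # ...
  new_field2 = field2

  # parse field 3
  new_field3 = field3

  # return new construct:
  "#{label}-#{num},#{new_field2},#{new_field3}"
}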
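To see how a pattern can cope with a multiline record, here is a self-contained run on two made-up records. The pattern relies on three things: the m flag lets .*? span line breaks, the lazy .*? stops at the first comma that is followed by a valid field 3, and \R? allows a line break between the field 3 lines. It assumes field 3 starts right after the comma that ends field 2 and, to keep the regex behaviour easy to see, only renames JKL and pads the number.

```ruby
# Two made-up records: field 2 spans lines, field 3 has 2 lines each.
old_text = "JKL-1, first line\r\nsecond \"quoted\" line,GH@OPRJGPF1234\nAB@XY12\n" \
           "PQ-1494, one line,CD@Z1\nEF@QQ99\n"

rex = /^(\w{1,3})-(\d+),(.*?),((?:\w{2,3}@[A-Za-z]{1,7}\d{1,4}\R?){2,5})/m

new_text = old_text.gsub(rex) do
  # copy the captures before anything else can reset $~
  label, number, field2, field3 = $1, $2, $3, $4
  label = 'newJKL' if label == 'JKL' # hypothetical rename
  "#{label}-#{format('%06d', number.to_i)},#{field2},#{field3}"
end

puts new_text
```

Field 3's trailing line break is part of capture 4, so it is written back and each output record still starts on its own line.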