Hi,

I have yet another question about how to write a specific text parser in
Ruby...
So, without further ado, this is what the source file looks like:

Query= gi|23510597|emb|CAD48982.1| ring-infected erythrocyte surface
antigen precursor [Plasmodium falciparum 3D7]
         (1085 letters)

Database: KOG
           112,920 sequences; 47,500,486 total letters

Searching..................................................done



                                                                 Score    E
Sequences producing significant alignments:                      (bits) Value

At2g21510                                                          96   3e-19
At4g39150                                                          95   1e-18
At1g76700

and so on...

What I want to do is the following:
Read the source file and, if a line starts with "Query=", strip
everything from the line except the "gi|xxxxx" part. That part was no
problem with gsub, mind you. But now comes the tricky part (or maybe not,
I guess...).
Go from there until you find a line starting with "Sequences", skip this
line and the following one, and print the name from the third line
together with the "gi|xxxxx".
So from the above example it would look like this:

gi|23510597 At2g21510

Now, ideally I wouldn't have to include this skip-lines part, but I can't
find a regexp that lets me reliably identify the first line of the
results block (not all possible results start with At...).
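
To make the goal concrete, here is a rough sketch of the behaviour I am
after. It is not a finished solution. It assumes that the first non-blank
line after the "Sequences producing..." header is always the best hit,
that the hit name is the first whitespace-separated field on that line, and
that the lines in the real file are not wrapped the way a mail client wraps
them. The method name best_hits is made up for illustration.

```ruby
# Sketch only: pairs each query's gi id with the first hit listed under it.
# `best_hits` is a hypothetical helper; it accepts anything that yields
# lines via #each_line (a String, a File, ARGF, ...).
def best_hits(io)
  results = []
  gi = nil              # id of the query currently being read
  awaiting_hit = false  # true once the "Sequences producing" header was seen

  io.each_line do |line|
    if line.start_with?("Query=")
      gi = line[/gi\|\d+/]   # e.g. "gi|23510597", or nil if absent
      awaiting_hit = false
    elsif line.start_with?("Sequences producing")
      awaiting_hit = true
    elsif awaiting_hit && !line.strip.empty?
      # Assumption: the first non-blank line after the header is the best
      # hit, and its name is the first whitespace-separated field.
      results << "#{gi} #{line.split.first}"
      awaiting_hit = false
    end
  end

  results
end
```

Called as "puts best_hits(ARGF)", this should print one "gi|xxxxx name"
pair per query. Tracking a flag instead of counting lines means the number
of blank lines after the header does not matter, and a query with no
"Sequences" header simply produces no output line.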

How I tried to do it:

def stripname line
  s = line.gsub(/Query=/, '')
  u = s.gsub(/\|emb.*/, '')
end

count = 0 # initializing variables
t = nil
v = nil

ARGF.each do |l|

  puts l unless count.zero?
  count = [0, count-1].max

  if l.match(/^Query=/)
    t = stripname l
  elsif l.match(/^Sequences/)
    l = $1
    count = 2
    puts "#{t}#{l}"
  else
  end
end

But the output looks terrible:
gi|23510597

At2g21510                                                          96   3e-19
 gi|23510599

At5g14980                                                          58   3e-08
 gi|23510600

And no matter what I try, I can't get the gi|xxxx and the corresponding
"best hit" on the same line. I tried it with hashes, but frankly I don't
know enough about those yet.
So if anyone has a helpful comment or solution, I would be extremely
grateful!

Cheers,

Marc

-- 
Posted via http://www.ruby-forum.com/.