Cee Joe wrote in post #995830:
> 7stud -- wrote in post #995821:
>> I suggest that people never use irb because it has too many quirks.
>>
>> The first thing you need to realize is that '>' is
>> not the separator you want to look for.  That is the second bit of
>> erroneous advice your mentor gave you.  That's because you don't care
>> what character marks the beginning of every entry, rather you care what
>> character marks the end of every entry.  The end of every entry in your
>> file is marked by the string "\n\n", so you should use that as your
>> input line terminator.  Remember, ruby uses "\n" for the input line
>> separator by default, which means that when you read a file using
>> IO#each, ruby reads lines--where the end of a line is marked by a
>> newline.
>
> I understand the logic, it makes sense. What if the file looked like
> this, where there is one newline separating the entries? :

What if you had presented that possibility from the very beginning?


require 'stringio'

str =<<ENDOFSTRING
>gi|329295464|ref|NM_2005745.3Acc1| Def1 zgc:65895 (zgc:65895), mRNA
AGCTCGGGGGCTCTAGCGATTTAAGGAGCGATGCGATCGAGCTGACCGTCGCG

>gi|456299107|ref|NM_2342343.3Acc2| Def2 zgc:65895 (zgc:65895), mRNA
GTCGCTGGGTCGAAAAGTGGTGCTATATCGCGGCTCGCGTCGATGTCGCGATG
CGTGCGCGCGAGAGCGCGCTATGATGAAAGGATGAGAGAG

>gi|3542945647|ref|NM_7453343.5Acc3| Def3 zgc:65895 (zgc:65895), mRNA
CGTGCGGGGABCCGTACGTGCCGTGGGGGTTT
AATAGCGCGCCATCTGAGCAG
TTAGTCGCTGACGCATGCACG

ENDOFSTRING

input = StringIO.new(str)
buffer = ''

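input.each do |line|
  if line[0, 1] == '>'    #a header line marks the start of a new entry
    if buffer != ''       #nothing to flush before the very first header
      puts buffer  #or do something else to buffer
      puts '-' * 20
    end

    buffer = ''
    buffer << line        #keep the header's newline so it stays on its own line
  else
    #sequence line or blank line: strip the trailing newline(s) so the
    #pieces of the sequence are joined into one long line
    buffer << line.sub(/ \n+ \z /xms, '')
  end

end

#the last entry has no header after it, so flush it here
puts buffer   #or do something else to buffer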

--output:--
>gi|329295464|ref|NM_2005745.3Acc1| Def1 zgc:65895 (zgc:65895), mRNA
AGCTCGGGGGCTCTAGCGATTTAAGGAGCGATGCGATCGAGCTGACCGTCGCG
--------------------
>gi|456299107|ref|NM_2342343.3Acc2| Def2 zgc:65895 (zgc:65895), mRNA
GTCGCTGGGTCGAAAAGTGGTGCTATATCGCGGCTCGCGTCGATGTCGCGATGCGTGCGCGCGAGAGCGCGCTATGATGAAAGGATGAGAGAG
--------------------
>gi|3542945647|ref|NM_7453343.5Acc3| Def3 zgc:65895 (zgc:65895), mRNA
CGTGCGGGGABCCGTACGTGCCGTGGGGGTTTAATAGCGCGCCATCTGAGCAGTTAGTCGCTGACGCATGCACG
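
For comparison, here is a rough sketch of the same header-driven grouping using Enumerable#slice_before, assuming ruby 1.9.2 or later. The sample data and the parse_entries helper are made up for illustration, and the sketch assumes the first non-blank line of the input is a header line:

```ruby
require 'stringio'

# Hypothetical sample data: entries separated by zero or one blank line.
sample = <<ENDOFSTRING
>entry1 description
AGCT
CGTA

>entry2 description
GGGG
TTTT
AAAA
ENDOFSTRING

# Hypothetical helper: returns an array of [header, sequence] pairs.
# Assumes the first non-blank line of the input starts with '>'.
def parse_entries(io)
  io.each_line
    .map { |line| line.chomp }                      # drop trailing newlines
    .reject { |line| line.empty? }                  # ignore blank separator lines
    .slice_before { |line| line.start_with?('>') }  # start a new group at each header
    .map { |header, *seq_lines| [header, seq_lines.join] }
end

entries = parse_entries(StringIO.new(sample))

entries.each do |header, sequence|
  puts header
  puts sequence   # or do something else with the sequence
  puts '-' * 20
end
```

Because the grouping is keyed on the header line rather than on a separator, it does not matter whether the entries are separated by one newline or two.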

-- 
Posted via http://www.ruby-forum.com/.