Cee Joe wrote in post #995830:
> 7stud -- wrote in post #995821:
>> I suggest that people never use irb because it has too many quirks.
>>
>> The first thing you need to realize is that '>' is
>> not the separator you want to look for. That is the second bit of
>> erroneous advice your mentor gave you. That's because you don't care
>> what character marks the beginning of every entry, rather you care what
>> character marks the end of every entry. The end of every entry in your
>> file is marked by the string "\n\n", so you should use that as your
>> input line terminator. Remember, ruby uses "\n" for the input line
>> separator by default, which means that when you read a file using
>> IO#each, ruby reads lines--where the end of a line is marked by a
>> newline.
>
> I understand the logic, it makes sense. What if the file looked like
> this, where there is one newline separating the entries?

What if you had presented that possibility from the very beginning?

require 'stringio'

str = <<ENDOFSTRING
>gi|329295464|ref|NM_2005745.3Acc1| Def1 zgc:65895 (zgc:65895), mRNA
AGCTCGGGGGCTCTAGCGATTTAAGGAGCGATGCGATCGAGCTGACCGTCGCG
>gi|456299107|ref|NM_2342343.3Acc2| Def2 zgc:65895 (zgc:65895), mRNA
GTCGCTGGGTCGAAAAGTGGTGCTATATCGCGGCTCGCGTCGATGTCGCGATG
CGTGCGCGCGAGAGCGCGCTATGATGAAAGGATGAGAGAG
>gi|3542945647|ref|NM_7453343.5Acc3| Def3 zgc:65895 (zgc:65895), mRNA
CGTGCGGGGABCCGTACGTGCCGTGGGGGTTT
AATAGCGCGCCATCTGAGCAG
TTAGTCGCTGACGCATGCACG
ENDOFSTRING

input = StringIO.new(str)
buffer = ''

input.each do |line|
  if line[0, 1] == '>'
    # A header line starts a new entry, so flush the previous one first.
    if buffer != ''
      puts buffer  #or do something else to buffer
      puts '-' * 20
    end

    buffer = ''
    buffer << line
  else
    # Sequence line: strip the trailing newline(s) and append.
    buffer << line.sub(/ \n+ \z /xms, '')
  end
end

puts buffer  #or do something else to buffer

--output:--
>gi|329295464|ref|NM_2005745.3Acc1| Def1 zgc:65895 (zgc:65895), mRNA
AGCTCGGGGGCTCTAGCGATTTAAGGAGCGATGCGATCGAGCTGACCGTCGCG
--------------------
>gi|456299107|ref|NM_2342343.3Acc2| Def2 zgc:65895 (zgc:65895), mRNA
GTCGCTGGGTCGAAAAGTGGTGCTATATCGCGGCTCGCGTCGATGTCGCGATGCGTGCGCGCGAGAGCGCGCTATGATGAAAGGATGAGAGAG
--------------------
>gi|3542945647|ref|NM_7453343.5Acc3| Def3 zgc:65895 (zgc:65895), mRNA
CGTGCGGGGABCCGTACGTGCCGTGGGGGTTTAATAGCGCGCCATCTGAGCAGTTAGTCGCTGACGCATGCACG

--
Posted via http://www.ruby-forum.com/.
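
For contrast, here is a minimal sketch of the other case from the quoted advice, where a blank line ("\n\n") ends every entry and that string is passed to each as the separator. The sample data and the name records are made up for illustration; they are not from the original file.

```ruby
require 'stringio'

# Hypothetical sample data: entries are separated by one blank line ("\n\n").
str = ">id1 first entry\nAGCT\nTTGA\n\n>id2 second entry\nCCGG\n"

records = []

# Passing "\n\n" to each makes ruby hand over one whole entry per iteration
# instead of one line per iteration.
StringIO.new(str).each("\n\n") do |entry|
  # The first line is the header; the remaining lines are pieces of the sequence.
  header, *seq_lines = entry.split("\n")
  records << [header, seq_lines.join]
end

records.each do |header, seq|
  puts header
  puts seq
  puts '-' * 20
end
```

This only works when a blank line really does end every entry. With a single newline between entries, as in the post above, the separator cannot tell entries apart, and checking for a leading '>' on each line is the simpler route.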