On Apr 23, 2009, at 15:51, Adam Akhtar wrote:
> ahh should have thought about that. here is a source file
>
> Attachments:
> http://www.ruby-forum.com/attachment/3615/mini-scrape.txt

I think regexp is the wrong way to do this. Since this is a binary file format, a regexp is unlikely to give you real data. Scanning seems to work out better.

Where did you get this data? It seems to have the following format in pseudo EBNF:

  record: digit+ ":" <N bytes of data> stuff
  stuff: "d" | "i" N+ "e" "e"?

Instead of using Regexp, use StringScanner or just read by hand like I do below.

Here's what I tried:

irb(main):001:0> io = open 'mini-scrape.txt'
=> #<File:mini-scrape.txt>
irb(main):002:0> io.read 1
=> "2"
irb(main):003:0> io.read 1
=> "0"
irb(main):004:0> io.read 1
=> ":"
# I'm guessing "20:" says read 20 bytes, let's see where that puts us:
irb(main):005:0> io.read 20
=> " \f\373j\342Q\261\201E\201E\267\201EG\e\343\326\202\334"
# ok...
irb(main):006:0> io.read 1
=> "d"
# I don't know what "d" means, but carrying on:
irb(main):007:0> io.read 1
=> "8"
irb(main):008:0> io.read 1
=> ":"
# "8:", let's read 8 bytes:
irb(main):009:0> io.read 8
=> "complete"
# ok, looking good
irb(main):010:0> io.read 1
=> "i"
irb(main):011:0> io.read 1
=> "9"
irb(main):012:0> io.read 1
=> "e"
# dunno what "i9e" could be
irb(main):013:0> io.read 1
=> "1"
irb(main):014:0> io.read 1
=> "0"
irb(main):015:0> io.read 1
=> ":"
# "10:", read 10 bytes:
irb(main):016:0> io.read 10
=> "downloaded"
# ok...
irb(main):017:0> io.read 1
=> "i"
irb(main):018:0> io.read 1
=> "2"
irb(main):019:0> io.read 1
=> "0"
irb(main):020:0> io.read 1
=> "6"
irb(main):021:0> io.read 1
=> "4"
irb(main):022:0> io.read 1
=> "e"
# dunno what "i2064e" is, but maybe it downloaded 2064 bytes and the previous one was complete in 9 somethings
irb(main):023:0> io.read 1
=> "1"
irb(main):024:0> io.read 1
=> "0"
irb(main):025:0> io.read 1
=> ":"
# read 10 bytes, another string:
irb(main):026:0> io.read 10
=> "incomplete"
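The by-hand reads above could be wrapped in a small loop. This is only a rough sketch based on my guesses about the format, not a real parser for it: I'm assuming "N:" is always followed by exactly N bytes, that "i...e" wraps a decimal integer, and that anything else (like the "d") is a one-byte marker I don't understand yet. The names read_token and read_all_tokens are made up for this example, and the sample string is invented, not taken from your file.

```ruby
require 'stringio'

# Reads one token from io and returns it as a [type, value] pair,
# or nil at end of input. The token types are guesses from the irb
# session, not taken from any specification.
def read_token(io)
  c = io.read(1)
  return nil if c.nil?

  case c
  when /\d/
    # Length-prefixed string: digits, ":", then that many bytes.
    digits = String.new(c)
    while (c = io.read(1)) && c != ':'
      digits << c
    end
    [:string, io.read(digits.to_i)]
  when 'i'
    # Integer: "i", decimal digits, "e".
    digits = String.new
    while (c = io.read(1)) && c != 'e'
      digits << c
    end
    [:integer, digits.to_i]
  else
    # Anything else ("d", a bare "e", ...) is passed through untouched.
    [:marker, c]
  end
end

# Collects tokens until the input is exhausted.
def read_all_tokens(io)
  tokens = []
  while (token = read_token(io))
    tokens << token
  end
  tokens
end

# Invented sample shaped like the start of the file, with a short
# 3-byte string standing in for the 20 bytes of binary data.
sample = StringIO.new("3:abcd8:completei9e10:downloadedi2064e")
read_all_tokens(sample).each { |type, value| puts "#{type}: #{value.inspect}" }
```

For the real file you would pass in File.open('mini-scrape.txt', 'rb') instead of the StringIO, so the binary bytes are read untranslated. The same loop could be written with StringScanner if you would rather load the whole file into a string first.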