On Apr 15, 2009, at 8:19 AM, Adam Akhtar wrote:

> I have a text file whose entries consist of a key written in binary
> and its values written as strings (you can see an excerpt below).
>
> I need to parse the binary and transform it into human-readable hex,
> and parse its associated info. My regexps don't seem to be behaving,
> and I'm wondering if it's me or if this binary text is causing
> mischief somehow. Here's a sample item:
>
> 20: 
祺ア・・キ・G聊ま
> d8:completei7e10:downloadedi2046e10:incompletei1ee
>
>
> Binary parts are always enclosed between "20:" and "d8:complete" where
> the 8 can be any integer(s) e.g. 5 or 23.
>
> str = File.open('textfile.txt' , 'r').readlines.join
> str.gsub!(/(20:)(.*?)(d\d+:)/m) do |x|
>   $1 + $2.unpack('H*').join + $3
> end
>
> The above works for some but not all of the text. It seems to go
> beyond the "d8:complete" marker

I suspect this is an encoding issue.  If your data is UTF-8, this code
may work for you:

   data = File.read('textfile.txt')
   data.scan(/(20:)(.*?)(d\d+:)/um) do |start, bin, finish|
     p start + bin.unpack('H*').join + finish
   end

I'm guessing though.  If you want to read more about what I believe is
causing you problems, you may find my m17n series of blog posts helpful:

   http://blog.grayproductions.net/articles/understanding_m17n

James Edward Gray II
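
For anyone following along, here is a minimal sketch of a byte-oriented variant of the substitution discussed above. It is not the code from either message: instead of the /u flag it treats the data as raw bytes, on the assumption that the keys are arbitrary binary rather than valid UTF-8. The method name hexify_keys is hypothetical, and the sketch assumes Ruby 1.9 or later.

```ruby
# Hypothetical helper (not from the thread): hex-encode whatever sits
# between "20:" and the next "d<digits>:" marker.
def hexify_keys(data)
  # Tag a copy of the input as raw bytes so no multi-byte character
  # decoding can swallow part of a marker.
  bytes = data.dup.force_encoding(Encoding::BINARY)

  # /n keeps the pattern byte-oriented; /m lets "." match newline bytes.
  bytes.gsub(/(20:)(.*?)(d\d+:)/nm) do
    $1 + $2.unpack('H*').first + $3
  end
end

# Usage, assuming the file name from the original post:
#   puts hexify_keys(File.binread('textfile.txt'))
```

Reading with File.binread keeps Ruby from applying any external encoding to the file. Whether this resolves the original problem depends on what encoding settings were in effect, so treat it as one more guess to try.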