Well, I've found some stuff out re: binary format.

I was getting confused re: the "b" switch in File.open("file", "rb") (as 
in "r**b**").

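I thought this was needed to tell Ruby we were dealing with some funky 
"binary" file, but it's a lot simpler than that. There is no special 
binary file format (that I'm aware of). A file is just a sequence of 
bytes either way. Binary data is written to a file the same way text is; 
the bytes just aren't meant to be read as characters in any encoding 
(so not Unicode, as I'd first assumed).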

So why then do we have to set the "b" (binary mode) flag in File.open?
Sometimes binary data has the ^Z character (Ctrl-Z, byte 0x1A) in it. In 
binary data it's doing nothing more than any other byte, representing 
some information, but in a Windows text-mode read that character is 
treated as end of file.

File.open defaults to text mode, so on Windows if it comes across ^Z it 
will stop reading, even if the data is actually binary. Text mode on 
Windows also converts "\r\n" line endings to "\n" on read. To stop Ruby 
doing that, you put "b" in the mode string you pass to File.open.
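
Here's a minimal sketch of that difference, in case anyone wants to try it. The helper name read_both_ways and the sample bytes are made up for illustration. It writes a few bytes containing ^Z to a temp file and reads them back in text mode and in binary mode. On Linux or macOS I'd expect both reads to be the same size; on Windows, depending on the Ruby version, the text-mode read may come back shorter.

```ruby
require "tmpdir"

# Hypothetical helper: writes the given bytes to a temp file, then reads
# them back once in text mode ("r") and once in binary mode ("rb").
def read_both_ways(bytes)
  Dir.mktmpdir do |dir|
    path = File.join(dir, "sample.bin")
    File.open(path, "wb") { |f| f.write(bytes) }    # write the bytes untouched
    text   = File.open(path, "r")  { |f| f.read }   # text mode (the default)
    binary = File.open(path, "rb") { |f| f.read }   # binary mode
    [text, binary]
  end
end

# Made-up sample data with a Ctrl-Z (0x1A) and a CRLF in the middle.
sample = "abc\x1Adef\r\nghi".b
text, binary = read_both_ways(sample)

puts "text mode:   #{text.bytesize} bytes"
puts "binary mode: #{binary.bytesize} bytes"
```

The binary-mode read should always give back exactly what was written. Only the text-mode number is platform dependent.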

As far as I can tell, the ^Z behaviour is a Windows-only issue.

This would explain why I was getting different lengths with

data_a = File.read('mn-scrape.txt')                        # text mode
data_b = File.open("mn-scrape.txt", "rb") { |f| f.read }   # binary mode
data_a.scan(/./m).length   # => 170799
data_b.scan(/./m).length   # => 767702
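
One way to check the ^Z theory would be to look for the first 0x1A byte in the binary data and see how close its position is to the text-mode length. This is only a sketch: first_ctrl_z_offset is a name I made up, and the sample string stands in for the real file.

```ruby
# Hypothetical helper: byte offset of the first Ctrl-Z (0x1A) in the data,
# or nil if there isn't one.
def first_ctrl_z_offset(bytes)
  bytes.b.index("\x1A".b)
end

# Made-up stand-in for the file contents.
sample = "line one\r\nline two\x1Arest of the data".b
puts first_ctrl_z_offset(sample).inspect
```

With the real file you could call first_ctrl_z_offset(File.binread("mn-scrape.txt")). If the result is in the neighbourhood of the text-mode length, that supports the explanation. I wouldn't expect an exact match, since the text-mode read on Windows also turns each "\r\n" into a single "\n".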