On Sep 17, 4:13 pm, Robert Klemme <shortcut... / googlemail.com> wrote:
> On 17.09.2007 21:49, William James wrote:
>
> > On Sep 17, 1:00 pm, Alex Shulgin <alex.shul... / gmail.com> wrote:
> >> On Sep 17, 6:19 pm, William James <w_a_x_... / yahoo.com> wrote:
>
> >>> Awk is a very popular tool for text processing, but there is no
> >>> way to make it treat a sequence of whitespace characters as a
> >>> record-separator.  So in awk, as in Ruby, text is almost always
> >>> read a line at a time.
> >> I thought Ruby is not just a text processing tool, but a general
> >> purpose programming language.
>
> > You thought correctly.  But when you talk about reading a word at
> > a time from a text file, you're talking about text processing.
> > The point is that languages (including Ruby) that were designed
> > to be very good at processing text usually read a line at a time,
> > not a word at a time.  (A language that is very good at processing
> > text can still be a general purpose language.)  Reading a word at
> > a time seems to me to be odd and unnecessary, and I do a lot of
> > text processing.  However, here's one way to do it.  (It would be
> > a lot more efficient to read by lines.)
>
> > class IO
> >   def get_word
> >     word = nil
> >     while c = self.read(1)
> >       if c =~ /\s/
> >         break if word
> >       else
> >         word||=""
> >         word << c
> >       end
> >     end
> >     word
> >   end
> > end
>
> > File.open('data'){|file|
> >   while w = file.get_word
> >     p w
> >   end
> > }
>
> I'd probably encapsulate the word reading in a module so the
> implementation can be reused and exchanged if necessary:
>
> module WordIO
>   def each_word(&b)
>     each do |line|
>       line.scan(/\w+/, &b)
>     end
>   end
> end
>
> class IO
>   include WordIO
>
>   def self.readwords(file)
>     words = []
>     open(file) {|io| io.each_word {|wd| words << wd}}
>     words
>   end
> end
>
> ARGF.extend WordIO
>
> # additional goody
> class String
>   include WordIO
> end
>
> :-)
>
> Kind regards
>
> robert

Very sophisticated.  Since the o.p. wants whitespace as the
word-separator, the reg.exp. should be changed to /\S+/.

But, dang it all, I'm gonna say you're cheating because you're
still reading lines behind the scenes!  Reading lines and breaking
them into words is a lot easier than reading characters and
constructing words.
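
A minimal sketch of that change to the regexp, for illustration only.
The module name WhitespaceWordIO and the sample text are made up here,
and it calls each_line rather than each so that it can be tried on a
StringIO without touching a real file.  It still reads a line at a time
behind the scenes, just like the quoted version.

```ruby
require 'stringio'

# Hypothetical variant of the quoted WordIO module: the only real change
# is the regexp, so a "word" is any run of non-whitespace characters.
module WhitespaceWordIO
  def each_word(&block)
    each_line do |line|
      # /\S+/ splits on whitespace only; /\w+/ would also split on
      # punctuation such as the apostrophe in "don't".
      line.scan(/\S+/, &block)
    end
  end
end

# Made-up sample input with mixed spaces, tabs and newlines.
io = StringIO.new("don't  stop\n\tme now!\n")
io.extend(WhitespaceWordIO)

words = []
io.each_word { |w| words << w }
p words
```

With /\w+/ the same input would come out as "don", "t", "stop", "me",
"now", which is not what the o.p. asked for.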