2007/9/18, William James <w_a_x_man / yahoo.com>:
> On Sep 17, 4:13 pm, Robert Klemme <shortcut... / googlemail.com> wrote:
> > On 17.09.2007 21:49, William James wrote:
> >
> > > On Sep 17, 1:00 pm, Alex Shulgin <alex.shul... / gmail.com> wrote:
> > >> On Sep 17, 6:19 pm, William James <w_a_x_... / yahoo.com> wrote:
> > >
> > >>> Awk is a very popular tool for text processing, but there is no
> > >>> way to make it treat a sequence of whitespace characters as a
> > >>> record-separator. So in awk, as in Ruby, text is almost always
> > >>> read a line at a time.
> > >> I thought Ruby is not just a text processing tool, but a general
> > >> purpose programming language.
> >
> > > You thought correctly. But when you talk about reading a word at
> > > a time from a text file, you're talking about text processing.
> > > The point is that languages (including Ruby) that were designed
> > > to be very good at processing text usually read a line at a time,
> > > not a word at a time. (A language that is very good at processing
> > > text can still be a general purpose language.) Reading a word at
> > > a time seems to me to be odd and unnecessary, and I do a lot of
> > > text processing. However, here's one way to do it. (It would be
> > > a lot more efficient to read by lines.)
> >
> > > class IO
> > >   def get_word
> > >     word = nil
> > >     while c = self.read(1)
> > >       if c =~ /\s/
> > >         break if word
> > >       else
> > >         word||=""
> > >         word << c
> > >       end
> > >     end
> > >     word
> > >   end
> > > end
> >
> > > File.open('data'){|file|
> > >   while w = file.get_word
> > >     p w
> > >   end
> > > }
> >
> > I'd probably encapsulate the word reading in a module so the
> > implementation can be reused and exchanged if necessary:
> >
> > module WordIO
> >   def each_word(&b)
> >     each do |line|
> >       line.scan(/\w+/, &b)
> >     end
> >   end
> > end
> >
> > class IO
> >   include WordIO
> >
> >   def self.readwords(file)
> >     words = []
> >     open(file) {|io| io.each_word {|wd| words << wd}}
> >     words
> >   end
> > end
> >
> > ARGF.extend WordIO
> >
> > # additional goody
> > class String
> >   include WordIO
> > end
> >
> > :-)
> >
> > Kind regards
> >
> > robert
>
> Very sophisticated.
>
> Since the o.p. wants whitespace as the word-separator,
> the reg.exp. should be changed to /\S+/.

See also Bertram's remark. Btw, that's probably also the reason why
this is not in the standard library: there is probably no
one-size-fits-all definition of "word". We have seen at least two so
far and I reckon there are more. :-)

> But, dang it all, I'm gonna say you're cheating because
> you're still reading lines behind the scenes! ;-)

But I said the implementation can be exchanged.

> Reading lines and breaking them into words is a lot
> easier than reading characters and constructing words.

Correct. But just a bit:

module WordIO
  # true if c is a single word character
  def wchar?(c)
    /\A\w\z/ =~ c.chr
  end

  def each_word
    word = nil

    while ( c = getc )
      if wchar? c
        (word ||= "") << c
      else
        yield word if word
        word = nil
      end
    end

    # flush the last word if the input does not end in a separator
    yield word if word

    self
  end
end

Kind regards

robert
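
PS: Since at least two definitions of "word" came up in this thread
(/\S+/ and /\w+/), here is a rough sketch of a character-based reader
that takes the definition as an argument. The module name CharWordIO
and the word_char parameter are invented for illustration and are not
part of Ruby's standard library. The sketch assumes a receiver that
responds to each_char (IO and StringIO on reasonably recent Rubies).

```ruby
require 'stringio'

# Hypothetical module, for illustration only.
module CharWordIO
  # Yields each maximal run of characters that match word_char.
  # word_char is an assumed parameter; it defaults to "anything that
  # is not whitespace", which is what the o.p. asked for.
  def each_word(word_char = /\S/)
    return enum_for(:each_word, word_char) unless block_given?

    word = nil
    each_char do |c|
      if word_char =~ c
        (word ||= String.new) << c
      elsif word
        yield word
        word = nil
      end
    end
    yield word if word   # flush the last word at end of input
    self
  end
end

io = StringIO.new("foo  bar-baz\nqux")
io.extend CharWordIO
p io.each_word.to_a          # => ["foo", "bar-baz", "qux"]
io.rewind
p io.each_word(/\w/).to_a    # => ["foo", "bar", "baz", "qux"]
```

The same module could be mixed into IO or ARGF the way WordIO is
above. Whether a per-character loop is worth it over line-based
scanning is a separate question.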