On Fri, Jul 6, 2012 at 10:06 AM, Jan E. <lists / ruby-forum.com> wrote:
> Hi,
>
> Joao Silva wrote in post #1067618:
>> How can I implement the above?
>
> For large text you may use String#scan, which has the advantage of not
> collecting all words in an array like String#split does:

That advantage only materializes if you use the block form of #scan:

word_count = 0
input_text.scan(/\w+/) { word_count += 1 }
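A minimal self-contained version of that idiom, wrapped in a method. The name `count_words` is hypothetical, and it assumes a "word" is simply a run of `\w` characters. With a block, String#scan yields each match and returns the receiver, so no array of all matches is accumulated.

```ruby
# Count words without building an array of all matches.
# Assumption: a "word" is a run of \w characters (letters, digits, underscore).
def count_words(text)
  count = 0
  # Block form: each match is yielded and then discarded.
  text.scan(/\w+/) { count += 1 }
  count
end

puts count_words('This is a sentence.')  # prints 4
puts count_words('. : & #')              # prints 0
```

For short strings, `text.scan(/\w+/).size` is simpler and perfectly fine; the block form only pays off for large texts.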

> input_text = 'This is a sentence.'
> word_count = input_text.strip.scan(/\s+/).size + 1

I don't think this usage of #scan is a good approach, because it can
yield totally wrong results:

irb(main):002:0> input_text = '. : & #'
=> ". : & #"
irb(main):003:0> input_text.strip.scan(/\s+/).size + 1
=> 4

Positively matching sequences of word characters, by contrast, comes
much closer to reality:

irb(main):004:0> input_text.scan(/\w+/).size
=> 0
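A small side-by-side sketch of the two heuristics on a few more inputs. The method names are hypothetical; the first reproduces the "whitespace runs + 1" expression quoted above, the second counts runs of word characters.

```ruby
# Heuristic quoted above: number of whitespace runs plus one.
def count_by_whitespace(text)
  text.strip.scan(/\s+/).size + 1
end

# Alternative: count runs of word characters directly.
def count_by_word_chars(text)
  text.scan(/\w+/).size
end

samples = ['This is a sentence.', '. : & #', '', 'one  -  two']
samples.each do |s|
  puts "#{s.inspect.ljust(22)} whitespace=#{count_by_whitespace(s)} word_chars=#{count_by_word_chars(s)}"
end
```

Both agree on the ordinary sentence (4), but the whitespace heuristic reports 1 word for the empty string, 4 for pure punctuation, and 3 for "one  -  two" because it counts the lone dash.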

> But like Jesus already said, this simple approach will not always work.
> If the "words" in your text may contain whitespace, then looking for
> whitespace will obviously fail. You'll have to use a dictionary in this
> case. This would also cover errors (missing or superfluous whitespace).
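One possible way to bring a dictionary into the regex approach, sketched under loud assumptions: `MULTI_WORD_TERMS` is a hypothetical list of phrases that should count as a single word, and `build_word_pattern` is a hypothetical helper. Known phrases are tried first (longest first), and any other run of word characters counts as one word. This tolerates extra whitespace inside a known phrase but does not handle missing whitespace (e.g. "NewYork").

```ruby
# Hypothetical dictionary of "words" that contain whitespace.
MULTI_WORD_TERMS = ['New York', 'ice cream', 'New York City']

def build_word_pattern(terms)
  # Try longer phrases first, because regex alternation takes the first
  # alternative that matches ("New York City" before "New York").
  sorted = terms.sort_by { |t| -t.length }
  # Let the words inside a phrase be separated by any run of whitespace.
  alternatives = sorted.map { |t| t.split.map { |w| Regexp.escape(w) }.join('\s+') }
  # Dictionary phrases first; otherwise fall back to a plain run of \w.
  Regexp.new("\\b(?:#{alternatives.join('|')})\\b|\\w+", Regexp::IGNORECASE)
end

pattern = build_word_pattern(MULTI_WORD_TERMS)
text = 'We ate ice cream in New York City.'
p text.scan(pattern)
puts text.scan(pattern).size
```

Here "ice cream" and "New York City" each count as one word, giving 5 instead of the 8 that `/\w+/` alone would report.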

It's crucial to clarify the definition of "word", I agree.
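To illustrate how much the answer depends on that definition, here is the same sentence counted with three patterns. The patterns and the `WORD_DEFINITIONS` name are illustrative choices, not a recommendation.

```ruby
# Each pattern encodes a different definition of "word".
WORD_DEFINITIONS = {
  word_chars:     /\w+/,                            # runs of [a-zA-Z0-9_]
  non_whitespace: /\S+/,                            # anything between whitespace
  letters_apos:   /[[:alpha:]]+(?:'[[:alpha:]]+)*/  # letters, internal apostrophes allowed
}

text = "Don't stop, it's 9 o'clock!"
WORD_DEFINITIONS.each do |name, pattern|
  words = text.scan(pattern)
  puts "#{name}: #{words.size} #{words.inspect}"
end
```

The counts come out as 8, 5 and 4: `\w+` splits contractions, `\S+` keeps punctuation attached, and the letters-only pattern drops the "9".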

Kind regards

robert

-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/