Greetings everybody,

maybe somebody can help me with this: how can I collect n-grams (i.e.
sequences of n characters/words/whatever) from plain text? I tried something
like this:

while line = gets
  line.scan(/[a-zA-Z\s]{3}/) { |p| print "#{p}," }
end

However, this takes steps that are too big: the regexp matches one
triple and then continues right after it, so the matches never overlap.
I guess there must be a simple solution.
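One possible fix that keeps the regexp approach, sketched under the assumption of Ruby 1.9 or later: wrap the pattern in a zero-width lookahead with a capture group. The lookahead consumes no input, so `scan` advances one character at a time and each captured triple overlaps the previous one. The method name `char_trigrams` is just an illustrative helper, not anything built in.

```ruby
# Hypothetical helper: overlapping character trigrams via a lookahead.
# (?=(...)) matches an empty string at each position while capturing
# the next three allowed characters, so scan steps one char at a time.
def char_trigrams(text)
  # scan returns one array of captures per match; flatten to strings
  text.scan(/(?=([a-zA-Z\s]{3}))/).flatten
end

puts char_trigrams("The man sees the boy with the telescope.").join(",")
```

Inside the original `while line = gets` loop, the block form should also work, e.g. `line.scan(/(?=([a-zA-Z\s]{3}))/) { print "#{$1}," }`, since `scan` sets the last-match variables for each iteration.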

Example:
Input "The man sees the boy with the telescope."
Output "The, ma,n s,ees, th,e b,oy ,wit,h t,he ,tel,esc,ope,"
Desired output "The,he ,e m, ma,man,..."
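A more general sketch, again assuming Ruby 1.9 or later, that skips regexps altogether: split the text into units (characters or words) and let `Enumerable#each_cons(n)` yield every window of n consecutive units. The `ngrams` helper is hypothetical. Note that, unlike the `[a-zA-Z\s]` class above, this keeps punctuation, so "pe." shows up as a character trigram.

```ruby
# Hypothetical helper: n-grams over any sequence of units.
# each_cons(n) yields every run of n consecutive elements, overlapping.
def ngrams(items, n)
  items.each_cons(n).to_a
end

sentence = "The man sees the boy with the telescope."

# Character trigrams: split into characters, join each window back up.
char_grams = ngrams(sentence.chars, 3).map(&:join)

# Word bigrams: split on whitespace, join each window with a space.
word_grams = ngrams(sentence.split, 2).map { |w| w.join(" ") }

puts char_grams.first(5).join(",")  # The,he ,e m, ma,man
p word_grams.first(3)               # ["The man", "man sees", "sees the"]
```

The same helper covers "whatever" other units, since it only needs something that responds to `each_cons`.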

Thanks for your help
Arno Erpenbeck

BTW: If this list is not intended for questions of this kind, please let
me know, and I will go and look somewhere else.