Harry Kakueki wrote:

> I'm not sure what you want if there are more than 2 hiragana.
I think that a string, or array, of each consecutive duple from a run of
two or more hiragana would be what we want. We want to count every
occurrence of each hiragana duple as a running tally through a large
document.
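A minimal sketch of that array of duples. It assumes Ruby 1.9 or later, where strings carry their own encoding and $KCODE no longer has any effect, and text that is already UTF-8. The helper name hiragana_pairs is hypothetical. The character class あ-ん is the same one Harry uses in his scan.

```ruby
# encoding: utf-8
# Sketch for Ruby 1.9+: no $KCODE needed, because strings know their encoding.
# Assumes the input text is already UTF-8.

# Hypothetical helper: returns every consecutive two-character duple
# taken from each run of two or more hiragana (same あ-ん class as
# Harry's scan). Runs of length 1, such as a lone を, are skipped.
def hiragana_pairs(text)
  text.scan(/[あ-ん]{2,}/).flat_map do |run|
    run.each_char.each_cons(2).map(&:join)
  end
end

pairs = hiragana_pairs("私はきのう本を読みました")
puts pairs.join(" ")  # prints: はき きの のう みま まし した
```

Kanji such as 私, 本 and 読 break the text into separate hiragana runs ("はきのう" and "みました"), so no duple spans a kanji.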

There will be duplicates, so perhaps each hiragana duple could be its
own variable? e.g. "暮ら += 1" whenever a new instance of 暮ら is found.
Is there a more efficient way of counting the co-occurrences?
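Rather than one variable per duple, the usual Ruby approach is a single Hash keyed by the two-character string. Creating it with Hash.new(0) means a key that has not been seen yet starts at zero. Here is a minimal sketch, again assuming Ruby 1.9+ and UTF-8 text. The name count_pairs is hypothetical. Note that 暮 is a kanji, so 暮ら would not be matched by a hiragana-only class. Counting mixed pairs like that would need a wider class, e.g. /[\p{Han}\p{Hiragana}]{2,}/.

```ruby
# encoding: utf-8
# Sketch for Ruby 1.9+, assuming UTF-8 input.
# A Hash with default value 0 acts as the running tally: one entry per
# distinct duple, incremented each time that duple is seen.

# Hypothetical helper; pass the same hash in repeatedly to accumulate
# counts across many chunks of a large document.
def count_pairs(text, counts = Hash.new(0))
  text.scan(/[あ-ん]{2,}/) do |run|
    run.each_char.each_cons(2) { |a, b| counts[a + b] += 1 }
  end
  counts
end

counts = Hash.new(0)
["あいあい", "いあう"].each { |chunk| count_pairs(chunk, counts) }

# Print the duples, most frequent first (ties broken by the string itself).
counts.sort_by { |pair, n| [-n, pair] }.each { |pair, n| puts "#{pair}\t#{n}" }
```

Newlines are not hiragana, so no run ever crosses a line break. That means feeding a large file through count_pairs one line at a time (for instance with File.foreach) should give the same totals as processing it in one piece, without reading the whole document into memory.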


> $KCODE = 'u'
> p str.scan(/[あ-ん]{2,}/)
> 
> OR
> 
> require 'enumerator'
> $KCODE = 'u'
> str.scan(/[あ-ん]{2,}/).each {|x| 
> x.split(//).each_cons(2){|a| p a}}
> Harry

Also, I don't think the encoding is Unicode. I have opened the document
in OO.o and jEdit using the Shift-JIS encoding (as well as 'Apple
Macintosh' and 'Windows-932'), and the characters seem to render
correctly. I am unsure how to convert this text to Unicode for proper
analysis in Ruby.
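Here is a minimal sketch of the conversion, assuming Ruby 1.9 or later. Ruby's name for code page 932 is "Windows-31J", with "CP932" as an alias. It covers Shift_JIS plus Microsoft's vendor extensions, so it seems a reasonable choice for text that renders correctly under both Shift-JIS and Windows-932. The helper names are hypothetical.

```ruby
# encoding: utf-8
# Sketch for Ruby 1.9+: transcode Windows-31J (CP932 / Shift_JIS) text to UTF-8.

# Hypothetical helper: read a whole file, telling Ruby the external
# encoding is Windows-31J and asking for UTF-8 strings back.
def read_sjis_file(path)
  File.read(path, encoding: "Windows-31J:UTF-8")
end

# Hypothetical helper: convert raw Shift_JIS bytes already in memory.
def sjis_to_utf8(bytes)
  bytes.dup.force_encoding("Windows-31J").encode("UTF-8")
end

text = sjis_to_utf8("\x82\xA0\x82\xA2")  # the Shift_JIS bytes for あい
puts text.encoding                        # prints: UTF-8
```

Once the text is UTF-8, the scan and counting code can be applied to it directly. On Ruby 1.8, which the $KCODE code above targets, the standard library's Iconv can do the same job, e.g. Iconv.conv("UTF-8", "SHIFT_JIS", str).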
--Brylie
-- 
Posted via http://www.ruby-forum.com/.