Harry Kakueki wrote:
> I'm not sure what you want if there are more than 2 hiragana.

I think that a string, or array, of each duple from a run of 2 or more hiragana is what we want. We want to count each occurrence of the hiragana duples as a running tally through a large document. There will be duplicates, so perhaps each hiragana duple could be a variable? E.g. "暮ら += 1" whenever a new instance of 暮ら is found. Would there be a more efficient way of counting the co-occurrences?

> $KCODE = 'u'
> p str.scan(/[あ-ん]{2,}/)
>
> OR
>
> require 'enumerator'
> $KCODE = 'u'
> str.scan(/[あ-ん]{2,}/).each {|x|
>   x.split(//).each_cons(2){|a| p a}}
>
> Harry

Also, I don't think that the encoding is Unicode. I have opened the document in OO.o and jEdit using the Shift-JIS encoding (as well as 'Apple Macintosh' and 'Windows-932'), and the characters seem to render correctly. I am unsure how to convert this text to Unicode for proper analysis in Ruby.

--Brylie

--
Posted via http://www.ruby-forum.com/.
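
For the running tally asked about above, one possible approach is a single Hash keyed by the duple instead of one variable per duple. The following is only a minimal sketch, assuming Ruby 1.9 or later (where strings carry an encoding and `String#encode` exists, so `$KCODE` is not needed) and assuming the file really is Shift_JIS. The helper names `count_hiragana_pairs` and `sjis_to_utf8` are made up for illustration and do not come from the thread. On the 1.8 series, the conversion step would need something like NKF or Iconv instead.

```ruby
# encoding: utf-8

# Hypothetical helper: tally every adjacent hiragana pair in a UTF-8 string.
# Uses the same character class as the quoted snippet.
def count_hiragana_pairs(text)
  counts = Hash.new(0)                  # unseen pairs start at 0
  text.scan(/[あ-ん]{2,}/) do |run|     # each run of 2 or more hiragana
    run.chars.each_cons(2) do |pair|    # sliding window of width 2
      counts[pair.join] += 1
    end
  end
  counts
end

# Hypothetical helper: reinterpret raw bytes as Shift_JIS and transcode.
# Assumption: Windows-31J (code page 932) is the right label for the file;
# plain 'Shift_JIS' may be the better choice for some documents.
def sjis_to_utf8(bytes)
  bytes.dup.force_encoding('Windows-31J').encode('UTF-8')
end
```

With this sketch, a whole document could be read in binary mode, passed through `sjis_to_utf8`, and then handed to `count_hiragana_pairs`. For instance, `count_hiragana_pairs("して、して")` should count して twice, since the comma splits the text into two separate runs. A Hash lookup per pair keeps this to one pass over the text, which is probably sufficient for a large document.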