John Joyce wrote:
> I don't know if the two main chinese sets are encoded as different
> ranges or simply declared in some way.
> In general in Unicode a character is the same character even when it
> appears in a different language.

Many characters in these two sets of Chinese (in fact, including the Chinese characters used in Japanese and Korean) are the same. Aren't they encoded to the same codes when they are identical?

Gary Thomas wrote:
> I believe the range is (in hex) 3400 to 97A5

You must mean the Unicode range.
http://www.khngai.com/chinese/charmap/tbluni.php?page=0

John Joyce wrote:
> You might want to check the RubyGems gem unihan

Hmm, if only I could find out what it does...

John Joyce wrote:
> http://www.alanwood.net/unicode/index.html
> I've been interested in this subject myself, but it is a big one.

It is an interesting subject indeed. Today I tried this (under the RoR console!):

>> c=%w{¡È ¡É¡£ ¡¤ ¡ª ¡ã ¡Ð ¡¨ ¡Æ ¡ª ¡÷ ¡ô ¡ð ¡ó ¡Ä ¡ö ¡Ê ¡Ë °ì ù» ÌÞ ùØ Óü »Ñ ¼÷ ×Ý Èâ ÙÞ ²³ Ý¡ Ýù ࢠáß µª æË Âà }
=> ["¡È", "¡É¡£", "¡¤", "¡ª", "¡ã", "¡Ð", "¡¨", "¡Æ", "¡ª", "¡÷", "¡ô", "¡ð", "¡ó", "¡Ä", "¡ö", "¡Ê", "¡Ë", "°ì", "ù»", "", "", "ÌÞ", "", "ùØ", "Óü", "»Ñ", " ¼÷", "", "×Ý", "", "Èâ", "ÙÞ", "", "²³", "Ý¡", "Ýù", "", "", "à¢", "", "", "áß", "", "", "", "", "µª", "æË", "Âà", "", "", "", "", "", "", "", "", ""]
>> c.collect.map{|o| o[0]}
=> [226, 226, 239, 239, 239, 239, 239, 226, 239, 239, 239, 239, 239, 226, 239, 239, 239, 228, 228, 229, 229, 229, 229, 229, 229, 229, 229, 229, 229, 230, 230, 230, 230, 230, 230, 230, 230, 231, 231, 231, 231, 231, 231, 231, 231, 231, 231, 231, 233, 233, 233, 233, 233, 233, 233, 233, 233, 233]
>> c.collect.map{|o| o[0]}.sort
=> [226, 226, 226, 226, 228, 228, 229, 229, 229, 229, 229, 229, 229, 229, 229, 229, 230, 230, 230, 230, 230, 230, 230, 230, 231, 231, 231, 231, 231, 231, 231, 231, 231, 231, 231, 233, 233, 233, 233, 233, 233, 233, 233, 233, 233, 239, 239, 239, 239, 239, 239, 239, 239, 239, 239, 239, 239, 239]
>> c.collect.map{|o| o[0]}.sort.uniq
=> [226, 228, 229, 230, 231, 233, 239]

These punctuation marks are the ones commonly used in China. The Chinese characters were picked at random from http://www.khngai.com/chinese/charmap/tbluni.php?page=0 (from all six pages). So maybe 226 to 239 is the range of first-byte values I need.
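One caveat on those numbers: on Ruby 1.8, o[0] returns the first byte of the string, so 226 to 239 are UTF-8 lead bytes, not Unicode code points. Here is a rough sketch of the difference, assuming Ruby 1.9 or later. The three sample characters and the han? helper are my own picks for illustration, not taken from the list above.

```ruby
# encoding: utf-8
# Hypothetical samples, written as escapes so the file stays ASCII:
# U+3002 (ideographic full stop), U+4E00 and U+9FA5 (two ideographs).
samples = ["\u3002", "\u4E00", "\u9FA5"]

# First byte of each character's UTF-8 encoding.
# This is what o[0] gave on Ruby 1.8.
first_bytes = samples.map { |s| s.bytes.first }

# The actual Unicode code point of each character.
code_points = samples.map { |s| s.ord }

# Hypothetical helper: test the Unicode script property instead of
# comparing against a numeric range.
def han?(char)
  !!(char =~ /\p{Han}/)
end

p first_bytes                      # [227, 228, 233]
p code_points                      # [12290, 19968, 40869]
p samples.map { |s| han?(s) }      # [false, true, true]
```

The full stop is punctuation, so it is not tagged as Han even though it sits in a CJK block. If the goal is "is this a Chinese character?", the script test is probably safer than a first-byte range, which would also match many non-Chinese characters.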