197049-253929

Solaris 9 SPARC packages for RubyGems?
197049 [jpshack@gm i] Anyone out there ever build a Solaris 9 SPARC package for RubyGems?
197056 [djberg96@gm ] Is it possible you installed openssl *after* you built Ruby?  I haven't
197090 [jpshack@gm i] Daniel,
197093 [djberg96@gm ] Cool. They must have added those recently.  I haven't checked in a while. :)

Net::IMAP && http://rubyforge.org/projects/rubyntlm/
197050 [reid.thompso] Has anyone successfully incorporated

[OT] Re: Why the lack of mixing-in support for Class methods?
197055 [ara.t.howard] <snip ascii art>

net::http  and htaccess
197061 [charlie@ca t] ...

re net::http and htaccess
197062 [charlie@ca t] ...

RUBY keeps complaining about my "end" statement
197076 [pbailey@bn .] When I try and run a script that has this as part of it, RUBY keeps
+ 197078 [logancapaldo] Someone forgot quotes around his string interpolation. Of course its
| 197232 [pbailey@bn .] You're absolutely right. I put quotes in and it worked. But, I don't
+ 197079 [leslieviljoe] ftp.putbinaryfile("#{ps2kfile}")
| 197234 [pbailey@bn .] Yes, that worked. Thank you. I don't know why it needs the quotes, but,
+ 197080 [vjoel@pa h. ] ruby is interpreting the # as a comment, since it's not within a string.
+ 197082 [tony@tw nc d] ti01,
  197235 [pbailey@bn .] That worked. And, it is simpler. It's not a string, anyway, so, why

Unicode roadmap?
197089 [roman.hausne] In my opinion, Ruby is practically useless for many applications without
+ 197102 [matz@ru y- a] Define "proper Unicode support" first.
| + 197103 [pertl@gm .o ] having an unicode-equivalent for all methods of class String
| | + 197106 [logancapaldo] def substring(str, start, len)
| | | 197108 [pertl@gm .o ] From the theoretical point of view this is quite interesting. Also I
| | | 197110 [vshepelev@im] The same is for Russians/Ukrainians. In our programming communities question
| | | 197134 [matz@ru y- a] Alright, then what specific features are you (both) missing?  I don't
| | | + 197153 [vshepelev@im] I suppose, all we (non-English-writers) need is to have all string-related
| | | | + 197164 [matz@ru y- a] In that sense, _I_ am one of the non-English-writers, so that I can
| | | | | 197167 [vshepelev@im] Sorry, Matz, I know, of course. But I know too less about Japanese to see
| | | | | + 197169 [grzm@se sp t] Just to chime in, aren't upcase, downcase, and capitalize a locale/
| | | | | | 197177 [vshepelev@im] Really? I know about two cases: European capitalization and no
| | | | | | + 197204 [pbattley@gm ] There is variety even within western European languages - Dutch, for
| | | | | | | 197205 [vshepelev@im] I already realized. (I've said about Florian Gross, his surname last "ss"
| | | | | | + 197208 [hramrach@ce ] There is no such thing like European capitalization. There is only <insert your language> capitalization. The German character ß has no uppercase version. In most languages using Latin script the uppercase of 'i' is 'I'. But Turkish has i and ı without dot, and the uppercase of 'i' is, of course, I with dot.
| | | | | + 197171 [vincent.isam] To have the length of a Unicode string, just do str.split(//).length,
| | | | | | + 197176 [vshepelev@im] I know about it. But, theoretically speaking, such "core" methods must be
| | | | | | + 197226 [halostatue@g] You can't currently use them with Ruby. The file operations in Ruby
| | | | | + 197175 [matz@ru y- a] OK. Case is the problem.  I understand.
| | | | | | + 197180 [vshepelev@im] I can confirm. But I'm afraid that some libraries I rely on use #length and
| | | | | | | 197200 [pbattley@gm ] Those libraries should probably be considered broken; they can and
| | | | | | | 197203 [pertl@gm .o ] That will be quite _some_ libraries, I guess...
| | | | | | + 197195 [pbattley@gm ] str.sub!('32 path encoding ', '') # :-)
| | | | | | + 197227 [halostatue@g] It's not that bad, Matz. I started as a Unix developer, but in the
| | | | | + 197224 [halostatue@g] They are UTF-16 internally. I haven't been paying attention to Ruby
| | | | + 197190 [drbrain@se m] $ cat x.rb
| | | + 197206 [hramrach@ce ] What I want is all methods working seamlessly with unicode strings so
| | | | + 197209 [pbattley@gm ] utf8_string.unpack('U*') is pretty close to this, giving an array of codepoints.
| | | | | 197219 [hramrach@ce ] But I want it to be string after the conversion, so that I can use
| | | | | 197336 [rhkramer@gm ] (RE my previous post):  Oops, maybe UTF-32 is exactly what I was alluding to?
| | | | | 197343 [headius@he d] ...
| | | | | 197346 [listbox@ju i] confusion with the ML controls.
| | | | | + 197354 [headius@he d] ...
| | | | | | 197356 [halostatue@g] No. In fact, I believe that Matz has the right idea for M17N strings
| | | | | | + 197360 [listbox@ju i] It's very difficult for me to understand the implementation. What if
| | | | | | + 197361 [pjhyett@gm i] Yes, we all understand that Ruby 2.0 will be the coolest thing since
| | | | | |   197362 [halostatue@g] *shrug*
| | | | | |   197374 [dmitry.sever] ...
| | | | | + 197425 [hramrach@ce ] Where else should the strings be flagged? If you get a web page
| | | | |   197431 [listbox@ju i] They should not be flagged, because some strings will be flagged and
| | | | |   197453 [hramrach@ce ] You can certainly get the things wrong. But if you get a string that
| | | | + 197228 [halostatue@g] That will *never* happen. Even with Unicode, you have to think about
| | | |   197416 [hramrach@ce ] No. Since I have locale stdin can be marked with the proper encoding
| | | |   197789 [strobel@se u] I emphatically agree. I'll even repeat and propose a new Plan for
| | | |   + 197810 [langstefan@g] Juergen, I agree with most of what you have written. I will
| | | |   | + 197818 [halostatue@g] This is incorrect. *Most* Ruby programs won't need to care about the
| | | |   | | 197845 [langstefan@g] As long as one treats a character string as a character
| | | |   | | + 197872 [hramrach@ce ] No, it is not.
| | | |   | | | + 197881 [listbox@ju i] I would much rather prefer UTF-8 in a language such as Ruby which is
| | | |   | | | | + 197908 [tbray@te tu ] There's a lot of UTF-16 out there.  There's more ISO-8859-* than
| | | |   | | | | + 197918 [hramrach@ce ] Here you go. You can have the strings in UTF-8, and I can have them in
| | | |   | | | |   197935 [listbox@ju i] Oh, lord... Have you at least tried that to make such assumptions? In
| | | |   | | | |   197971 [matz@ru y- a] 1.9 Oniguruma regexp engine should handle these, otherwise it's a bug.
| | | |   | | | |   197972 [listbox@ju i] I'll try to check. Oniguruma on 1.8.4. didn't cope, but maybe it just
| | | |   | | | |   197975 [matz@ru y- a] If you have any problem, send us a report with what you expect and
| | | |   | | | |   197980 [listbox@ju i] irb(main):011:0> "ݧѧԧէѧ⧯ѧ" =~ /[-]/i
| | | |   | | | |   197985 [matz@ru y- a] I found out that Oniguruma casefolding works only for characters
| | | |   | | | |   197986 [listbox@ju i] Thanks for the clarification :-)
| | | |   | | | |   197988 [dmitry.sever] ...
| | | |   | | | |   197995 [matz@ru y- a] Thank you for the ideas.
| | | |   | | | |   + 198013 [hramrach@ce ] What is "ascii"? Specifically I would like string operations to succeed
| | | |   | | | |   | 198018 [matz@ru y- a] Every encoding has an attribute named ascii_compat.  EUC_JP, SJIS,
| | | |   | | | |   | + 198024 [dmitriid@gm ] I wonder. Why cannot Strings throughout Ruby be _always_ represented
| | | |   | | | |   | | 198027 [halostatue@g] This entire discussion is centered around a proposal to do exactly
| | | |   | | | |   | | + 198030 [dmitriid@gm ] I totally agree with that. IMO, the point lies exactly in this
| | | |   | | | |   | | | 198038 [halostatue@g] I think that's more likely with (a) what we have now and (b) a
| | | |   | | | |   | | + 198069 [tbray@te tu ] To enlighten the ignorant, could you describe one or two scenarios
| | | |   | | | |   | |   198076 [halostatue@g] I've found that a Unicode-based string class gets in the way when it
| | | |   | | | |   | + 198062 [hramrach@ce ] Reading what you said it appears it would be only possible to add
| | | |   | | | |   |   198110 [matz@ru y- a] You will have all your strings in the encoding you choose as a
| | | |   | | | |   |   198169 [hramrach@ce ] If I read pieces of text from web pages they can be in different
| | | |   | | | |   |   + 198184 [timothy.s.be] ...
| | | |   | | | |   |   | + 198197 [M.B.Smillie@] So we shouldn't do it because it doesn't work in web browsers?
| | | |   | | | |   |   | + 198201 [hramrach@ce ] No, I meant that the strings are, of course, converted to a common
| | | |   | | | |   |   | | 198223 [matz@ru y- a] If you choose to convert all input text data into Unicode (and convert
| | | |   | | | |   |   | | 198351 [hramrach@ce ] Well, it's actually you who chose the conversion on input for me.
| | | |   | | | |   |   | | 198370 [matz@ru y- a] Agreed.  It is me.  Perhaps you don't know how terrible code
| | | |   | | | |   |   | | + 198379 [dmitry.sever] ...
| | | |   | | | |   |   | | | 198382 [matz@ru y- a] Indeed.
| | | |   | | | |   |   | | | + 198385 [halostatue@g] Just a thought. Might it be possible to have a new String literal for
| | | |   | | | |   |   | | | | 198395 [matz@ru y- a] I am not sure this is a good idea or not (yet).  If your "u" text
| | | |   | | | |   |   | | | + 198386 [listbox@ju i] Matz, this would be a disaster (if in such a situation a library
| | | |   | | | |   |   | | |   198392 [matz@ru y- a] Can you elaborate?  I don't want to see disaster whatever it is.
| | | |   | | | |   |   | | |   + 198397 [dmitry.sever] ...
| | | |   | | | |   |   | | |   | + 198418 [listbox@ju i] What I meant is the description of how you get a Python program wielded
| | | |   | | | |   |   | | |   | + 198453 [matz@ru y- a] Agreed in principle.  But it seems to be fundamental complexity of the
| | | |   | | | |   |   | | |   |   198475 [meadow.nnick] a) weak suggestion?
| | | |   | | | |   |   | | |   |   198480 [matz@ru y- a] Weak suggestion, if I understand you correctly.
| | | |   | | | |   |   | | |   |   + 198504 [hramrach@ce ] What I had in mind was much simpler. If the strings do not match just
| | | |   | | | |   |   | | |   |   + 198937 [ij.rubylist@] Strong assertion + auto conversion is the only solution which will
| | | |   | | | |   |   | | |   |     + 198938 [listbox@ju i] The greatest about Cocoa is that I'm able to suspect that 99 percent
| | | |   | | | |   |   | | |   |     + 198961 [halostatue@g] This is an incorrect and unsupportable statement. It is completely
| | | |   | | | |   |   | | |   |       + 198979 [headius@he d] ...
| | | |   | | | |   |   | | |   |       | + 198993 [matz@ru y- a] A string is a sequence of data that can be represented by small
| | | |   | | | |   |   | | |   |       | + 199016 [headius@he d] ...
| | | |   | | | |   |   | | |   |       |   + 199018 [matz@ru y- a] I still don't see how separate types and behaviors would be more
| | | |   | | | |   |   | | |   |       |   | + 199026 [ij.rubylist@] Above code assumes all file operations return byte arrays. What is
| | | |   | | | |   |   | | |   |       |   | | + 199035 [matz@ru y- a] line = File.open(filename, "r", "utf8") {|f| f.gets }
| | | |   | | | |   |   | | |   |       |   | | | 199043 [ij.rubylist@] Oh, I see. So basically IO always returns ByteArray, and one needs to
| | | |   | | | |   |   | | |   |       |   | | | 199047 [matz@ru y- a] The detail is not fixed yet but it would honor locales for the default
| | | |   | | | |   |   | | |   |       |   | | + 199039 [halostatue@g] As Tim Bray pointed out in a response to me, trying to get a String
| | | |   | | | |   |   | | |   |       |   | + 199028 [dan-ml@da 42] Just jumping into the discussion here, I have to agree with Matz. A char-vector
| | | |   | | | |   |   | | |   |       |   |   199034 [logancapaldo] Regular expressions are a very powerful tool, but they do not
| | | |   | | | |   |   | | |   |       |   |   + 199036 [vjoel@pa h. ] irb(main):001:0> "It's needlessly cryptic."[/./]
| | | |   | | | |   |   | | |   |       |   |   | 199127 [logancapaldo] It's funny I'm always forgetting you can index by regexp. But this
| | | |   | | | |   |   | | |   |       |   |   | 199129 [halostatue@g] I kinda like that.
| | | |   | | | |   |   | | |   |       |   |   | 199194 [mike@st k. a] Presumably this is general arm waving, because s[/./] need not return
| | | |   | | | |   |   | | |   |       |   |   | 199206 [halostatue@g] I'm referring to s[byte: 0]. It's elegant.
| | | |   | | | |   |   | | |   |       |   |   | 199209 [gwtmp01@ma .] It seems a bit weighty.  It requires the allocation of a Hash simply
| | | |   | | | |   |   | | |   |       |   |   | 199217 [logancapaldo] **Must defend random syntax that I invented ;-)**
| | | |   | | | |   |   | | |   |       |   |   + 199040 [dan-ml@da 42] It's funny, maybe I'm just dumb but I can't think of a single *real-world*
| | | |   | | | |   |   | | |   |       |   |     + 199049 [dmitriid@gm ] Substrings? Finding an occurrence of a string in another string? Why
| | | |   | | | |   |   | | |   |       |   |     | 199053 [dan-ml@da 42] Those operations are precisely what regexes are best at.
| | | |   | | | |   |   | | |   |       |   |     | 199079 [listbox@ju i] It does seem unnatural and hints that you are working with an
| | | |   | | | |   |   | | |   |       |   |     + 199070 [hramrach@ce ] Have you looked at the "short but unique" ruby quiz?
| | | |   | | | |   |   | | |   |       |   |     + 199075 [listbox@ju i] Well, think again. You have a truncate(text) helper in Rails which
| | | |   | | | |   |   | | |   |       |   |     + 199327 [drosihn@gm i] If that is the case, then why doesn't Ruby remove *all* substring
| | | |   | | | |   |   | | |   |       |   |     | 199328 [Daniel.Berge] Because it would be a disaster.  You want real world examples?  Take a
| | | |   | | | |   |   | | |   |       |   |     | + 199333 [phurley@gm i] Raising my hand, but the question might be who does character access
| | | |   | | | |   |   | | |   |       |   |     | | 199373 [dmitriid@gm ] I guess, that would be anyone in East Europe with Cyrillic-based
| | | |   | | | |   |   | | |   |       |   |     | | + 199375 [pbattley@gm ] I'm confused - I thought we were talking about Ruby! ;-)
| | | |   | | | |   |   | | |   |       |   |     | | | 199376 [dmitriid@gm ] simultaneously. It shows
| | | |   | | | |   |   | | |   |       |   |     | | + 199389 [halostatue@g] IIRC, any "unknown" encoding will be treated as a binary string where
| | | |   | | | |   |   | | |   |       |   |     | |   + 199416 [matz@ru y- a] I am not sure how much this is more useful than usual string literals
| | | |   | | | |   |   | | |   |       |   |     | |   | + 199451 [halostatue@g] I'm not sure I like the encoding pragma, personally, since it's at the
| | | |   | | | |   |   | | |   |       |   |     | |   | | + 199463 [listbox@ju i] Please no. Please please no.
| | | |   | | | |   |   | | |   |       |   |     | |   | | | 199469 [halostatue@g] Except that @top is guaranteed to not have an encoding -- at least it
| | | |   | | | |   |   | | |   |       |   |     | |   | | | 199473 [listbox@ju i] You never know if you are, that's the problem. And no, it's NOT
| | | |   | | | |   |   | | |   |       |   |     | |   | | | + 199474 [listbox@ju i] I meant C in this part, sorry.
| | | |   | | | |   |   | | |   |       |   |     | |   | | | + 199478 [halostatue@g] How can you continue to be so wrong? All strings will *not* become
| | | |   | | | |   |   | | |   |       |   |     | |   | | |   + 199490 [headspin@gm ] One is for 1-byte encodings. If you know that char==byte, byte-level ops will speed up processing of the strings, since no second guessing has to be done.
| | | |   | | | |   |   | | |   |       |   |     | |   | | |   + 199498 [jim@we ri hh] I think his point is that for any arbitrary string you cannot guarantee
| | | |   | | | |   |   | | |   |       |   |     | |   | | |   | 199505 [gwtmp01@ma .] Does it even make sense to talk about 'encodings' in the context of
| | | |   | | | |   |   | | |   |       |   |     | |   | | |   | 199518 [halostatue@g] If you don't know the encoding, you must use binary (unencoded) data.
| | | |   | | | |   |   | | |   |       |   |     | |   | | |   + 199503 [headius@he d] ...
| | | |   | | | |   |   | | |   |       |   |     | |   | | |     + 199515 [jim@we ri hh] I also have this concern.
| | | |   | | | |   |   | | |   |       |   |     | |   | | |     | 199519 [halostatue@g] I can't say that I *like* it; it's *clean* to say str[0..5], but I
| | | |   | | | |   |   | | |   |       |   |     | |   | | |     + 199517 [halostatue@g] This is essentially how I view Strings. What makes a String special is
| | | |   | | | |   |   | | |   |       |   |     | |   | | |       199530 [tbray@te tu ] I think people understand what you want.  But those of us who've done
| | | |   | | | |   |   | | |   |       |   |     | |   | | |       199557 [matz@ru y- a] Have you ever heard of regular expression engine (one of the hardest
| | | |   | | | |   |   | | |   |       |   |     | |   | | |       199565 [hramrach@ce ] But then you risk that people would lick the floor, which some may
| | | |   | | | |   |   | | |   |       |   |     | |   | | + 199529 [matz@ru y- a] I'd rather see r"\x89PNG\x0d\x0a\x1a\x0a" (or b"..."), since I expect
| | | |   | | | |   |   | | |   |       |   |     | |   | | | 199552 [halostatue@g] As I indicated in a later post, that's also acceptable.
| | | |   | | | |   |   | | |   |       |   |     | |   | | + 199538 [dan-ml@da 42] $KCODE = 'u'
| | | |   | | | |   |   | | |   |       |   |     | |   | + 199455 [listbox@ju i] Austin,  I don't understand why my strings are more "special" than
| | | |   | | | |   |   | | |   |       |   |     | |   |   199464 [halostatue@g] Excuse me? I'm not the one who has been advocating separate classes. I
| | | |   | | | |   |   | | |   |       |   |     | |   + 199430 [phurley@gm i] In this regard I would love to see a user definable string quote
| | | |   | | | |   |   | | |   |       |   |     | + 199339 [tbray@te tu ] Point granted, but I bet the Win32 stuff assumes 8-bit "characters"
| | | |   | | | |   |   | | |   |       |   |     |   + 199340 [logancapaldo] I have an example, but I'm sure most would consider it "cheating".
| | | |   | | | |   |   | | |   |       |   |     |   + 199342 [djberg96@gm ] Maybe I should fork a version of Ruby tailored specifically to Windows.
| | | |   | | | |   |   | | |   |       |   |     |     199366 [halostatue@g] Trust me. You don't want to do that. TCHAR with -DUNICODE is pure evil.
| | | |   | | | |   |   | | |   |       |   |     |     199368 [djberg96@gm ] Well, it would be -DMBCS. ;)
| | | |   | | | |   |   | | |   |       |   |     + 199347 [martin@sn wp] I'll point you at my solution to ruby quiz #83: (short but unique)
| | | |   | | | |   |   | | |   |       |   |       199356 [dan-ml@da 42] I'll grant that I don't have enough imagination and that there *are* cases where
| | | |   | | | |   |   | | |   |       |   + 199033 [halostatue@g] There's no meaningful distinction between the division of
| | | |   | | | |   |   | | |   |       + 198980 [ij.rubylist@] Well, if it is a byte array, it is not a String (an array of
| | | |   | | | |   |   | | |   |         + 198987 [hramrach@ce ] Here you contradict yourself. Regexes are string (character)
| | | |   | | | |   |   | | |   |         | 198988 [sitharus@si ] Why not? When I read CGI params I get them as strings, but if I want
| | | |   | | | |   |   | | |   |         | 199006 [halostatue@g] Sorry, but "reading" CGI params is a red herring. You may get it as one
| | | |   | | | |   |   | | |   |         | + 199009 [sitharus@si ] Exactly.
| | | |   | | | |   |   | | |   |         | | 199014 [matz@ru y- a] They are equally more complex than the current design.  If File can
| | | |   | | | |   |   | | |   |         | | + 199017 [halostatue@g] There are operations for Strings (#each_character, perhaps) that make
| | | |   | | | |   |   | | |   |         | | | 199140 [chneukirchen] Partly off-topic, but important nevertheless: *Then* it's the right
| | | |   | | | |   |   | | |   |         | | | 199141 [halostatue@g] Oh, please, yes. I get tired of libraries breaking because people
| | | |   | | | |   |   | | |   |         | | + 199025 [tbray@te tu ] Well, on strings, indexing and substring operations and iterators and
| | | |   | | | |   |   | | |   |         | | | + 199029 [mailinglists] For UTF-8 which hopefully will rule the world soon, the worst libraries
| | | |   | | | |   |   | | |   |         | | | + 199038 [halostatue@g] Those are interpretations of the data underlying the String, though.
| | | |   | | | |   |   | | |   |         | | + 199435 [strobel@se u] Any additional complexity here should be offset later, when doing
| | | |   | | | |   |   | | |   |         | |   199472 [halostatue@g] It won't be. All of the complexity of the m17n String will be inside of
| | | |   | | | |   |   | | |   |         | |   + 199477 [ij.rubylist@] True. That's why most solutions do not offer String IO, but only
| | | |   | | | |   |   | | |   |         | |   + 199566 [strobel@se u] Having said lens adds complexity. I'll always have to think of the
| | | |   | | | |   |   | | |   |         | |     199567 [matz@ru y- a] Did I say so?  I am not going to sacrifice anybody.  At least I am
| | | |   | | | |   |   | | |   |         | + 199024 [tbray@te tu ] Maybe I'm missing something, but in today's networked heterogeneous
| | | |   | | | |   |   | | |   |         | | 199037 [halostatue@g] Um. You're not missing anything -- I'm mocking the API pair that would
| | | |   | | | |   |   | | |   |         | + 199074 [listbox@ju i] and you receive a PART of a unicode string (because you cannot know
| | | |   | | | |   |   | | |   |         |   + 199082 [hramrach@ce ] Why would you read 4096 bytes in the first place?
| | | |   | | | |   |   | | |   |         |   | 199084 [listbox@ju i] This is a pattern. If a file has no line endings, but just one (very
| | | |   | | | |   |   | | |   |         |   | + 199094 [hramrach@ce ] But can you work with the file in parts then? If there is no
| | | |   | | | |   |   | | |   |         |   | + 199098 [halostatue@g] Anyone who wants to set all IO operations to a particular encoding is
| | | |   | | | |   |   | | |   |         |   |   199117 [jim@we ri hh] I've been following this debate with some interest.  Alas, since my
| | | |   | | | |   |   | | |   |         |   |   + 199118 [keith@or il ] +1^2
| | | |   | | | |   |   | | |   |         |   |   + 199128 [halostatue@g] I mostly agree with you here (about prototyping), Jim. There are a few
| | | |   | | | |   |   | | |   |         |   |     + 199148 [hramrach@ce ] - you have only one encoding, and while it may be optimal in some
| | | |   | | | |   |   | | |   |         |   |     | 199156 [ij.rubylist@] Yes. This shows that if there is no autoconversion, programmer will
| | | |   | | | |   |   | | |   |         |   |     | 199159 [halostatue@g] I doubt this is in the least bit true. The real problem is that you're
| | | |   | | | |   |   | | |   |         |   |     | + 199162 [ij.rubylist@] Basically, I am just advocating to get autoconversion into "official"
| | | |   | | | |   |   | | |   |         |   |     | | 199165 [halostatue@g] Um. Not what I'm saying. I want as much clean autoconversion as
| | | |   | | | |   |   | | |   |         |   |     | | + 199169 [jim@we ri hh] (A) Automatically convert input strings to a given encoding (independent
| | | |   | | | |   |   | | |   |         |   |     | | | 199173 [ij.rubylist@] Autoconversion (as suggested by many people in this thread) is meant
| | | |   | | | |   |   | | |   |         |   |     | | + 199171 [ij.rubylist@] This seems too good to be true :-)
| | | |   | | | |   |   | | |   |         |   |     | | | 199179 [tbray@te tu ] I think that anyone, living in any country, working in any language,
| | | |   | | | |   |   | | |   |         |   |     | | + 199196 [danielbaird@] ...
| | | |   | | | |   |   | | |   |         |   |     | |   199207 [halostatue@g] There's a point where you're right. But there's a point where you're
| | | |   | | | |   |   | | |   |         |   |     | + 199167 [headius@he d] ...
| | | |   | | | |   |   | | |   |         |   |     |   + 199174 [ij.rubylist@] Ahem, no.
| | | |   | | | |   |   | | |   |         |   |     |   | 199176 [headius@he d] ...
| | | |   | | | |   |   | | |   |         |   |     |   | + 199210 [dan-ml@da 42] I'd like to point out that MySQL has m17n strings, and it rocks.
| | | |   | | | |   |   | | |   |         |   |     |   | | 199224 [tbray@te tu ] I am often unable to get Unicode strings from Perl into MySQL and
| | | |   | | | |   |   | | |   |         |   |     |   | | 199354 [dan-ml@da 42] I've never had any problems. You just have to make sure the client correctly
| | | |   | | | |   |   | | |   |         |   |     |   | + 199238 [matz@ru y- a] Good point.
| | | |   | | | |   |   | | |   |         |   |     |   | | + 199273 [headius@he d] ...
| | | |   | | | |   |   | | |   |         |   |     |   | | + 199280 [hramrach@ce ] That also means that the implementor has much better understanding of
| | | |   | | | |   |   | | |   |         |   |     |   | | + 199570 [strobel@se u] I don't think you can possibly cater to everyone here.  Simplicissity,
| | | |   | | | |   |   | | |   |         |   |     |   | |   + 199571 [matz@ru y- a] I can't promise implementation simplicity.  Because it would not be
| | | |   | | | |   |   | | |   |         |   |     |   | |   | 199661 [strobel@se u] First, it wasn't me who brought this up, the quote about the 10% is
| | | |   | | | |   |   | | |   |         |   |     |   | |   + 199587 [halostatue@g] Um. You make the same error, I think, that some others have. There are
| | | |   | | | |   |   | | |   |         |   |     |   | + 199240 [pit@ca it in] Charles, could it be that "the uber-string m17n implementation" would
| | | |   | | | |   |   | | |   |         |   |     |   |   199272 [headius@he d] ...
| | | |   | | | |   |   | | |   |         |   |     |   + 199178 [halostatue@g] I do not believe that this is a viable argument for "killing". At
| | | |   | | | |   |   | | |   |         |   |     |   + 199261 [hramrach@ce ] It's been asked already.
| | | |   | | | |   |   | | |   |         |   |     |     199277 [halostatue@g] To be fair to Charles, he would benefit immensely from a Unicode
| | | |   | | | |   |   | | |   |         |   |     |     199289 [headius@he d] ...
| | | |   | | | |   |   | | |   |         |   |     |     199295 [halostatue@g] IME, more classes complicates. Sometimes the complexity is necessary
| | | |   | | | |   |   | | |   |         |   |     |     199371 [ij.rubylist@] First, "most of my opposition" is not useful in discussion and is a
| | | |   | | | |   |   | | |   |         |   |     |     199396 [halostatue@g] You have misread my English. I am not referring to people who oppose
| | | |   | | | |   |   | | |   |         |   |     |     + 199471 [ij.rubylist@] I think you do not understand what the problem is, because your claim
| | | |   | | | |   |   | | |   |         |   |     |     | 199475 [halostatue@g] Oh, bollocks. Go ahead, pull the other one.
| | | |   | | | |   |   | | |   |         |   |     |     | 199480 [ij.rubylist@] Equivalent in their prediction power. This is the problem I discuss -
| | | |   | | | |   |   | | |   |         |   |     |     | 199484 [ij.rubylist@] And to clear the air - I am not advocating ByteArray unconditionally.
| | | |   | | | |   |   | | |   |         |   |     |     + 199476 [ij.rubylist@] Oh, really? So it is OK for this code to sometimes receive binary
| | | |   | | | |   |   | | |   |         |   |     |     | + 199535 [matz@ru y- a] No, as I said before, reading with length specified shall always
| | | |   | | | |   |   | | |   |         |   |     |     | + 199563 [hramrach@ce ] I would think that STD* should use locale (or equivalent) for default
| | | |   | | | |   |   | | |   |         |   |     |     + 199479 [ara.t.howard] i wouldn't go that far.  i'm wanting a byte array and thinking beyond string
| | | |   | | | |   |   | | |   |         |   |     |       199495 [logancapaldo] This is off on a tangent here but, ara why not just
| | | |   | | | |   |   | | |   |         |   |     + 199172 [jim@we ri hh] Thanks for the response, Austin.  It seemed to help clarify the issues
| | | |   | | | |   |   | | |   |         |   |       199180 [halostatue@g] I would assume both, based on what I've seen from Matz.
| | | |   | | | |   |   | | |   |         |   |       199231 [matz@ru y- a] I think so.
| | | |   | | | |   |   | | |   |         |   + 199096 [halostatue@g] st = File.open("file.txt", "rb", :encoding => :utf8) { |f| f.read(4096) }
| | | |   | | | |   |   | | |   |         + 198990 [tbray@te tu ] +1 to this and to Nutter previously.  Text strings and byte arrays
| | | |   | | | |   |   | | |   |         + 199005 [halostatue@g] It could be indistinguishable from such. Even a Unicode string is
| | | |   | | | |   |   | | |   |           199073 [listbox@ju i] Let's say there are people who not-so-foolishly believe that trying
| | | |   | | | |   |   | | |   + 198414 [listbox@ju i] I imagine that in the case mentioned the encoding assumed for a
| | | |   | | | |   |   | | + 198445 [hramrach@ce ] I do not see how converting the strings on input will make the
| | | |   | | | |   |   | |   198454 [matz@ru y- a] It does.  But if you convert encoding lazily, you will have hard time
| | | |   | | | |   |   | + 198222 [tbray@te tu ] Well, unless you had a String class that took care of the encoding
| | | |   | | | |   |   |   198543 [strobel@se u] That's what I suggested basically. The problem seems to be non-Unicode
| | | |   | | | |   |   + 198215 [gwtmp01@ma .] I'm not sure I understand what 'subset of unicode' means.
| | | |   | | | |   |     198247 [hramrach@ce ] I mean that iso-8859-1 and iso-8859-2 encodings (as well as many
| | | |   | | | |   + 199100 [dmitry.sever] ...
| | | |   | | | |     199235 [matz@ru y- a] Good point.  Currently they don't support non ASCII compatible
| | | |   | | | + 197888 [langstefan@g] If you really need this level of efficiency, Ruby is probably
| | | |   | | | | 197917 [hramrach@ce ] Why? It can already handle utf-8 strings or arrays of unicode
| | | |   | | | + 197904 [tbray@te tu ] Um, the practical experience is that the code required to unpack a
| | | |   | | + 197873 [halostatue@g] And I'm saying that it's a mistake to do that (standardize on a single
| | | |   | | | 197905 [tbray@te tu ] Indeed it's not, but this argument escapes me.  If you try feed that
| | | |   | | + 197903 [tbray@te tu ] Not possible in the general case.  There are a few data formats
| | | |   | + 197825 [gwtmp01@ma .] I don't claim to be a Unicode expert but shouldn't the goal be to
| | | |   | | 197833 [langstefan@g] I'm not Juergen, but since you responded to my message...
| | | |   | | + 197836 [gwtmp01@ma .] So all we need is an ideal character set?  That sounds simple.  :-)
| | | |   | | + 197865 [strobel@se u] That's what I meant, yes. And that is the most important point too.
| | | |   | + 197901 [tbray@te tu ] Point of information: there are highly successful word-processing
| | | |   + 197811 [halostatue@g] Agree, mostly. Strings should have a way to indicate the buffer size of
| | | |   | + 197817 [listbox@ju i] Most probably wise, but I need casefolding and character classes to
| | | |   | | 197820 [halostatue@g] I don't disagree. But you're *not* going to get those features, in all
| | | |   | | 197962 [strobel@se u] My title was "A Plan for Unicode Strings in Ruby 2.0". I don't want to
| | | |   | + 197830 [pbattley@gm ] Not to mention that Matz has explicitly stated in the past that he
| | | |   | | 197863 [strobel@se u] AFAIK, EBCDIC can be losslessly converted to Unicode and back. Right?
| | | |   | | + 197867 [pbattley@gm ] They aren't so much unsolvable problems as mutually incompatible
| | | |   | | + 197874 [halostatue@g] Which code page? EBCDIC has as many code pages (including a UTF-EBCDIC)
| | | |   | |   + 197882 [listbox@ju i] Yes, you will spend those cycles to count the letters in my language
| | | |   | |   | 197894 [halostatue@g] I think you're overthinking the problem. Let's consider the guarantees
| | | |   | |   + 197927 [strobel@se u] Obviously, EBCDIC -> UNICODE -> same EBCDIC Codepage as before.
| | | |   | |     197934 [halostatue@g] Again, this is completely unacceptable from a memory usage perspective.
| | | |   | |     197943 [tbray@te tu ] This is another thing you need your String class to be smart about.
| | | |   | |     197952 [chneukirchen] Does that mean that  binary.to_unicode.to_binary != binary  is possible?
| | | |   | |     + 197953 [listbox@ju i] And it does as long as you are not careful. One of the things I do is
| | | |   | |     + 197961 [tbray@te tu ] Yes, but having "m?s" != "m?s" is pretty bad too; the alternative is
| | | |   | |       198009 [chneukirchen] Those were just fictive method calls.  But let's say I read from
| | | |   | |       198053 [tbray@te tu ] Yep.  And yes, calling to_unicode on it might in fact change the bit
| | | |   | + 197870 [strobel@se u] I admit I don't know about Ruby's C extensions. Are they unable to
| | | |   | | + 197875 [gwtmp01@ma .] What leads you to this conclusion?  I don't think it can be refuted
| | | |   | | | + 197906 [tbray@te tu ] I'm not close enough to Ruby to have a useful opinion, but for many
| | | |   | | | + 197930 [strobel@se u] Maybe I was unclear. I didn't mean Ruby has to choose an existing
| | | |   | | |   197938 [matz@ru y- a] I understand these attributes might make implementation easier.   But
| | | |   | | |   197964 [strobel@se u] I never worried about performance much, that's Austin. :P
| | | |   | | + 197878 [hramrach@ce ] It's apparent from the explanation above.
| | | |   | + 197902 [tbray@te tu ] Point of information: Of all the widely-used methods of encoding
| | | |   + 197898 [tbray@te tu ] Be careful.  People who care about this stuff might want to read
| | | |     + 197907 [headius@he d] ...
| | | |     + 197932 [listbox@ju i] Let's write a specification.
| | | + 198904 [snaury@gm il] Sorry for maybe getting into, but here are my 5 cents. When I first
| | |   198915 [matz@ru y- a] Unfortunately, not.  I understand Russian people having problem with
| | |   + 198956 [halostatue@g] Matz,
| | |   + 199050 [snaury@gm il] It's actually not about treating everything in UTF-8, it just unifies
| | |     199051 [snaury@gm il] Ah, well, for ole that's not true, only now I realized I can set
| | + 197115 [banshee@ba s] I suspect the Japanese posters on this list can answer better than I can,
| |   + 197116 [dbalmain.ml@] There is a good summary of the han unification controversy on wikipedia;
| |   + 197120 [schapht@gm i] I have one Japanese person here who's never heard of this gaiji
| + 197185 [m-lists@br s] I won't define "proper Unicode support" here.
|   197189 [dmitry.sever] ...
|   197192 [vshepelev@im] Thanks Dmitry!
|   + 197193 [dmitry.sever] ...
|   | 197197 [pertl@gm .o ] Matz,
|   + 197212 [strobel@se u] ode
|     197335 [rhkramer@gm ] Maybe Juergen is saying the same thing I'm going to say, but since I don't
|     197526 [strobel@se u] ...
|     + 197533 [halostatue@g] [ snip essentially accurate information ]
|     | 197560 [strobel@se u] No. According to wikipedia, it is up to 4 bytes for plain UTF8 for
|     | + 197612 [pbattley@gm ] Austin's correct about six bytes, actually. The original UTF-8
|     | | 197782 [strobel@se u] I don't care who is technically correct here, that's not the point.
|     | | 197783 [pbattley@gm ] On the contrary: it's exactly the point in a technical discussion of
|     | | 197787 [strobel@se u] The discussion is about a Unicode Roadmap for Ruby. The number of
|     | | 197798 [dmitriid@gm ] following
|     | + 197614 [hramrach@ce ] Please, do not use Wikipedia as an argument. It can contain useful
|     | + 197784 [dmitriid@gm ] Well, there is the official http://unicode.org/ site no one has
|     |   197785 [pbattley@gm ] Good point. Unfortunately, a lot of it is only available as PDFs of
|     + 197897 [tbray@te tu ] Um, hi everyone.  I'm a Rubie newby but very, very old hand at
+ 197156 [pal@pa be gs] I also think that this is very important.
+ 253782 [ivan.mashche] text in this discussion and I may have read it not enough carefully to
  + 253784 [halostatue@g] There are a lot of answers to that question, and I strongly suggest
  | 253803 [erik@ho le s] If it helps any, I've moved ~2000 web pages in an internal work project
  + 253802 [why@ru y- an] Well, Ruby 1.9 (which is due in December) will have some Unicode
  + 253821 [richard.conr] It depends on your definition of 'convenient'.
    253830 [ivan.mashche] IMHO convenient is as in C#. There I don't have to bother how are
    253859 [richard.conr] It's not *that* convenient. By default Ruby strings are 8-bit. You can make
    253929 [dangerwillro] Objective-C (through the Cocoa framework) also handles Unicode