On Sep 17, 2008, at 8:55 PM, James Gray wrote:

> Is unpack("U*") not meeting that need?  I'm not trying to be a jerk,
> I'm seriously asking.

In fact, that produces the correct answer, and it's what I actually use in my RX code (http://www.tbray.org/ongoing/When/200x/2008/06/10/RX-Work). The problem is that it could be a lot more efficient. It means I have to organize the input into chunks myself, taking care that I haven't split a chunk in the middle of a UTF-8 character and so on, when what I really want, when x is an IO, is

    x.each_codepoint do |u|
      # u is a Fixnum
    end

with the buffering and UTF-8 unpacking being done at a low level without wasting memory.

 -T
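
For illustration, the chunk-and-carry bookkeeping described above might look roughly like the sketch below. This is not the RX code; `each_codepoint_chunked` and `complete_prefix_length` are hypothetical names made up for this example, the 4096-byte default chunk size is an arbitrary assumption, and the sketch relies on `String#b`, so it assumes a Ruby newer than the one current when this was written. It reads an IO in fixed-size chunks, holds back any bytes of a character that was split at the chunk boundary, and hands only whole characters to `unpack("U*")`.

```ruby
require "stringio"

# Hypothetical helper (not part of Ruby or RX): yields each code point of a
# UTF-8 stream as an Integer, reading the IO in fixed-size chunks.
def each_codepoint_chunked(io, chunk_size = 4096)
  carry = "".b # bytes of a character that was split across two chunks
  while (chunk = io.read(chunk_size))
    buf = carry + chunk.b
    cut = complete_prefix_length(buf)
    buf[0, cut].unpack("U*").each { |u| yield u }
    carry = buf[cut..-1]
  end
  raise ArgumentError, "truncated UTF-8 sequence at end of input" unless carry.empty?
end

# Number of leading bytes of buf that form whole UTF-8 characters.
def complete_prefix_length(buf)
  i = buf.bytesize - 1
  back = 0
  # Step back over trailing continuation bytes (10xxxxxx), at most three.
  while i >= 0 && back < 3 && (buf.getbyte(i) & 0xC0) == 0x80
    i -= 1
    back += 1
  end
  return buf.bytesize if i < 0 # no lead byte found; let unpack report it

  lead = buf.getbyte(i)
  need = if lead >= 0xF0 then 4
         elsif lead >= 0xE0 then 3
         elsif lead >= 0xC0 then 2
         else 1
         end
  # Keep everything if the last character is complete; otherwise cut
  # just before its lead byte so it is carried into the next chunk.
  (buf.bytesize - i) >= need ? buf.bytesize : i
end

# Usage: a tiny chunk size forces characters to be split across reads.
each_codepoint_chunked(StringIO.new("h\u00e9llo"), 2) { |u| puts u }
```

Malformed input is left for `unpack` to reject, and the whole thing still allocates a String and an Array per chunk, which is the kind of overhead a low-level each_codepoint would avoid.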