On Sep 17, 2008, at 8:55 PM, James Gray wrote:

> Is unpack("U*") not meeting that need?  I'm not trying to be a jerk,  
> I'm seriously asking.

In fact, that produces the correct answer, and it's what I actually
use in my RX code (http://www.tbray.org/ongoing/When/200x/2008/06/10/RX-Work).
The problem is that it could be a lot more efficient.  It means I
have to take care of organizing the input into chunks and being  
careful that I haven't chunked in the middle of a UTF-8 character and  
so on, when what I really want, when x is an IO, is

x.each_codepoint do |u|
   # u is a Fixnum
end

with the buffering and UTF-8 unpacking being done at a low level
without wasting memory. -T
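
For illustration, the manual chunking described above might look roughly like the sketch below. This is not the RX code; the names each_codepoint_chunked and complete_prefix_length are hypothetical, the chunk size is arbitrary, and it assumes well-formed UTF-8 input and a Ruby recent enough to have String#b and String#byteslice. It reads fixed-size binary chunks, holds back any trailing bytes that belong to an incomplete character, and runs unpack("U*") only on the complete prefix.

```ruby
require 'stringio'

# Hypothetical helper: how many leading bytes of buf form complete UTF-8
# characters. Looks back at most 3 bytes for a lead byte whose sequence
# has been cut off by the chunk boundary. Assumes well-formed input.
def complete_prefix_length(buf)
  size = buf.bytesize
  1.upto([3, size].min) do |back|
    byte = buf.getbyte(size - back)
    next if (byte & 0xC0) == 0x80        # continuation byte, keep looking back
    need = if    byte >= 0xF0 then 4     # lead byte of a 4-byte sequence
           elsif byte >= 0xE0 then 3     # lead byte of a 3-byte sequence
           elsif byte >= 0xC0 then 2     # lead byte of a 2-byte sequence
           else 1                        # plain ASCII
           end
    return need > back ? size - back : size
  end
  size
end

# Hypothetical helper: yields each code point of io as an Integer,
# carrying incomplete trailing bytes over to the next chunk.
def each_codepoint_chunked(io, chunk_size = 4096)
  carry = "".b
  while (chunk = io.read(chunk_size))
    buf = carry + chunk.b
    cut = complete_prefix_length(buf)
    buf.byteslice(0, cut).unpack("U*").each { |u| yield u }
    carry = buf.byteslice(cut, buf.bytesize - cut)
  end
  # Leftover bytes here mean the input ended mid-character;
  # unpack("U*") raises ArgumentError in that case.
  carry.unpack("U*").each { |u| yield u } unless carry.empty?
end

# Usage: a deliberately tiny chunk size forces splits inside characters.
io = StringIO.new("h\u00e9llo \u20ac")
each_codepoint_chunked(io, 2) { |u| puts u }
```

Every caller that wants code points from a stream ends up writing some variant of this bookkeeping, and it allocates an intermediate string and array per chunk, which is the overhead a built-in each_codepoint on IO could avoid.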