On 10.12.2007 16:52, Curt Sampson wrote:
> I'm writing a C extension that involves fast scanning through and
> parsing of tab-delimited files. Basically, I mmap the file, figure out
> where the row and column boundaries are, and for each row end up with
> an array of strings (pointer and length) that I then pass
> on to other C or Ruby code. The array and its strings are not supposed
> to be modified by the callees, only read, and I can also live with the
> callees being required to make their own copies of the strings and
> arrays if they need to keep the data accessible after the call, if I can
> figure out some way to enforce that.
> 
> It appears to me that this means I don't really have any need to
> copy the data; I ought to just be able to set up a bunch of (likely
> frozen) String objects and then tweak the ptr and len on them and pass
> them around, avoiding any allocations or data copies. From a bit of
> experimentation, I can see that dropping several calls to rb_str_new for
> each row results in an enormous speed increase--about ten-fold--in how
> fast I can scan through the file.
> 
> Does anybody have any suggestions on a reasonably safe way to do this?

This is what I'd do: create a single String per line and use substring
(aka String#[]) to create strings for the portions needed; the byte
buffer will then be shared.  You don't even need to freeze them, because
of copy-on-write.
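
A minimal sketch of the idea at the Ruby level, not a definitive implementation. The sample line and the variable names (line, bounds, fields) are made up for illustration, and the data is assumed to be plain ASCII so that character offsets equal byte offsets. Whether the interpreter actually shares the buffer is an implementation detail and may depend on the Ruby version and string length; the sketch only shows the visible semantics, namely that changing a substring leaves the parent string untouched.

```ruby
# Hypothetical sample line; in the real extension this would be one row
# of the mmap'ed file wrapped in a single String.
line = "alpha\tbeta\tgamma"

# Collect [offset, length] pairs for each tab-separated field.
bounds = []
start = 0
while (tab = line.index("\t", start))
  bounds << [start, tab - start]
  start = tab + 1
end
bounds << [start, line.length - start]

# One String#[] call per field; the interpreter may share the parent's
# buffer instead of copying it (implementation detail, an assumption here).
fields = bounds.map { |offset, len| line[offset, len] }

# Mutating a field affects only that field, never the parent line.
fields[1] << "!"
```

The same approach should carry over to the C side via rb_str_substr, with the caveat above about when sharing really happens.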

Kind regards

	robert