On 10.12.2007 16:52, Curt Sampson wrote:
> I'm writing a C extension that involves fast scanning through and
> parsing of tab-delimited files. Basically, I mmap the file, figure out
> where the row and column boundaries are, and for each row end up with
> an array of strings (pointer and length) that I then pass
> on to other C or Ruby code. The array and its strings are not supposed
> to be modified by the callees, only read, and I can also live with the
> callees being required to make their own copies of the strings and
> arrays if they need to keep the data accessible after the call, if I can
> figure out some way to enforce that.
>
> It appears to me that this means I don't really have any need to
> copy the data; I ought to just be able to set up a bunch of (likely
> frozen) String objects and then tweak the ptr and len on them and pass
> them around, avoiding any allocations or data copies. From a bit of
> experimentation, I can see that dropping several calls to rb_str_new for
> each row results in an enormous speed increase--about ten-fold--in how
> fast I can scan through the file.
>
> Does anybody have any suggestions on a reasonably safe way to do this?

This is what I'd do: create a single string per line and use substring
(aka #[]) to create strings that represent the portions needed; the byte
buffer will then be shared. You don't even need to freeze them because of
copy on write.

Kind regards

robert
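
A minimal sketch of the substring approach suggested above, written in plain Ruby rather than as a C extension. The helper name split_fields and the sample line are hypothetical and only for illustration. Whether a given substring actually shares the parent's byte buffer is an interpreter implementation detail (short substrings may simply be copied), so the sketch demonstrates only the visible behaviour: fields are produced with #[], and modifying a field leaves the original line untouched.

```ruby
# Hypothetical helper: split one tab-delimited line into fields using
# String#[] (start, length) instead of building each field byte by byte.
def split_fields(line)
  fields = []
  start = 0
  # Walk from tab to tab, taking the substring between boundaries.
  while (tab = line.index("\t", start))
    fields << line[start, tab - start]
    start = tab + 1
  end
  # The last field runs from the final boundary to the end of the line.
  fields << line[start, line.length - start]
  fields
end

line = "alpha\tbeta\tgamma"   # illustrative sample row
fields = split_fields(line)

# Appending to a field changes only that field; the source line keeps
# its content, so the fields do not have to be frozen for safety.
fields[0] << "!"
```

In a C extension the same idea would apply per row: build one String for the line and derive the fields from it, rather than calling rb_str_new once per field.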