Steve [RubyTalk] wrote:
> Robert Klemme wrote:
>>> # Assume offsets is a pre-computed array of positive integer
>>> # positions into the String originalstr.
>>>
>> Care to unveil a bit of the nature of the computation that yields
>> those indexes?
>>
> It's not really relevant to the question I was asking - but I've no
> problem saying more. I've a domain specific (order-preserving and
> extensible) 'type-system' which is imposed over otherwise opaque data
> structures. Given an instance of a 'type-signature' and a pointer, it
> is possible to determine the number of bytes which represent each
> 'typed value' - and (significantly) list construction drops out as
> being the concatenation of the value representations and
> type-signatures. The type signatures range in complexity from the
> simplest constant 'N-bytes interpreted as a natural number' through
> sentinel encodings (Null terminated strings on steroids) and (in
> principle - if not frequently in practice) arbitrary computation
> ranging over named integer values occurring 'earlier' in the list.
>
> At the moment I'm toying with the idea that I can memory-map the
> values (using a C-implemented module) and do the computations on the
> mapped values in Ruby - having presented opaque values and
> 'type-signatures' as String objects to Ruby. I expect that typical
> computations may involve matching regular expressions; doing
> arithmetic; computing various hashes and summations etc. At the
> moment I'm concentrating on establishing if Ruby is a suitable tool
> for the task at hand.

Certainly. I don't know how complex your type calculations are. I'd
probably do something along the lines of:

class SpecialData
  def initialize(s)
    # Keep a frozen copy so nobody can modify the data behind our back.
    @data = s.frozen? ? s : s.dup.freeze
    @meta_data = calculate_meta(@data)
  end

  # Number of fields found in the data.
  def size() @meta_data.size end

  # Extract the value of a single field on demand.
  def get(field_index)
    @meta_data[field_index].extract(@data)
  end

  private

  def calculate_meta(data)
    md = []
    # your complex calculation here, sample:
    md << StringMeta.new(0..-1)
    md
  end

  # Each meta object remembers where its field lives in the data.
  class BaseMeta
    attr_accessor :range
    def initialize(range) @range = range end
    def base_extract(str) str[self.range].freeze end
  end

  class StringMeta < BaseMeta
    alias :extract :base_extract
  end

  class IntMeta < BaseMeta
    # Interpret the field's bytes as a native signed integer.
    def extract(s) base_extract(s).unpack("i")[0] end
  end
end

>>> # with offsets[0]==0 and offsets[-1]==@originalstr.size
>>> @fields=Array.new (offsets.size-1)
>>> for i in 1..(offsets.size) do
>>>   # I assume this next line is what is meant by a Ruby sub-string?
>>>   @fields[i-1]=@originalstr[offsets[i-1]..offsets[i]]
>>> end
>>>
>>> .. and, assuming that @fields is exposed only as a read-only
>>> attribute, that I can assume the memory it consumes to be
>>> independent of the length of originalstr and dependent only upon
>>> numfields?
>>>
>> You can help keep this read only by freezing all strings involved.
>>
> Yes - that sounds a good idea to me.

>>> While I've no reason to doubt this confirmed answer, by any chance
>>> can someone suggest a good way to demonstrate that this is the case
>>> without resorting to either using very large strings and looking at
>>> VM usage of the interpreter process... or resorting to reviewing the
>>> source to Ruby's implementation?
>>>
>> The only additional method of verification that comes to mind is to
>> ask Matz. :-)
>>
> Hmmm - a lack of profiling tools might prove something of a stumbling
> block... I'll need to have a careful think about that. Rather than
> wanting to check up on fellow Rubyists, I really want to periodically
> check that I make no invalid assumptions as I work forwards from this
> basis towards an implementation. I don't want to find out only after
> I think I've finished that a resource leak or extravagant resource
> demands will require a re-write before the software can be used
> against real data.

Then just look at the sources.

Kind regards

    robert
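
A note on the quoted field-splitting loop: as written, 1..(offsets.size)
runs one step past the last offset, and the two-dot range
offsets[i-1]..offsets[i] also includes the first character of the
following field. A minimal sketch of the same idea with exclusive ranges
follows. The method name split_fields and the sample data are
hypothetical, chosen only for illustration; they are not part of the code
posted above.

```ruby
# Hypothetical helper (not from the original posts): split a string into
# frozen fields, given boundary offsets with offsets[0] == 0 and
# offsets[-1] == str.size.
def split_fields(str, offsets)
  # each_cons(2) yields consecutive pairs [start, stop]. The three-dot
  # range excludes stop, so adjacent fields do not overlap.
  offsets.each_cons(2).map { |start, stop| str[start...stop].freeze }
end

original = "abcdefghij".freeze   # made-up sample data
offsets  = [0, 3, 7, 10]         # made-up field boundaries
fields   = split_fields(original, offsets)

p fields  # => ["abc", "defg", "hij"]
```

This yields offsets.size - 1 fields, each frozen as suggested earlier in
the thread. Whether each field shares memory with the original string is
up to the interpreter, so this sketch says nothing about memory use.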