Steve [RubyTalk] wrote:
> Robert Klemme wrote:
>>>     # Assume offsets is a pre-computed array of positive integer
>>> positions into the String originalstr.
>>>
>> Care to unveil a bit of the nature of the computation that yields
>> those indexes?
>>
> It's not really relevant to the question I was asking - but I've no
> problem saying more.  I've a domain-specific (order-preserving and
> extensible) 'type-system' which is imposed over otherwise opaque data
> structures.  Given an instance of a 'type-signature' and a pointer, it
> is possible to determine the number of bytes which represent each
> 'typed value' - and (significantly) list construction drops out as
> being the concatenation of the value representations and
> type-signatures.  The
> type signatures range in complexity from the simplest constant
> 'N-bytes interpreted as a natural number' through sentinel encodings
> (Null terminated strings on steroids) and (in principle - if not
> frequently in practice) arbitrary computation ranging over named
> integer values occurring 'earlier' in the list.
> At the moment I'm toying with the idea that I can memory-map the
> values (using a C-implemented module) and do the computations on the
> mapped values in Ruby - having presented opaque values and
> 'type-signatures' as String objects to Ruby.  I expect that typical
> computations may involve matching regular expressions; doing
> arithmetic; computing various hashes and summations etc.  At the
> moment I'm concentrating on establishing if Ruby is a suitable tool
> for the task at hand.

Certainly.  I don't know how complex your type calculations are.  I'd
probably do something along the lines of:

class SpecialData

  def initialize(s)
    # Freeze the raw data so fields extracted from it stay read-only.
    @data = s.frozen? ? s : s.dup.freeze
    @meta_data = calculate_meta(@data)
  end

  # Number of fields found in the data.
  def size() @meta_data.size end

  # Extract the value of the field at field_index.
  def get(field_index)
    @meta_data[field_index].extract(@data)
  end

private

  def calculate_meta(data)
    md = []
    # your complex calculation here, sample:
    md << StringMeta.new(0..-1)
    md
  end

  # A meta object knows which byte range of the data it covers
  # and how to turn those bytes into a value.
  class BaseMeta
    attr_accessor :range
    def initialize(range) @range = range end
    def base_extract(str) str[self.range].freeze end
  end

  class StringMeta < BaseMeta
    alias :extract :base_extract
  end

  class IntMeta < BaseMeta
    def extract(s) base_extract(s).unpack("i")[0] end
  end
end
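For the IntMeta case the heavy lifting is done by String#unpack; here is a
minimal standalone sketch of that round trip (the native 32-bit "i" layout
is just an assumed example format):

```ruby
# Pack an integer into native-endian int bytes, then recover it
# the same way IntMeta#extract does.
raw = [42].pack("i")        # e.g. "*\x00\x00\x00" on a little-endian box
value = raw.unpack("i")[0]
value  # => 42
```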


>>>     # with offsets[0]==0 and offsets[-1]==@originalstr.size
>>>     @fields = Array.new(offsets.size - 1)
>>>     for i in 1...(offsets.size) do
>>>       # I assume this next line is what is meant by a Ruby sub-string?
>>>       @fields[i - 1] = @originalstr[offsets[i - 1]...offsets[i]]
>>>     end
>>>
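As an aside, the slicing loop above can be written more compactly with
each_cons; a sketch with made-up sample offsets (note the exclusive
ranges, so adjacent fields do not share a boundary byte):

```ruby
offsets     = [0, 3, 7, 10]   # assumed sample offsets
originalstr = "abcdefghij"

# Each consecutive pair of offsets delimits one field.
fields = offsets.each_cons(2).map { |a, b| originalstr[a...b].freeze }
fields  # => ["abc", "defg", "hij"]
```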
>>> .. and, assuming that @fields is exposed only as a read-only
>>> attribute, that I can assume the memory it consumes to be
>>> independent of the length of originalstr and dependent only upon
>>> numfields?
>>>
>> You can help keep this read-only by freezing all the strings involved.
>>
> Yes - that sounds a good idea to me.
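(Freezing is cheap and makes any accidental write raise immediately; a
tiny sketch:)

```ruby
field = "abc".freeze
begin
  field << "x"      # any in-place modification...
rescue => e
  puts e.class      # ...raises (FrozenError on current Ruby versions)
end
```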
>>> While I've no reason to doubt this confirmed answer, by any chance
>>> can someone suggest a good way to demonstrate that this is the case
>>> without resorting to either using very large strings and looking at
>>> VM usage of the interpreter process... or resorting to reviewing the
>>> source to Ruby's implementation?
>>>
>> The only additional method of verification that comes to mind is to
>> ask Matz. :-)
>>
> Hmmm - a lack of profiling tools might prove something of a stumbling
> block... I'll need to have a careful think about that.  Rather than
> wanting to check up on fellow Rubyists, I really want to periodically
> check that I make no invalid assumptions as I work forwards from this
> basis towards an implementation.  I don't want to find out only after
> I think I've finished that a resource leak or extravagant resource
> demands will require a re-write before the software can be used
> against real data.

Then just look at the sources.
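That said, newer CRuby versions ship an objspace extension that can report
per-object memory, which may be enough to check assumptions like this
empirically without watching the whole process's VM usage (exact figures
are implementation- and version-dependent):

```ruby
require 'objspace'

big   = "x" * 1_000_000
small = big[0, 10]

# ObjectSpace.memsize_of reports the bytes CRuby has allocated for
# one object; the numbers vary between Ruby versions and platforms.
puts ObjectSpace.memsize_of(big)    # roughly a megabyte
puts ObjectSpace.memsize_of(small)  # a few dozen bytes
```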

Kind regards

    robert