From ruby-talk-admin@ruby-lang.org Fri Dec 16 02:51:15 2005
Date: Fri, 16 Dec 2005 02:51:10 +0900
Posted: Thu, 15 Dec 2005 17:49:11 +0000
From: "Steve [RubyTalk]"
Reply-To: ruby-talk@ruby-lang.org
Subject: Re: Question of reference and (sub)strings.
To: ruby-talk@ruby-lang.org (ruby-talk ML)
Message-Id: <43A1AC97.6090802@shic.co.uk>
In-Reply-To: <40dj3kF190h7dU1@individual.net>
References: <40cuekF19kr0tU1@individual.net> <43A18FB9.3050208@shic.co.uk> <40dj3kF190h7dU1@individual.net>
X-ML-Name: ruby-talk
X-Mail-Count: 83

Robert Klemme wrote:
>> # Assume offsets is a pre-computed array of positive integer
>> # positions into the String originalstr.
>>
> Care to unveil a bit of the nature of the computation that yields those
> indexes?
>
It's not really relevant to the question I was asking, but I've no
problem saying more. I've a domain-specific (order-preserving and
extensible) 'type-system' which is imposed over otherwise opaque data
structures.
Given an instance of a 'type-signature' and a pointer, it is possible to
determine the number of bytes which represent each 'typed value', and
(significantly) list construction falls out as simply the concatenation
of the value representations and type-signatures. The type signatures
range in complexity from the simplest constant 'N bytes interpreted as a
natural number', through sentinel encodings (null-terminated strings on
steroids), to (in principle, if not frequently in practice) arbitrary
computation ranging over named integer values occurring 'earlier' in the
list.

At the moment I'm toying with the idea of memory-mapping the values
(using a C-implemented module) and doing the computations on the mapped
values in Ruby, having presented the opaque values and 'type-signatures'
to Ruby as String objects. I expect that typical computations may
involve matching regular expressions, doing arithmetic, and computing
various hashes and summations. For now I'm concentrating on establishing
whether Ruby is a suitable tool for the task at hand.

>> # with offsets[0]==0 and offsets[-1]==@originalstr.size
>> @fields=Array.new (offsets.size-1)
>> for i in 1..(offsets.size) do
>> # I assume this next line is what is meant by a Ruby sub-string?
>> @fields[i-1]=@originalstr[offsets[i-1]..offsets[i]]
>> end
>>
>> .. and, assuming that @fields is exposed only as a read-only
>> attribute, that I can assume the memory it consumes to be independent
>> of the length of originalstr and dependent only upon numfields?
>>
> You can help keeping this read only by freezing all strings involved.
>
Yes, that sounds like a good idea to me.

>> While I've no reason to doubt this confirmed answer, by any chance can
>> someone suggest a good way to demonstrate that this is the case
>> without resorting to either using very large strings and looking at
>> VM usage of the interpreter process... or resorting to reviewing the
>> source to Ruby's implementation?
>>
> The only additional method of verification that comes to mind is to ask
> Matz. :-)
>
Hmmm, a lack of profiling tools might prove something of a stumbling
block... I'll need to have a careful think about that. Rather than
wanting to check up on fellow Rubyists, I really want to check
periodically that I'm making no invalid assumptions as I work forwards
from this basis towards an implementation. I don't want to find out,
only after I think I've finished, that a resource leak or extravagant
resource demands will require a rewrite before the software can be used
against real data.

Steve
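The type-system itself is not shown in this thread, so the following is only a sketch under stated assumptions of how offsets like the ones in the quoted loop might be derived. A hypothetical signature is an array of elements `[:fixed, n]` (n bytes, assumed here to be a big-endian natural number), `[:sentinel, s]` (bytes up to and including the terminator `s`) and `[:counted, i]` (as many bytes as the natural number held by the earlier field `i`), standing in for the three kinds of type signature described above. The names `natural` and `compute_offsets` are illustrative, not part of any library. When the signature covers the whole string, the result has `offsets[0] == 0` and `offsets[-1] == data.size`, the shape the quoted loop assumes.

```ruby
# Hypothetical signature elements (illustration only):
#   [:fixed, n]     n bytes, read as a big-endian natural number
#   [:sentinel, s]  bytes up to and including the terminator string s
#   [:counted, i]   as many bytes as the natural number in earlier field i

# Interprets a byte string as a big-endian natural number.
def natural(bytes)
  bytes.unpack("C*").inject(0) { |acc, byte| (acc << 8) | byte }
end

# Returns offsets starting with 0, plus one entry for the end of each field.
def compute_offsets(data, signature)
  data = data.b                # binary copy: character index == byte index
  offsets = [0]
  signature.each do |kind, arg|
    pos = offsets.last
    width =
      case kind
      when :fixed then arg
      when :sentinel
        stop = data.index(arg.b, pos) or raise ArgumentError, "no terminator after #{pos}"
        stop + arg.bytesize - pos
      when :counted
        # Field arg must already have been measured (it occurs earlier).
        natural(data[offsets[arg], offsets[arg + 1] - offsets[arg]])
      else
        raise ArgumentError, "unknown signature element #{kind.inspect}"
      end
    raise ArgumentError, "signature overruns data" if pos + width > data.size
    offsets << pos + width
  end
  offsets
end

signature = [[:fixed, 1], [:counted, 0], [:sentinel, "\0"], [:fixed, 2]]
record = "\x03abchi\x00\x01\x00"
p compute_offsets(record, signature)   # prints [0, 1, 4, 7, 9]
```

Because each element only needs the position reached so far (and, for `:counted`, values of earlier fields), one left-to-right pass is enough, which matches the 'earlier in the list' restriction mentioned above.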
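For comparison, a minimal sketch of the quoted loop with two bugs fixed and Robert's freezing suggestion applied; the class name `Record` is hypothetical. In the quoted version the last iteration reads `offsets[offsets.size]`, which is past the end of the array (nil), and the inclusive range `offsets[i-1]..offsets[i]` includes the first character of the following field. Iterating over consecutive offset pairs with `each_cons(2)` and slicing with the exclusive range `from...to` avoids both. The sketch says nothing about whether Ruby shares the underlying buffer between `originalstr` and its substrings, which is the question being asked.

```ruby
# Splits a string into read-only fields at pre-computed offsets.
class Record
  attr_reader :fields   # reader only: no writer is defined

  def initialize(originalstr, offsets)
    unless offsets.first == 0 && offsets.last == originalstr.size
      raise ArgumentError, "offsets must start at 0 and end at originalstr.size"
    end
    # Freeze the source string, as suggested in the thread.
    @originalstr = originalstr.freeze
    # each_cons(2) yields each consecutive pair (from, to), so there are
    # offsets.size - 1 fields and no index runs past the end of offsets.
    fields = offsets.each_cons(2).map do |from, to|
      # Exclusive range: the field stops just before the next one begins.
      @originalstr[from...to].freeze
    end
    @fields = fields.freeze
  end
end

record = Record.new("alphabetagamma", [0, 5, 9, 14])
p record.fields   # prints ["alpha", "beta", "gamma"]
```

Freezing both the array and each field means callers cannot append to `fields` or modify a field in place; they get a `FrozenError` (a `RuntimeError` in older Rubies) instead.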
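On the question of demonstrating the memory behaviour without watching process VM size: MRI releases later than the one discussed here ship an `objspace` extension whose `ObjectSpace.memsize_of(obj)` returns the memory an object consumes in bytes. Its documentation describes the figure as incomplete and only a hint, and it is MRI-specific, so this is a rough check rather than proof. The sketch below (helper names `reported_bytes` and `field_bytes` are hypothetical) slices strings of increasing length into the same number of fields and prints the total reported for the fields alone. If the totals stay roughly flat as the length grows, the fields are not each holding a full copy of their bytes; if they grow in proportion to the length, they are. It makes no prediction about which outcome a particular Ruby version will show.

```ruby
require 'objspace'

# Sum of the sizes ObjectSpace reports for a collection of objects.
def reported_bytes(objects)
  objects.inject(0) { |sum, obj| sum + ObjectSpace.memsize_of(obj) }
end

# Builds a string of total_length bytes, splits it into numfields
# substrings, and returns the total size reported for the substrings.
def field_bytes(total_length, numfields)
  original = ("x" * total_length).freeze
  step = total_length / numfields
  offsets = (0...numfields).map { |i| i * step } << total_length
  fields = offsets.each_cons(2).map { |from, to| original[from...to].freeze }
  reported_bytes(fields)
end

[10_000, 100_000, 1_000_000].each do |len|
  puts "length #{len}: #{field_bytes(len, 100)} bytes reported for 100 fields"
end
```

Holding `numfields` fixed while varying only the length isolates the dependence on the size of `originalstr`, which is the property asked about in the quoted question.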