From: "Steve [RubyTalk]"
Date: Fri, 16 Dec 2005 00:47:55 +0900
Posted: Thu, 15 Dec 2005 15:46:01 +0000
Subject: Re: Question of reference and (sub)strings.
To: ruby-talk@ruby-lang.org (ruby-talk ML)
Message-Id: <43A18FB9.3050208@shic.co.uk>
In-Reply-To: <40cuekF19kr0tU1@individual.net>
References: <40cuekF19kr0tU1@individual.net>

Robert Klemme wrote:
>> I might be wrong
>>
> You're not.
>
>> - but I'm pretty sure that substrings in ruby are
>> created with copy-on-write. That is, when you take a substring, a new
>> block of memory isn't allocated to the new String, it references the
>> same block of memory as the original string - the allocation of a new
>> block of memory only occurs when one of the strings is modified.
>>
> Exactly. It seems this would be the simplest solution.
>

That sounds like absolutely great news - a _very_ pleasant surprise. I
couldn't have hoped for anything better. (Thanks!)

I assume that if I do something like:

# Assume offsets is a pre-computed array of non-negative integer
# positions into the String @originalstr,
# with offsets[0]==0 and offsets[-1]==@originalstr.size
@fields = Array.new(offsets.size-1)
for i in 1...offsets.size do
  # I assume this next line is what is meant by a Ruby sub-string?
  # (exclusive range, so adjacent fields do not overlap by one character)
  @fields[i-1] = @originalstr[offsets[i-1]...offsets[i]]
end

... and, assuming that @fields is exposed only as a read-only attribute,
can I take it that the memory it consumes is independent of the length of
@originalstr and depends only upon the number of fields?

While I've no reason to doubt this confirmed answer, can anyone suggest a
good way to demonstrate that this is the case without resorting either to
using very large strings and watching the VM usage of the interpreter
process, or to reviewing the source of Ruby's implementation?
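
One possible way to poke at this from inside Ruby, offered only as a rough sketch: it assumes a much later CRuby than the one discussed in this thread (ObjectSpace.memsize_of lives in the objspace standard library, which did not exist in 2005), and split_at is a hypothetical helper name, not anything from the thread. Whether a particular slice really shares the original's buffer may depend on the interpreter version and on where the slice sits in the string, so the script prints the reported sizes instead of promising any particular figures.

```ruby
require 'objspace'

# Hypothetical helper: cut +str+ into fields at the given boundaries.
# Assumes offsets[0] == 0 and offsets[-1] == str.size.
def split_at(str, offsets)
  offsets.each_cons(2).map { |from, to| str[from...to] }
end

original = 'x' * 1_000
offsets  = [0, 100, 400, 1_000]
fields   = split_at(original, offsets)

# The fields tile the original exactly, with no overlap and no gaps.
sizes     = fields.map(&:size)
roundtrip = (fields.join == original)

# Whatever sharing happens underneath must stay invisible: changing a
# field must never change the string it was sliced from.
fields[0] << '!'
original_untouched = (original == 'x' * 1_000)

# Interpreter-reported size of each String object. These numbers are
# implementation details; compare them against field.size to get a hint
# of whether a field carries its own copy of the bytes.
report = fields.map { |f| [f.size, ObjectSpace.memsize_of(f)] }

puts "field sizes:        #{sizes.inspect}"
puts "round trip ok:      #{roundtrip}"
puts "original untouched: #{original_untouched}"
report.each { |len, mem| puts "length #{len}: memsize_of reports #{mem}" }
```

If memsize_of for a long field comes out well below the field's length, that suggests the bytes are shared with the original; if it tracks the length, that field was probably copied. Either way it is only a hint about one interpreter build, not a guarantee about the language.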