* noreply / rubyforge.org (noreply / rubyforge.org) wrote:
> Summary: String#scan loops forever if scanned string is modified inside block.

The subject doesn't really reflect what's actually happening.

> Initial Comment:
> ruby 1.8.4 (2005-12-24)
>
> Following code loops infinitely:
>
> a = " 12345678 "; a.scan(/\d/) {|s| a[3,2]='test'; s}

I'm not convinced this is a bug per se.  At least not any more than
"loop { }" is.  What's actually happening is easier to demonstrate than
explain, so here goes (I'm using the caret as the position indicator).

  " 12345678 "
   ^               #=> no match
  " 12345678 "
    ^              #=> match, a = " 12test5678 "
  " 12test5678 "
     ^             #=> match, a = " 12testst5678 "
  " 12testst5678 "
      ^            #=> no match
  ... (snipped several irrelevant steps)
  " 12testst5678 "
           ^       #=> no match
  " 12testst5678 "
            ^      #=> match, a = " 12teststst5678 "  <-- eek!
  " 12teststst5678 "
             ^     #=> no match
  " 12teststst5678 "
              ^    #=> match, a = " 12testststst5678 "
  " 12testststst5678 "
               ^   #=> no match
  " 12testststst5678 "
                ^  #=> match, a = " 12teststststst5678 "
  (and so on, ad infinitum)

Each a[3,2]='test' replaces two characters with four, so the string
grows by two characters per match.  Once the scanner reaches the "5",
every match pushes that "5" two positions further along, and the
scanner finds it again two steps later.

What honestly bothers me about this behavior is the converse: making
the receiver _smaller_ can cause the scanner to actually _miss_ matches,
like so:

  a, strs = ' abcdef', []
  a.scan(/[\w]/) { |s| a[0, 1] = ''; strs << s }
  strs #=> ['a', 'c', 'e']

Most people would expect ['a', 'b', 'c', 'd', 'e', 'f'] there.

This could be "fixed" in a couple of ways:

  * Raise an exception if the receiver is modified during a scan (I
    don't really like this option).
  * Attempt to hack offset adjustment into string modification.  The
    functions in question are rb_str_splice() and rb_str_aref(),
    although I haven't investigated fully, so there may be other
    methods as well.  This is really my least-favorite option, because
    it doesn't handle the case where someone modifies the receiver
    while keeping the length the same.
  * Leave things as they are and add a big warning to the String#scan
    documentation.  Personally, I prefer this option.

Anyway, attached is a patch that adds a brief note to String#scan.  The
patch is against 1.8.4, but it applies cleanly to HEAD as well.

-- 
Paul Duncan <pabs / pablotron.org>        OpenPGP Key ID: 0x82C29562
http://www.pablotron.org/                 http://www.paulduncan.org/

[attachment: ruby-1.8.4-str_scan_warning.diff]

diff -ur ruby-1.8.4/string.c ruby-1.8.4-string_doc/string.c
--- ruby-1.8.4/string.c	2005-10-27 04:19:20.000000000 -0400
+++ ruby-1.8.4-string_doc/string.c	2006-01-26 11:52:03.000000000 -0500
@@ -4240,6 +4240,11 @@
  *
  *     <<cruel>> <<world>>
  *     rceu lowlr
+ *
+ *  <em>Note:</em> You probably don't want to modify the receiver string
+ *  inside the block.  Ruby will let you do it, but the result probably
+ *  won't be what you expect or what you want.
+ *
  */
 
 static VALUE
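
For anyone who really does need to edit the string while walking its
matches, one possible workaround (a sketch only, not part of the
attached patch) is to scan a copy, so the scanner never sees the
modification.  It uses nothing beyond String#dup and String#scan; the
variable names are just for illustration.

```ruby
# Growing case from the report: scan a snapshot, modify the original.
a = " 12345678 "
seen = []
a.dup.scan(/\d/) { |s| a[3, 2] = "test"; seen << s }
# The copy is never modified, so each digit is yielded exactly once
# and the scan terminates: seen is ["1", "2", ..., "8"].

# Shrinking case: the copy keeps its length, so no match is skipped.
b = " abcdef"
strs = []
b.dup.scan(/\w/) { |s| b[0, 1] = ""; strs << s }
# strs is ["a", "b", "c", "d", "e", "f"]
```

The cost is one extra copy of the string per scan, and the block still
sees matches from the original contents rather than from whatever the
string has become, which may or may not be what you want.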