Hello,

On Tue, 09 Jul 2002 05:20:12 GMT, Yukihiro Matsumoto <matz> wrote:
> Insights?  It's inherited from Perl.  Try:
> 
>   % perl -le 'print join(":", split(".b", "abcabc"))'
>   :c:c

I guess Perl inherited it from awk.  awk is a fairly simple language,
but I think it is at least consistent.  Let me explain why it works this
way in awk.

awk doesn't have types.  Not only do variables lack types, values lack
them too.  So it's impossible to have a variable whose value is a regexp.

That's why in some situations strings are interpreted as regular
expressions.  So you can write gsub("c+",...) and it's the same as writing
gsub(/c+/,...).  In these situations a regular expression is expected, and
if a string is found, it is converted to a regexp.

But in some situations, like the field separator parameter to split(),
it's impossible to convert _all_ strings to regular expressions, since
traditionally one-character separators were used.  So one-char strings
have to retain their original meaning but, OTOH, there is no way to specify
a real regular expression value in awk.

That's why, in these situations, a new rule was introduced:
a one-char string means a one-char field separator; longer strings mean
regex field separators.
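
To make the rule concrete, here is a small Ruby sketch of it.  It is only
an illustration: awk_style_split is a made-up name, not a method of Ruby
or a function of awk, and it ignores awk's special handling of a single
space.

```ruby
# Hypothetical helper illustrating the awk rule described above.
# A one-character separator is taken literally; anything longer is
# compiled as a regular expression.
def awk_style_split(str, sep)
  pattern = if sep.length == 1
              Regexp.new(Regexp.escape(sep))  # literal one-char separator
            else
              Regexp.new(sep)                 # longer string: a regex
            end
  str.split(pattern)
end

p awk_style_split("a.b.c", ".")     # => ["a", "b", "c"]
p awk_style_split("abcabc", ".b")   # => ["", "c", "c"]
```

The second call corresponds to the Perl one-liner quoted above: ".b" is
longer than one character, so it matches "ab" twice.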

> But if it turns out to be a bad inheritance (and I admit I'm starting
> to feeling so), I'm open to a new RCR.

Well, so what I'm talking about is "indirect inheritance from awk" or
"consistency with awk".

1) gsub()  ...  the parameter has to be a regex, so I see no reason for
accepting (and automatically converting) strings.
Since one cannot write "abc1bc".gsub(1,"X") and has to write at least
"abc1bc".gsub(1.to_s,"X"), I'd propose that

	"abcabc".gsub("a","X")

simply won't work, requiring the programmer to use this:

	"abcabc".gsub(Regexp.new("a"),"X")

This also encourages writing more efficient programs, since it encourages
storing a compiled Regexp.
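
A minimal sketch of what I mean by storing a compiled Regexp (the
variable names are mine, chosen only for this example):

```ruby
# Compile the pattern once and reuse the same Regexp object,
# instead of passing a string to every gsub call.
a_pattern = Regexp.new("a")

words = ["abcabc", "banana"]
replaced = words.map { |w| w.gsub(a_pattern, "X") }

p replaced   # => ["XbcXbc", "bXnXnX"]
```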

Another plus: compiling regexps from strings is often a source of
errors when the regexp contains backslashes.  Thus encouraging the use of
/regexp/ instead of "regexp" is a good thing.
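
A short illustration of the backslash problem, as a sketch (the variable
names are only for this example).  In a double-quoted Ruby string, "\d"
is not a known escape, so the backslash is dropped before the regexp
compiler ever sees it:

```ruby
from_string  = Regexp.new("\d+")    # the string is really "d+"
from_escaped = Regexp.new("\\d+")   # backslash doubled by hand
from_literal = /\d+/                # no string escaping involved

p from_string.source               # => "d+"
p("abc123" =~ from_string)         # => nil  (there is no "d" in the input)
p("abc123" =~ from_escaped)        # => 3
p("abc123" =~ from_literal)        # => 3
```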

Is this possible, or will it break too many old programs?
Matz will decide.  :-)

2) split()   ...   one-character strings and regular expressions are
absolutely necessary.  Automatic conversion of anything to a regexp obfuscates
split(), I think.  So I'd suggest either interpreting long strings as literal
strings or forbidding them completely.


No doubt the current situation with split() is confusing.
But if you change split() to interpret longer strings literally and leave
gsub/sub as they are, the situation will be confusing again, I'm afraid:
gsub translates strings to regexps, while split doesn't.

Thus I think either split() should be changed in a fairly restrictive manner
(accept only one-char strings or Regexp) or gsub should not automatically
convert strings to regexps.  I vote for the latter alternative.
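
For comparison, the restrictive variant of split() could behave roughly
like the sketch below.  strict_split is a hypothetical name and the
choice of exceptions is my assumption; it is not an existing Ruby method
and not a finished proposal.

```ruby
# Hypothetical sketch: accept only a Regexp or a one-character String.
def strict_split(str, sep)
  case sep
  when Regexp
    str.split(sep)
  when String
    unless sep.length == 1
      raise ArgumentError, "separator must be one character or a Regexp"
    end
    # Escape the character so it is always taken literally.
    str.split(Regexp.new(Regexp.escape(sep)))
  else
    raise TypeError, "separator must be a String or a Regexp"
  end
end

p strict_split("a,b,c", ",")      # => ["a", "b", "c"]
p strict_split("abcabc", /.b/)    # => ["", "c", "c"]

begin
  strict_split("abcabc", ".b")    # longer string: rejected
rescue ArgumentError => e
  puts e.message
end
```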

Looking forward to comments,
	Stepan