On Saturday 31 May 2003 1:28 am, nobu.nokada / softhome.net wrote:
> Hi,
>
> At Sat, 31 May 2003 08:59:45 +0900,
>
> Wesley J Landaker wrote:
> > Mine also does the + and | operators, as well, though; I'm not sure
> > if that's universally useful.
>
> As for +, is it right to just concatinate them?  Regexp#| is
> provided in lib/eregex.rb.  And you can see also
> <http://member.nifty.ne.jp/nokada/archive/reop.rb>.

Well, + meaning concatenation makes sense to me. What else would it
mean? Notice that I put each regexp in a (?:) group, so there is no
ambiguity if you do something like:

/foo|bar/ + /.*/   # => /(?:foo|bar)(?:.*)/u  

(vs. getting /foo|bar.*/, which I think would not be what you expected,
especially if the regexps were extremely complex)
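
To make the grouping point concrete, here is a minimal sketch. It is
not the code I posted: concat_regexps is a hypothetical helper written
only for this illustration, and it ignores flags such as /u.

```ruby
# Hypothetical helper for illustration only (not the posted RegexpOps code).
# Wraps each source in a non-capturing group before joining; flags are ignored.
def concat_regexps(a, b)
  Regexp.new("(?:#{a.source})(?:#{b.source})")
end

grouped   = concat_regexps(/foo|bar/, /.*/)              # (?:foo|bar)(?:.*)
ungrouped = Regexp.new(/foo|bar/.source + /.*/.source)   # foo|bar.*

text = "foo and more"
puts grouped.match(text)[0]    # the alternation is followed by .* in both branches
puts ungrouped.match(text)[0]  # .* belongs only to the "bar" branch
```

With the groups, the whole string matches; without them, only "foo"
does, because the .* attaches to the "bar" alternative alone.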

I wasn't aware that there were so many other regexp-operator
packages. It must be a good idea if so many different people have also
thought of it. ;)

One thing that's missing from the packages you point at is that the
object you get back isn't completely usable as a regexp. They could be
extended to have the missing methods, of course, but they don't
currently support them. And if you've added or modified any methods in
Regexp, these objects are of a different type (and aren't descendants
of Regexp), so they won't have the changes applied to them (say, if I
redefine to_s or source or something like that).

For example:
irb(main):001:0> require 'eregex'
=> true
irb(main):002:0> x = /foo/ | /bar/
=> #<RegOr:0x401c0ba4 @re2=/bar/, @re1=/foo/>
irb(main):003:0> /test/.methods - x.methods
=> ["casefold?", "|", "source", "&", "~", "match", "kcode"]

Anyway, it looks like eregex's & is pretty handy, and your reop.rb looks
even better, but for me, I think mine is a lot more useful in that it is
totally transparent: when you do an operation on regexps, you get a
regexp back. It doesn't create an object hierarchy as the other two you
cited do. I toyed with that idea, but I didn't like it because I got
objects back that behaved differently from regexps and couldn't be
easily redefined without some intimate knowledge of the operator
package.
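
As a rough sketch of what "you get a regexp back" means in practice:
the helper below is hypothetical, and it uses the built-in Regexp.union
as a stand-in rather than the operator code I posted.

```ruby
# Hypothetical helper for illustration; Regexp.union stands in for the
# posted operator code. The point is only that the result is a real Regexp.
def regexp_or(a, b)
  Regexp.union(a, b)
end

x = regexp_or(/foo/, /bar/)

puts x.is_a?(Regexp)                        # same class as any literal regexp
puts((/test/.methods - x.methods).inspect)  # no methods are missing
puts x.match("a bar")[0]                    # usable directly for matching
```

Because x is an ordinary Regexp, anything you add to or redefine in
Regexp applies to it as well.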

BTW, I never wrote '&' because I didn't really need it, but it could be 
done with something like this:

In RegexpOps.rb:
# the other code I posted goes here
class Regexp
  def &(other)
    /(?=#{self})#{other}/u
  end
end

Then:
irb(main):001:0> require 'RegexpOps'
=> true
irb(main):002:0> /foo/ & /bar/
=> /(?=(?:foo))(?:bar)/u

Of course, that regexp will never match anything, because the lookahead
requires "foo" at the very position where "bar" has to start, but you
get the idea. ;)
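
Here is a small sketch of the same lookahead idea with a pair of
patterns that can match at the same position. regexp_and is a
hypothetical helper for this illustration (it ignores flags), not part
of RegexpOps.

```ruby
# Hypothetical helper for illustration only; flags such as /u are ignored.
# The lookahead requires `a` to match at the position where `b` starts.
def regexp_and(a, b)
  Regexp.new("(?=(?:#{a.source}))(?:#{b.source})")
end

never = regexp_and(/foo/, /bar/)   # "foo" and "bar" cannot start at one position
both  = regexp_and(/fo/, /foo/)    # "fo" and "foo" can

puts never.match("foobar").inspect
puts both.match("foo")[0]
```

Note that this only checks that both patterns match starting at the
same position; it does not require them to match the same span.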

> > Looks like 1.8 still doesn't catch encoding flag in this case;
> > there doesn't appear to be any '(?' prefix that changes encodings,
> > though, which would be a prerequisite. (Personally, I'm happy with
> > UTF-8. ;)
>
> Current regexp engine (and perhaps Oniguruma too) can not mix
> encodings.  Well, would it be better to preserve it and raise
> an exception when it doesn't match?

For me, the encodings are not a problem, as I only use UTF-8; I do a lot 
of multilingual stuff, and UTF-8 is the only way I can support English, 
French, Spanish, German, and Japanese (strange mix, but those are the 
languages I work with!) simultaneously in Ruby.

In general, though, it seems like it would be a good idea to catch
attempts at mixing encodings and raise an exception if they are
incompatible. I might add that to mine.
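
A minimal sketch of such a check, under a big assumption: it relies on
Regexp#encoding and Regexp#fixed_encoding?, which exist in later Ruby
versions but not in the 1.8 discussed here. The helper name is
hypothetical.

```ruby
# Hypothetical helper for illustration; assumes a Ruby version that provides
# Regexp#encoding and Regexp#fixed_encoding? (not available in 1.8).
def ensure_compatible_encodings(a, b)
  if a.fixed_encoding? && b.fixed_encoding? && a.encoding != b.encoding
    raise ArgumentError,
          "cannot combine #{a.encoding} and #{b.encoding} regexps"
  end
  true
end

ensure_compatible_encodings(/foo/u, /bar/u)   # same fixed encoding: fine
ensure_compatible_encodings(/foo/, /bar/u)    # only one side is fixed: fine

begin
  ensure_compatible_encodings(/foo/u, /bar/e) # UTF-8 vs. EUC-JP
rescue ArgumentError => e
  puts e.message
end
```

An operator like + could call this first and only build the combined
regexp when the check passes.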

-- 
Wesley J. Landaker - wjl / icecavern.net
OpenPGP FP: C99E DF40 54F6 B625 FD48  B509 A3DE 8D79 541F F830