David A. Black wrote:
> I'm not sure I agree conceptually with the idea of
> matching an IO object to a pattern. 

You need this to be able to do fancy parsing/lexing in Ruby.

> It actually feels to me like
> there *should* be an explicit, intervening string representation.

And what would be in this string? The entire file contents? That works
well for small files, and I've done it many times myself. It doesn't
scale well, though. If your explicit string contains only part of the
file, you get other problems.


Eric Mahurin wrote:
> If Regexp#match(obj) used just obj[pos], we could match a
> Regexp across a file with the above.

Yes, and it would be orders of magnitude slower. This kind of thing is
the major performance issue in Cursor at the moment. Efficiency is a
tin god, but some attention needs to be paid to it, if only to make sure
you don't define interfaces that are inherently inefficient.
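
To make the cost concrete, here is a rough sketch that counts calls
rather than timing anything. CountingIO is a made-up class for
illustration (it is not part of Cursor or the standard library), and
each counted call stands in for a seek or read on a real file.

```ruby
require 'stringio'

# Hypothetical instrumentation: a StringIO that counts positioning and
# reading calls, standing in for seeks and reads on a real file.
class CountingIO < StringIO
  attr_reader :calls

  def initialize(*args)
    @calls = 0
    super
  end

  def pos=(n)
    @calls += 1
    super
  end

  def getc
    @calls += 1
    super
  end

  def read(*args)
    @calls += 1
    super
  end
end

text = "x" * 1000

# Per-character access, the way an obj[pos]-based matcher would do it:
# one seek and one getc for every character examined.
per_char = CountingIO.new(text)
text.size.times { |i| per_char.pos = i; per_char.getc }

# Buffered access: one seek, one read, then match against the String.
buffered = CountingIO.new(text)
buffered.pos = 0
buffer = buffered.read(text.size)
```

The per-character version makes two calls per character, and that is
before the regexp engine backtracks and revisits any of them.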



Yukihiro Matsumoto wrote:
> Again, it's a matter of cost.  For example, full duck typing
> Regexp#match likely requires re-implementation of a regular expression
> matching engine in Ruby.

Not really, no. You can read into a buffer and match against the
buffer. This works pretty well for every pattern without anchors.
Supporting anchors too requires a smidge of rewriting the Regexp at
runtime. My current implementation interprets ^ and \A (or $ and \Z
when matching backward) as matching at the current file position,
rather than the beginning of file.
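
Roughly, the buffer approach looks like the sketch below. This is not
the Cursor code; match_at_pos and max_len are names I'm making up here,
and it assumes the pattern itself contains no anchors and is plain
ASCII. The pattern gets wrapped in \A so that "start of buffer" means
"current file position".

```ruby
require 'stringio'

# Hypothetical helper, not Cursor's actual API: try to match `pattern`
# starting exactly at io's current position, looking at no more than
# `max_len` bytes (the upper limit on the length of a single match).
def match_at_pos(io, pattern, max_len = 4096)
  start  = io.pos
  buffer = io.read(max_len) || ""   # read(n) returns nil at end of file
  # \A pins the match to the start of the buffer, i.e. the file position.
  anchored = Regexp.new("\\A(?:#{pattern.source})", pattern.options)
  m = anchored.match(buffer)
  # Advance past the match on success; restore the position on failure.
  io.pos = m ? start + m[0].bytesize : start
  m
end

io = StringIO.new("foo = 42\nbar = 7\n")
first  = match_at_pos(io, /(\w+) = (\d+)\n/)   # matches "foo = 42\n"
second = match_at_pos(io, /(\w+) = (\d+)\n/)   # continues at "bar = 7\n"
```

Because the position advances past each match, calling it repeatedly
walks through the file one token at a time, which is what a lexer wants.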

It's a minor pain, but I have most of the necessary code already.
A compromise or two has to be made: there is an upper limit on the
length of a single match, and ^ won't work right in some rarer cases
until regexp lookbehind arrives in Ruby 1.9. I don't like this, but
these restrictions seem minor enough, considering the massive increase
in functionality otherwise.
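
To show where the length limit comes from, here is a sketch of
searching forward through a file in chunks. Again this is illustrative
only: search_forward, max_match and chunk are hypothetical names, and
it assumes an ASCII pattern with no anchors or lookaround. A match is
accepted only once the buffer holds at least max_match bytes from its
start (or the file has ended), so a match cut off by the end of a chunk
is never reported short.

```ruby
require 'stringio'

# Hypothetical helper: find the first match of `pattern` at or after io's
# current position, assuming no match is longer than `max_match` bytes.
# Returns [file_offset, MatchData], or nil if nothing matches.
def search_forward(io, pattern, max_match = 1024, chunk = 4096)
  start  = io.pos
  base   = start         # file offset corresponding to buffer[0]
  buffer = String.new    # empty binary string, like the data read returns
  eof    = false
  until eof
    data = io.read(chunk)
    if data then buffer << data else eof = true end
    m = pattern.match(buffer)
    # Accept only when more input could not change the result.
    if m && (eof || m.begin(0) + max_match <= buffer.bytesize)
      io.pos = base + m.end(0)
      return [base + m.begin(0), m]
    end
    next if m            # candidate too near the buffer end: read more
    # No match yet: keep only the tail that a later match could start in.
    drop = buffer.bytesize - (max_match - 1)
    if drop > 0
      buffer = buffer.byteslice(drop, max_match - 1)
      base += drop
    end
  end
  io.pos = start         # nothing found: leave the position untouched
  nil
end
```

The buffer never needs to hold much more than max_match plus a chunk or
two, so this scales to files of any size, at the price of missing any
match longer than the limit.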


On 8/3/05, David A. Black <dblack / wobblini.net> wrote:
> Hi --
> 
> On Wed, 3 Aug 2005, Eric Mahurin wrote:
> 
> > --- "David A. Black" <dblack / wobblini.net> wrote:
> >
> >> Hi --
> >>
> >> On Wed, 3 Aug 2005, Eric Mahurin wrote:
> >>
> >>> I think a primary example where I would really like real
> >>> duck-typing in a built-in would be Regexp#match(str).  This
> >>> requires the arg to be a String.  I would really like to have
> >>> this be able to operate on a file.  If I implement a class that
> >>> walks like, talks like, quacks like a String but really
> >>> accesses a file (practically did that in my rubyforge cursor
> >>> project), it wouldn't do any good because Regexp#match only
> >>> takes a String - period.
> >>
> >> You can define #to_str on your object to get Regexp#match to accept
> >> it as an argument:
> >>
> >>    irb(main):006:0> o = Object.new
> >>    => #<Object:0x401f3db4>
> >>    irb(main):007:0> def o.to_str; "hi"; end
> >>    => nil
> >>    irb(main):008:0> /i/.match(o)
> >>    => #<MatchData:0x401eec60>
> >
> > This doesn't really help in the polymorphism department.  This
> > is no different than writing:
> >
> > /i/.match(o.to_str)
> 
> I think it's quite different, certainly in appearance and to some
> extent in logic.  I'm not sure how much more polymorphic one could
> get, unless one had every object present its .to_s representation for
> matching, which would not be good.
> 
> >>> The Regexp#match method could be implemented to take ANY object
> >>> that implemented some subset of the String API.
> >>
> >> I think there's a semantic or perhaps definitional issue here,
> >> though.  What does it mean for a regular expression to "match" an
> >> arbitrary object?  I don't think it's just a matter of what methods
> >> the object has.  The object has to match the pattern, and the
> >> patterns are descriptions of strings.  I'm not sure how you would
> >> detect a pattern like /[A-Z]{3}(\d\d)?/ in something that wasn't a
> >> string.
> >
> > By the same methods used in String.  It could get away with
> > just one method to accomplish the task: #[positive_int].  We
> > could put this in IO for example:
> >
> > class IO
> >  def [](i)
> >    self.pos = i
> >    if eof? # pos can go beyond the eof
> >      self.pos = i-1
> >      return(nil) if eof?
> >      self.pos = i
> >    end
> >    getc
> >  end
> > end
> >
> > If Regexp#match(obj) used just obj[pos], we could match a
> > Regexp across a file with the above.
> 
> scanf.rb does something along those lines.  (It gets tricky with
> scanf, because of whitespace and stuff, but it's basically position
> and index manipulation.)  Then again, scanf has always been
> stream-oriented.  I'm not sure I agree conceptually with the idea of
> matching an IO object to a pattern.  It actually feels to me like
> there *should* be an explicit, intervening string representation.
> 
> Nor do I think this is a sign of failure or rejection of the principle
> of duck typing or anything like that.  Everything doesn't have to do
> everything.  For instance, you can't do File.rename on an integer, or
> divide a hash by a float.  And yes, I know that the reductio ad
> absurdum is not proof of anything :-)  I just think there's some
> nuance to some of the cases, including the specificity of the
> pattern/string connection.  I don't see pattern matching as strictly a
> matter of integer indexability.
> 
> 
> David
> 
> -- 
> David A. Black
> dblack / wobblini.net
> 
>