But Dave, no one will say "You look Marvellous!"

At 09:05 2/6/2001 +0900, you wrote:
>schuerig / acm.org (Michael Schuerig) writes:
>
> > > Please note that the samples provided assume that the start and end tags
> > > appear in the same string (that is, on the same line in an HTML file).
> >
> > That's exactly the restriction I'd like to avoid...
> >
> > I haven't looked into it, but I'm sure it's possible to redefine the
> > input record separator, slurp a complete file into a string and match a
> > regex against that string.
>
>str = File.open("x.html") {|f| f.read}
>str =~ /.../m
>
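A minimal sketch of that slurp-and-match idea, assuming the whole document is already in a string. The sample HTML and the title_of helper are made up for illustration and are not from the thread; the squiggly heredoc needs a reasonably recent Ruby.

```ruby
# Hypothetical sample document: the title is split across two lines.
html = <<~HTML
  <html>
  <head>
  <title>A title that
  spans two lines</title>
  </head>
  <body>...</body>
  </html>
HTML

# Hypothetical helper: returns the title text, or nil if there is none.
def title_of(html)
  # /m lets . match newlines; /i tolerates <TITLE> as well as <title>.
  m = html.match(%r{<title[^>]*>(.*?)</title>}mi)
  # Collapse the embedded newline and any runs of whitespace.
  m && m[1].gsub(/\s+/, " ").strip
end

puts title_of(html)
```

Without the /m flag the same pattern finds nothing here, because . stops at the newline inside the title.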
> > This very much goes against my sense of aesthetics. There's no need
> > to read in the file beyond a successful match, and there's no need
> > to read further when an orphaned </title> or a </head> tag is
> > encountered.
>
>All true, but at the same time, if you can do it in two lines rather
>than writing a full parser, isn't there some compensating gain to be
>had?
>
>I've used a technique for a while now to convert structured files from
>one form to another.
>
>1. Slurp the whole file in
>2. Convert escaped characters into something distinct so they are no
>    longer involved in processing.
>3. Match delimiters (for example braces in LaTeX, and <>'s in
>    HTML). This is where you take account of strings, commands and the
>    like.
>4. Perform a series of substitutions which match the command pattern
>    and any arguments. The name of the command is then used either to
>    look up a hash, or as the name of a method to call. The results of
>    all this then get substituted back into the buffer.
>
>It sounds messy, but the reality is that it works, and is a whole lot
>simpler than doing the full parse (particularly for non-regular
>languages such as LaTeX).
>
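The four steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the toy \name{arg} markup, the COMMANDS table and the convert method are all hypothetical, and the input is a string rather than a slurped file. It assumes a Ruby new enough for -> lambdas and gsub with a hash.

```ruby
# Hypothetical command table: command name => code that renders its argument.
COMMANDS = {
  "emph" => ->(arg) { "<em>#{arg}</em>" },
  "bold" => ->(arg) { "<b>#{arg}</b>" },
}

# Escaped characters and the private placeholder bytes that stand in for them.
ESCAPES = { "\\\\" => "\x01", "\\{" => "\x02", "\\}" => "\x03" }
RESTORE = { "\x01" => "\\", "\x02" => "{", "\x03" => "}" }

def convert(text)
  # Step 1 would be File.read(path); here the text is passed in directly.
  buf = text.dup

  # Step 2: hide escaped characters so they cannot be taken for delimiters.
  buf.gsub!(/\\[\\{}]/) { |esc| ESCAPES[esc] }

  # Step 3: a command whose argument holds no braces is an innermost one,
  # so its delimiters are known to pair up.
  innermost = /\\(\w+)\{([^{}]*)\}/

  # Step 4: substitute innermost commands until none are left.
  loop do
    changed = buf.gsub!(innermost) do
      handler = COMMANDS[$1]
      # Unknown commands are reduced to their bare argument.
      handler ? handler.call($2) : $2
    end
    break unless changed
  end

  # Put the escaped characters back as literal text.
  buf.gsub(/[\x01-\x03]/, RESTORE)
end

puts convert('\bold{a \emph{b} \{c\}}')
```

Working from the innermost command outwards is what lets a plain regex cope with nesting: each pass only ever sees arguments that contain no braces.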
>
>For your particular example, if I was worried about the potential size
>of reading in the whole file, I might just read in the first (say) 2k,
>and quickly check for </head>. If I didn't find it, I'd read another
>2k until I did.
>
>
>    def findTitle(file)
>       str = ''
>       # Read 2k at a time until the end of the head section has been seen.
>       loop do
>         begin
>            str << file.sysread(2048)
>         rescue EOFError
>            raise "</head> not found in file"
>         end
>         break if str =~ %r{</head>}
>       end
>
>       # /m lets . match newlines, so the title may span several lines.
>       return $1 if str =~ %r{<head.*?>.*?<title.*?>(.*?)</title>.*?</head>}m
>
>       raise "Couldn't find title in file"
>    end
>
>    title = File.open("test.html") {|f| findTitle(f)}
>    puts title
>
>Can't say as I've tested this, but it _might_ work ;-)
>
>
>Dave