"Michael Schuerig" <schuerig / acm.org> wrote in message
news:1eodn9k.5gnwx1g5scq3N%schuerig / acm.org...
>
> The concrete purpose is to get titles from HTML files, that is the first
> occurrence of any text between <title> and </title>. Better still, I'd
> like to get the "X" from <html>..<head>..<title> X </title>..</head>.

# Sample line from an HTML file
str = "<title>This is the title!</title><title>Another one!</title>"

# Build a regular expression that matches text which
# 1. starts with the text "<title>",
# 2. is followed by any character ("."), zero or more times ("*"),
#    matched non-greedily ("?"),
# 3. and is then followed by the text "</title>". Note that the / is
#    escaped by a backslash; otherwise the Ruby interpreter would take
#    the forward slash as the end of the regular expression.

str.scan( /<title>(.*?)<\/title>/ ).each do |w|
  # w is a one-element array holding the text captured by the parentheses
  print w[0], "\n"
end
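
If only the first title is needed, as in the original question, one
possible sketch is to index the string with the regular expression and the
number of the group in parentheses instead of calling scan. The variable
name first_title is made up for this illustration.

```ruby
str = "<title>This is the title!</title><title>Another one!</title>"

# String#[] with a regexp and a group number returns the text of that
# group for the first match, or nil if there is no match at all.
first_title = str[/<title>(.*?)<\/title>/, 1]

print first_title, "\n"
```

Because the result is nil when no title is found, it is probably wise to
check for that before using the value.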

If you test the following code:

str = "<title>This is the title!</title><title>Another one!</title>"
print str.scan( /<title>(.*?)<\/title>/ ).class, "\n"

you will be convinced that String#scan returns an array, namely the array
of all matches. I can recommend RubyWin for playing around and learning the
first steps of Ruby. It also helps to study the basics of regular expressions,
because they are so powerful for text processing.
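
One detail worth checking for yourself: when the pattern contains a group
in parentheses, each element of the array returned by scan is itself an
array with one entry per group. A small sketch (the names matches and
titles are only for illustration):

```ruby
str = "<title>This is the title!</title><title>Another one!</title>"

matches = str.scan( /<title>(.*?)<\/title>/ )
# matches is an array of arrays, one inner array per match
p matches

# flatten turns it into a plain array of the captured strings
titles = matches.flatten
p titles
```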

Please note that the samples provided assume that the start and end tags
appear in the same string (that is, on the same line of an HTML file). If
this is not what you have/need, I'm sorry to say that my Ruby knowledge is
too poor for me to give a satisfying answer.
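
That said, one thing that might be worth trying, assuming the whole file
fits comfortably into a single string: the /m modifier makes "." match
newlines as well, so the tags may sit on different lines. The /i modifier
(my addition, in case the file uses <TITLE>) ignores case. The html string
below is a made-up stand-in for the file contents; treat this as a sketch,
not a real HTML parser.

```ruby
# Stand-in for the contents of an HTML file read into one string,
# for example with File.read (the markup here is invented).
html = "<html>\n<head>\n<title>\n  This is the title!\n</title>\n</head>\n</html>"

# /m lets "." match newlines too, /i ignores the case of the tag names
title = html[/<title>(.*?)<\/title>/mi, 1]

# strip removes the whitespace and newlines around the text
print title.strip, "\n" if title
```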

Best regards

/rob