Mark Gallop wrote:
> Hi Charles,
> 
> Charles Pareto wrote:
>> links = page_content.scan(/<a class=1.*?href=\"(.*?)\"/).flatten
>>   
> I don't think that regular expression (regexp) works. Maybe google has 
> changed their code since the book was written. I think it goes "href" 
> then "class".
> 
> If you work out the correct regexp, let us know.
> 
> Cheers,
> Mark
> 
> 

As Mark said, google changed their code somewhat. If you work out the 
correct regular expression and it still seems to give erratic results, 
here is a hint: the naive solution uses ".*?" in a certain place, but 
that will still match too many results. Try [^"]*? instead, because you 
probably don't want to match quotes. (I just tried this, and that was 
the problem I encountered.)

By the way, a robust regex to match all HTML links looks kind of nasty, 
but perhaps you should try writing one--it's a good exercise. (Of 
course, that's not what you want for this--you want to match all links 
of class=l.)

Regards,
Dan