M R Lemon wrote in post #1023844:
> But I can't work out how to loop.
>
>  def get_text(base_url, page_number)
>
>     @target_url = base_url + page_number.to_s
>     @noko_doc = Nokogiri::HTML(open(@target_url))
>
>     @text = ''
>     @noko_doc.css('div.body_recap').each do |text|
>        @text << text.content
>        @text = @text.strip!
>        return @text
>      end
>   end

Aside: there is no need to use instance variables (e.g. @target_url) 
here. Local variables would be fine (e.g. target_url).

Instance variables persist within the object instance even after the 
method returns (e.g. another method could see what was assigned to 
@target_url previously). This means that you're holding on to references, 
which will prevent these temporary objects from being garbage-collected 
for as long as the object itself is alive.
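
To illustrate the difference, here is a small sketch. The Scraper class, its two method names and the example.com URL are made up for the demonstration; they are not from your code.

```ruby
# Hypothetical class, only to show what stays on the object.
class Scraper
  def with_ivar(base_url, page_number)
    @target_url = base_url + page_number.to_s  # stored on the object
    @target_url.length
  end

  def with_local(base_url, page_number)
    target_url = base_url + page_number.to_s   # discarded when the method returns
    target_url.length
  end
end

s = Scraper.new

s.with_local("http://example.com/page/", 1)
after_local = s.instance_variables   # => []

s.with_ivar("http://example.com/page/", 1)
after_ivar = s.instance_variables    # => [:@target_url]  (Ruby 1.9 or later)
```

After with_ivar has run, any other method on the same object can still read @target_url, whether you meant it to or not.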

> def collect_urls(base_url, page_number)
>   @valid_urls = []
>   text = get_text(base_url, page_number)
>   if text =~ /\A\s*Previous/
>     @valid_urls << "END!"
>   else @valid_urls << @target_url
>     return @valid_urls
>   end
> end
> end

Well, I see no loop in there. Also, the formatting is a bit odd, which 
makes it hard to read. What you've actually written is this:

>   if text =~ /\A\s*Previous/
>     @valid_urls << "END!"
>   else
>     @valid_urls << @target_url
>     return @valid_urls
>   end

So I'm not sure exactly what you're trying to achieve, and which of 
these two conditions is supposed to be the loop termination, but you 
could try this as a starting point:

> def collect_urls(base_url, page_number)
>   valid_urls = []
>   1000.times do   # prevent infinite looping
>     text = get_text(base_url, page_number)
>     if text =~ /\A\s*Previous/
>       valid_urls << "END!"
>       break
>     else
>       valid_urls << @target_url
>       page_number += 1
>     end
>   end
>   return valid_urls
> end

Incidentally, you seem to be relying on the instance variable 
@target_url having been set by the call to get_text. It would be cleaner 
to return the url and the text as two values:

    return target_url, text

    ...

    target_url, text = get_text(base_url, page_number)
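
Putting the two pieces together, a sketch of the whole thing with local variables only might look like the following. Note the assumptions: fetch_text and the PAGES hash are hypothetical stand-ins for your Nokogiri fetch so that the example runs without network access, and the example.com URLs are invented. I've also assumed that a page starting with "Previous" marks the end, as in the loop above.

```ruby
# Hypothetical canned pages standing in for the real site.
PAGES = {
  "http://example.com/recap/1" => "First recap",
  "http://example.com/recap/2" => "Second recap",
  "http://example.com/recap/3" => "  Previous page"
}

# Hypothetical replacement for the Nokogiri::HTML(open(url)) lookup.
def fetch_text(url)
  PAGES.fetch(url, "Previous")
end

def get_text(base_url, page_number)
  target_url = base_url + page_number.to_s
  text = fetch_text(target_url).strip
  return target_url, text              # two values, returned as an array
end

def collect_urls(base_url, page_number)
  valid_urls = []
  1000.times do                        # prevent infinite looping
    target_url, text = get_text(base_url, page_number)
    if text =~ /\A\s*Previous/
      valid_urls << "END!"
      break
    else
      valid_urls << target_url
      page_number += 1
    end
  end
  valid_urls
end

urls = collect_urls("http://example.com/recap/", 1)
# => ["http://example.com/recap/1", "http://example.com/recap/2", "END!"]
```

Nothing is left behind on the object this way, and collect_urls no longer depends on a side effect of get_text.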

HTH,

Brian.

-- 
Posted via http://www.ruby-forum.com/.