Peter,

Apologies for the brevity; I'm on a BlackBerry.

All but two of the unit tests are passing with FireWatir. Can you
confirm what the proxy and mechanize_doc params are used for in the
fetch method? I couldn't find them used anywhere. Mind if I rename
methods and variables away from being so mechanize-specific?

Hope to commit changes to my 3.0 tag tomorrow afternoon.

On 3/26/07, Peter Szinek <peter / rubyrailways.com> wrote:
> Hello all,
>
> scRUBYt! version 0.2.6 has been released with some great new features,
> tons of bugfixes and a lot of changes overall which should greatly improve
> the reliability of the system.
>
> ============
> What's this?
> ============
>
> scRUBYt! is a very easy to learn and use, yet powerful Web scraping
> framework based on Hpricot and mechanize. Its purpose is to free you
> from the drudgery of web page crawling, looking up HTML tags,
> attributes, XPaths, form names and other typical low-level web scraping
> woes by figuring these out from your examples copy'n'pasted from the Web
> page.
>
> ===========
> What's new?
> ===========
>
> A lot of long-awaited features have been added: most notably, automatic
> crawling to the detail pages, which was the most requested feature in
> scRUBYt!'s history ever.
>
> Another great addition is the improved example generation - you don't
> have to use the whole text of the element you would like to match
> anymore - it is enough to specify a substring, and the first element
> that contains the string will be returned. Moreover, it is possible to
> create compound examples like this:
>
> flight :begins_with => 'Arrival', :contains => /\d{4}/, :ends_with => '20:00'
>
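To make the compound example concrete, here is a minimal sketch of how such a hash could be checked against a node's text. It is an assumption for illustration only: `matches_compound_example?` is a hypothetical helper, not a scRUBYt! method, and the real lookup may work differently.

```ruby
# Hypothetical helper (not part of scRUBYt!): returns true only if the text
# satisfies every constraint in the compound example hash.
def matches_compound_example?(text, example)
  example.all? do |constraint, expected|
    case constraint
    when :begins_with then text.start_with?(expected)
    when :ends_with   then text.end_with?(expected)
    when :contains
      # :contains may be a plain substring or a regexp
      expected.is_a?(Regexp) ? !(text =~ expected).nil? : text.include?(expected)
    else
      raise ArgumentError, "unknown constraint: #{constraint.inspect}"
    end
  end
end

example = { :begins_with => 'Arrival', :contains => /\d{4}/, :ends_with => '20:00' }
puts matches_compound_example?('Arrival flight 1234 at 20:00', example)
puts matches_compound_example?('Departure flight 1234 at 20:00', example)
```

The first call satisfies all three constraints; the second fails on :begins_with.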
> The crawling through next links has been greatly improved - it is
> possible to use images as next links, to generate URLs instead of
> clicking on the next link, and a great many bugs (including the
> infamous Google next link problem) have been fixed.
>
> An enormous number of bugs were fixed and the whole system was tested
> thoroughly, so the overall reliability should be much better than in
> the previous releases.
>
> Something non-software related: 4 people have joined the development, so
> I guess there is much, much more to come in the future!
>
> =========
> CHANGELOG
> =========
>
> * [NEW] Automatically crawling to and extracting from detail pages
> * [NEW] Compound example specification: So far the example of a pattern
>     had to be a string. Now it can be a hash as well, like
>    {:contains => /\d\d-\d/, :begins_with => 'Telephone'}
> * [NEW] More sophisticated example specification: regexps can be used
>     as well, and it is no longer necessary (though still possible) to
>     specify the whole content of the node - nodes that contain the
>     string/match the regexp will be returned, too
> * [NEW] Possibility to force writing text in case of non-leaf nodes
> * [NEW] Crawling to the next page now possible via image links as well
> * [NEW] Possibility to define examples for any pattern (before it did
>     not make sense for ancestors)
> * [NEW] Implementation of crawling to the next page with different
>     methods
> * [NEW] Heuristics: if something ends with _url, it is a shortcut for:
>     some_url 'href', :type => :attribute
> * [FIX] Crawling to the next page (the broken google example): if the
>     next link text is not an <a>, traverse down until the <a> is found;
>     if it is still not found, traverse up until it is found
> * [FIX] Crawling to next pages does not break if the next link is greyed
>     out (or otherwise present but has no href attribute) (Credit: Robert
>     Au)
> * [FIX] DRY-ed next link lookup - it should be much more robust now as
>     it uses the 'standard' example lookup
> * [NEW] Correct exporting of detail page extractors
> * [NEW] Added more powerful XPath regexp (Credit: Karol Hosiawa)
> * [NEW] New examples for the new features
> * [FIX] Tons of bugfixes, new blackbox and unit tests, refactoring and
>     stabilization
>
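The next-link fix in the changelog above (traverse down until an <a> is found, otherwise traverse up) can be sketched roughly as follows. This is only an illustration under stated assumptions: `Node`, `find_anchor_below`, `find_anchor_above` and `find_next_link` are hypothetical names, and the plain Ruby tree stands in for the Hpricot elements scRUBYt! actually works on.

```ruby
# Minimal stand-in for an HTML element (hypothetical, for illustration only).
class Node
  attr_reader :name, :children
  attr_accessor :parent

  def initialize(name)
    @name = name
    @children = []
    @parent = nil
  end

  # Attach a child and return it, so trees can be built step by step.
  def add(child)
    child.parent = self
    @children << child
    child
  end
end

# Depth-first search at and below +start+ for an <a> element.
def find_anchor_below(start)
  return start if start.name == 'a'
  start.children.each do |child|
    found = find_anchor_below(child)
    return found if found
  end
  nil
end

# Walk up from +start+ until an <a> ancestor is found (nil if there is none).
def find_anchor_above(start)
  current = start.parent
  current = current.parent while current && current.name != 'a'
  current
end

# Try downwards first, then upwards, as the changelog entry describes.
def find_next_link(start)
  find_anchor_below(start) || find_anchor_above(start)
end

# <a><img/></a>: the example matched the image, the link is its parent.
link = Node.new('a')
img  = link.add(Node.new('img'))
puts find_next_link(img).equal?(link)
```

Searching downwards first covers a matched wrapper element such as a table cell around the link; the upward walk covers a matched child such as an image inside the link.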
> ============
> Announcement
> ============
>
> By popular demand, there is a new forum to discuss everything scRUBYt!
> related:
>
> http://agora.scrubyt.org
>
> You are welcome to sign up, share your opinion, ask for features, report
> bugs or discuss stuff - or to just look around at what others are saying.
>
> ================
> Closing thoughts
> ================
>
> Please keep the feedback coming - your contributions are a key factor in
> scRUBYt!'s success. This is not an exaggeration or a feeble attempt at
> flattery - since we (obviously) cannot test everything on every
> possible page, we can make scRUBYt! truly powerful only if you send us
> all the quirks and problems you encounter during scraping, as well as
> your suggestions and ideas. Thanks everyone!
>
> Cheers,
> Peter
> __
> http://www.rubyrailways.com :: Ruby and Web2.0 blog
> http://scrubyt.org :: Ruby web scraping framework
> http://rubykitchensink.ca/ :: The indexed archive of all things Ruby.
>
>
>


-- 
Glenn