Peter,

Apologies for the brevity, on a BlackBerry.

All but two of the unit tests are passing with firewatir.

Can you confirm what the proxy and mechanize_doc params are used for in
the fetch method? Couldn't find them used anywhere.

Mind if I rename methods and variables away from being so mechanize
specific?

Hope to commit changes to my 3.0 tag tomorrow afternoon.

On 3/26/07, Peter Szinek <peter / rubyrailways.com> wrote:
> Hello all,
>
> scRUBYt! version 0.2.6 has been released with some great new features,
> tons of bugfixes and a lot of changes overall which should greatly
> improve the reliability of the system.
>
> ============
> What's this?
> ============
>
> scRUBYt! is a very easy to learn and use, yet powerful Web scraping
> framework based on Hpricot and mechanize. Its purpose is to free you
> from the drudgery of web page crawling, looking up HTML tags,
> attributes, XPaths, form names and other typical low-level web scraping
> woes by figuring these out from your examples copy'n'pasted from the Web
> page.
>
> ===========
> What's new?
> ===========
>
> A lot of long-awaited features have been added: most notably, automatic
> crawling to the detail pages, which was the most requested feature in
> scRUBYt!'s history ever.
>
> Another great addition is the improved example generation - you don't
> have to use the whole text of the element you would like to match
> anymore - it is enough to specify a substring, and the first element
> that contains the string will be returned. Moreover, it is possible to
> create compound examples like this:
>
> flight :begins_with => 'Arrival', :contains => /\d{4}/, :ends_with => '20:00'
>
> The crawling through next links has been greatly improved - it is
> possible to use images as next links, to generate URLs instead of
> clicking on the next link, and a great many bugs (including the
> infamous google next link problem) have been fixed.
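
For what it's worth, the compound example in the quoted announcement (the
flight pattern) can be read as a set of constraints that must all hold for
a node's text. Below is a minimal Ruby sketch of that reading, not
scRUBYt!'s actual implementation: the helper name
matches_compound_example? and the sample strings are made up for
illustration, and I am assuming :begins_with and :ends_with take plain
strings while :contains takes either a string or a regexp.

```ruby
# Hypothetical helper (not part of scRUBYt!): returns true when the text
# satisfies every constraint in the compound example hash.
def matches_compound_example?(text, example)
  example.all? do |key, expected|
    case key
    when :begins_with
      text.start_with?(expected)   # assumed to be a plain string
    when :ends_with
      text.end_with?(expected)     # assumed to be a plain string
    when :contains
      if expected.is_a?(Regexp)
        !(text =~ expected).nil?   # regexp: match anywhere in the text
      else
        text.include?(expected)    # string: plain substring test
      end
    else
      raise ArgumentError, "unknown constraint: #{key.inspect}"
    end
  end
end

# The compound example from the announcement, written as a hash.
example = { :begins_with => 'Arrival', :contains => /\d{4}/, :ends_with => '20:00' }

# Made-up candidate texts, standing in for the text of page elements.
candidates = [
  'Arrival 2007-03-26 20:00',
  'Departure 2007-03-26 20:00',
  'Arrival today 20:00'
]

matching = candidates.select { |text| matches_compound_example?(text, example) }
puts matching.inspect
```

Only the first candidate passes all three constraints; the second fails
:begins_with and the third has no run of four digits. How the real library
combines or orders these checks may well differ.
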
>
> An enormous number of bugs were fixed and the whole system was tested
> thoroughly, so the overall reliability should be improved a lot
> compared to the previous releases.
>
> Something non-software related: 4 people have joined the development, so
> I guess there is much, much more to come in the future!
>
> =========
> CHANGELOG
> =========
>
> * [NEW] Automatically crawling to and extracting from detail pages
> * [NEW] Compound example specification: So far the example of a pattern
>   had to be a string. Now it can be a hash as well, like
>   {:contains => /\d\d-\d/, :begins_with => 'Telephone'}
> * [NEW] More sophisticated example specification: It is possible to use
>   a regexp as well, and there is no need (though it is still possible,
>   of course) to specify the whole content of the node - nodes that
>   contain the string/match the regexp will be returned, too
> * [NEW] Possibility to force writing text in case of non-leaf nodes
> * [NEW] Crawling to the next page now possible via image links as well
> * [NEW] Possibility to define examples for any pattern (before it did
>   not make sense for ancestors)
> * [NEW] Implementation of crawling to the next page with different
>   methods
> * [NEW] Heuristics: if something ends with _url, it is a shortcut for:
>   some_url 'href', :type => :attribute
> * [FIX] Crawling to the next page (the broken google example): if the
>   next link text is not an <a>, traverse down until the <a> is found;
>   if it is still not found, traverse up until it is found
> * [FIX] Crawling to next pages does not break if the next link is greyed
>   out (or otherwise present but has no href attribute) (Credit: Robert
>   Au)
> * [FIX] DRY-ed next link lookup - it should be much more robust now as
>   it uses the 'standard' example lookup
> * [NEW] Correct exporting of detail page extractors
> * [NEW] Added more powerful XPath regexp (Credit: Karol Hosiawa)
> * [NEW] New examples for the new features
> * [FIX] Tons of bugfixes, new blackbox and unit tests, refactoring and
>   stabilization
>
> ============
> Announcement
> ============
>
> By popular demand, there is a new forum to discuss everything scRUBYt!
> related:
>
> http://agora.scrubyt.org
>
> You are welcome to sign up, share your opinion, ask for features, report
> bugs or discuss stuff - or to just look around at what others are saying.
>
> ================
> Closing thoughts
> ================
>
> Please keep the feedback coming - your contributions are a key factor in
> scRUBYt!'s success. This is not an exaggeration or a feeble attempt at
> flattery - since we (obviously) cannot test everything on every
> possible page, we can make scRUBYt! truly powerful only if you send us
> all the quirks and problems you encounter during scraping, as well as
> your suggestions and ideas. Thanks everyone!
>
> Cheers,
> Peter
> __
> http://www.rubyrailways.com :: Ruby and Web2.0 blog
> http://scrubyt.org :: Ruby web scraping framework
> http://rubykitchensink.ca/ :: The indexed archive of all things Ruby.
>
>

--
Glenn