On 7/5/06, Justin Bailey <jgbailey / gmail.com> wrote:
> I like the interface,  and the "humane" access it gives to the structure of
> the page. It appears to handle single items and lists well.

Minor clarification here, because a lot of the examples for this stuff
refer to the "page": strictly speaking, it's "the 'humane' access
it gives to the structure of the document". Logically, this could be
used to go through any semi-structured document (YAML, OOo files,
etc.), although some formats (e.g., OOo) may require additional work
before the markup is clean.

> Will I be able to point Ariel at a set of documents, and have it spit out a
> reusable class which I can include in another program? For example, I have a
> Bible reference parser (i.e. things like Gen 1:1, etc.)  that scrapes web
> pages to get the actual verses. Right now I use hand-built regular
> expressions and some patterns to get the right page for a given book,
> chapter and verse. Could I use Ariel to generate the "lookup" code instead?

That is my understanding of Alex's project goal. Remember that there's
a training phase involved.
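
For contrast, here is a minimal Ruby sketch of the kind of hand-built
pattern Justin describes. The pattern, the parse_reference name, and
the returned hash are assumptions made up for illustration; they are
not taken from his parser or from Ariel.

```ruby
# Hypothetical hand-written matcher for references such as "Gen 1:1".
# Captures: book abbreviation (optionally prefixed by a digit, as in
# "1 Cor"), chapter number, verse number.
REFERENCE = /\b((?:[1-3]\s)?[A-Z][a-z]+)\.?\s+(\d+):(\d+)\b/

# Returns a hash describing the first reference found, or nil if none.
def parse_reference(text)
  match = REFERENCE.match(text)
  return nil unless match

  { book: match[1], chapter: match[2].to_i, verse: match[3].to_i }
end
```

Every new page layout or reference style means another pattern like
this one, which is the maintenance cost a trained extractor is meant
to reduce.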

> 2. How should a document be labeled?
> > In order to feed the learner, you must save a copy of the type of document
> > you want to extract information from, and then mark up the information you
> > want extracted. What markers would be appropriate?
> > Something such as <l:comment_list>....</l:comment_list> is a possibility.
> Have you heard of microformats? Essentially, it's a way to mark up existing
> HTML pages with added attributes to indicate structure. It's less
> intrusive than adding new tags, etc. You can read about them here
>
>   http://microformats.org/about/

Microformats are interesting, but not 100% applicable. One of the
reasons I pushed as hard as I did to make sure that Alex's project was
included in Ruby Central's project list was that I saw it as more than
just web scraping.
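
As a rough illustration of the labeling idea quoted above, the Ruby
sketch below pulls the contents out of labeled regions in a saved
document. The l:comment_list tag comes from the quoted proposal; the
sample document and the labeled_regions helper are hypothetical and
say nothing about how Ariel actually reads its labels.

```ruby
# Hypothetical helper: collect the contents of every <l:NAME>...</l:NAME>
# region in a labeled training document, keyed by label name.
def labeled_regions(document)
  regions = Hash.new { |hash, key| hash[key] = [] }
  # The /m flag lets "." span newlines; \1 requires the closing tag
  # to carry the same label name as the opening tag.
  document.scan(%r{<l:(\w+)>(.*?)</l:\1>}m) do |name, body|
    regions[name] << body.strip
  end
  regions
end

sample = <<~DOC
  <html><body>
  <l:comment_list>
    <p>First comment</p>
    <p>Second comment</p>
  </l:comment_list>
  </body></html>
DOC

regions = labeled_regions(sample)
```

Nothing in this depends on the input being HTML; the same labels
could wrap a region of any text-based document.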

-austin
-- 
Austin Ziegler * halostatue / gmail.com * http://www.halostatue.ca/
               * austin / halostatue.ca * http://www.halostatue.ca/feed/
               * austin / zieglers.ca