Ray Chen wrote:
> I am also working on a performance app that requires feed parsing.

As previously mentioned, feed-normalizer aims to produce a 'Feed' object 
that is independent of the underlying format. This means it will use 
each parser (in a user-defined order) until it gets back a successful 
parse and a usable object to interface with.

What this also means is that the *primary* goal of feed-normalizer is to 
produce the aforementioned Feed object graph. This might mean hitting 
3 parsers before it gets that result, so raw performance isn't really a 
consideration.

Of course, you could change the order of parsing so that feed-normalizer 
uses the fastest parser first, and so on. feed-normalizer currently uses 
most strict to most liberal as its default order. Right now, this just 
happens to be fastest parser first, too :)
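
To illustrate the "try each parser in order" idea, here is a minimal 
sketch. It is not feed-normalizer's actual code: the Feed struct, the 
two lambda "parsers" and parse_with_fallback are all made-up stand-ins, 
and the regexes are only there to mimic a strict and a liberal parser.

```ruby
# Hypothetical stand-in for the format-independent result object.
Feed = Struct.new(:title, :parser)

# Hypothetical "strict" parser: only accepts a complete <rss> document
# with a properly closed <title>. Returns nil when the input does not fit.
STRICT_PARSER = lambda do |xml|
  m = xml.match(%r{\A\s*<rss[^>]*>.*<title>(.*?)</title>.*</rss>\s*\z}m)
  m && Feed.new(m[1], :strict)
end

# Hypothetical "liberal" parser: grabs whatever follows the first <title>,
# even if the rest of the document is truncated or malformed.
LIBERAL_PARSER = lambda do |xml|
  m = xml.match(/<title>([^<]*)/)
  m && Feed.new(m[1].strip, :liberal)
end

# Try each parser in the given order and return the first usable result.
# A parser that raises is treated the same as one that returns nil.
def parse_with_fallback(xml, parsers)
  parsers.each do |parser|
    begin
      feed = parser.call(xml)
      return feed if feed
    rescue StandardError
      next
    end
  end
  nil
end
```

Calling parse_with_fallback(xml, [STRICT_PARSER, LIBERAL_PARSER]) gives 
the strict-to-liberal order; reversing the array changes which parser 
gets the first attempt.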

> The two that I have tried are feedtools and syndication.  First I tried 
> feedtools for RSS and Atom, but that was too slow, so I switched to 
> syndication for both RSS and Atom.  I found syndication to break on a 
> high percentage of Atom sites, so in the end, I sent RSS to syndication 
> and Atom to feedtools and took the corresponding perf hit for Atom 
> feeds.

In this case you could create a wrapper for feed-normalizer that 
interfaces both syndication and feedtools, and tell feed-normalizer 
which one to use first. I assume you'll probably encounter more RSS than 
Atom.
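
One way to decide which wrapper to try first is to sniff the root 
element before parsing. The sketch below is only an illustration under 
that assumption: detect_format, parser_order_for and the :syndication / 
:feedtools symbols are hypothetical labels, not part of either library 
or of feed-normalizer.

```ruby
# Rough format sniffing: look at the name of the first element in the
# document. "<?xml ...?>" and "<!-- ... -->" are skipped because the
# pattern requires a letter right after "<". This is a heuristic, not a
# real XML parse.
def detect_format(xml)
  root = xml[/<([A-Za-z][\w:.-]*)/, 1]
  case root
  when nil                       then :unknown
  when /\A(?:\w+:)?feed\z/i      then :atom
  when /\A(?:rss|rdf:RDF)\z/i    then :rss
  else :unknown
  end
end

# Hypothetical preference table: RSS goes to the faster wrapper first,
# Atom goes to the more tolerant one first. Unknown input is treated
# like RSS, on the assumption that RSS is the more common case.
PARSER_ORDER = {
  rss:     [:syndication, :feedtools],
  atom:    [:feedtools, :syndication],
  unknown: [:syndication, :feedtools]
}.freeze

def parser_order_for(xml)
  PARSER_ORDER.fetch(detect_format(xml))
end
```

The second entry in each list acts as the fallback, so a feed that the 
first wrapper rejects still gets another attempt.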

> 
> I find this approach to be decently robust, but not very elegant.  I am 
> going through > 10k feeds a day of all varieties.
> 
> Can someone comment on the robustness of Ruby RSS Parser and Lucas 
> Carlson's SimpleRSS?  I am curious about Andy's feed normalizer.
> 

I personally have found Ruby's RSS library to be very good at handling 
RSS feeds that aren't broken :) What that means is the results should be 
predictable, but the chance of a good parse may be lower.

SimpleRSS, on the other hand, is uber-liberal: if the feed even vaguely 
resembles an RSS or Atom document, you'll probably get a pretty good 
result back, though there are sometimes small errors.

Bob Aman did an overview of both parsers, somewhere on sporkmonger.com.

Back to performance again; I did some rudimentary benchmarks[1] of both 
Ruby's RSS as well as SimpleRSS. I think the results of this benchmark 
really make the point for SimpleRSS being a great 'backup' parser to 
have when nothing else will parse an ill-formed feed.

And of course, I'm always looking for patches and new parser wrappers 
for feed-normalizer.

> HTH,
> Ray
> 
> 

Hope that helps.

Andy

[1] http://blog.andyis.textdriven.com/articles/2006/03/28/parsers-in-the-pool