Ray Chen wrote:
> I am also working on a performance app that requires feed parsing.

As previously mentioned, feed-normalizer aims to produce a 'Feed' object
that is independent of the underlying format. This means it will try each
parser (in a user-defined order) until it gets back a successful parse and
a usable object to interface with.

What this also means is that the *primary* goal of feed-normalizer is to
produce the aforementioned Feed object graph. This might mean hitting 3
parsers before it gets that result. So performance isn't really a
consideration.

Of course, you could change the order of parsing so that feed-normalizer
uses the fastest parser first, and so on. feed-normalizer currently uses
most strict to most liberal as its default order. Right now, this just
happens to be fastest parser first, too :)

> The two that I have tried are feedtools and syndication. First I tried
> feedtools for RSS and Atom, but that was too slow, so I switched to
> syndication for both RSS and Atom. I found syndication to break on a
> high percentage of Atom sites, so in the end, I sent RSS to syndication
> and Atom to feedtools and took the corresponding perf hit for Atom
> feeds.

In this case you could create a wrapper for feed-normalizer that
interfaces with both syndication and feedtools, and tell feed-normalizer
which one to use first. I assume you'll probably encounter more RSS than
Atom.

>
> I find this approach to be decently robust, but not very elegant. I am
> going through > 10k feeds a day of all varieties.
>
> Can someone comment on the robustness of Ruby RSS Parser and Lucas
> Carlson's SimpleRSS? I am curious about Andy's feed normalizer.
>

I personally have found Ruby's RSS library to be very good at handling RSS
feeds that aren't broken :) What that means is that the results should be
predictable, but the chance of getting a good parse at all may be lower.
SimpleRSS, on the other hand, is uber-liberal: if the feed even remotely
resembles an RSS or Atom document, you'll probably get a pretty good
result back, though there are sometimes small errors. Bob Aman did an
overview of both parsers, somewhere on sporkmonger.com.

Back to performance again: I did some rudimentary benchmarks[1] of both
Ruby's RSS and SimpleRSS. I think the results of this benchmark really
make the case for SimpleRSS being a great 'backup' parser to have when
nothing else will parse an ill-formed feed.

And of course, I'm always looking for patches and new parser wrappers for
feed-normalizer.

> HTH,
> Ray
>

Hope that helps.

Andy

[1] http://blog.andyis.textdriven.com/articles/2006/03/28/parsers-in-the-pool
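
P.S. For anyone who wants to picture the "try each parser in order until
one works" idea described above, here is a rough, self-contained sketch.
It is not feed-normalizer's actual code: Feed, StrictParser, LiberalParser
and parse_with_fallback are all made-up names, and the two "parsers" are
toy regex stand-ins for a strict and a liberal library.

```ruby
# Hypothetical sketch only: none of these names come from feed-normalizer.
Feed = Struct.new(:title, :parser)

# Toy stand-in for a strict parser: fails (nil) unless <title> is closed.
class StrictParser
  def self.parse(xml)
    m = xml.match(%r{<title>(.*?)</title>}m)
    m && Feed.new(m[1].strip, name)
  end
end

# Toy stand-in for a liberal parser: accepts an unclosed <title>.
class LiberalParser
  def self.parse(xml)
    m = xml.match(/<title>([^<]*)/)
    m && Feed.new(m[1].strip, name)
  end
end

# Try each parser in the given order and return the first usable result.
# A parser that raises is treated the same as one that returns nil.
def parse_with_fallback(xml, parsers)
  parsers.each do |parser|
    begin
      feed = parser.parse(xml)
      return feed if feed
    rescue StandardError
      next
    end
  end
  nil
end

# Strictest first, most liberal last (assumed ordering for this sketch).
ORDER = [StrictParser, LiberalParser]

good   = "<rss><channel><title>Good feed</title></channel></rss>"
broken = "<rss><channel><title>Broken feed<link>http://example.com"

puts parse_with_fallback(good, ORDER).parser
puts parse_with_fallback(broken, ORDER).parser
```

Reordering the ORDER array is all it takes to put a faster or more
forgiving parser first, which is the trade-off discussed above.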