On Thu, Jan 31, 2008 at 1:20 AM, Sunil Khedar <sunil / truesparrow.com> wrote:

> I am working on an RSS parser script. It has to parse thousands and
> thousands of RSS feeds every hour.
>
> I am looking for an optimized parser that can handle all these
> feeds. Please suggest the RSS parsers you have come across.
>

Sounds like a case of premature optimization to me.  If you intend to do
anything like sticking the parsed feed data into a database or search
index, I think you'll quickly find that becomes the bottleneck, rather
than the feed parsing itself.

My company went through something similar, with a performance-obsessed
former C++ programmer looking for the fastest feed-parsing solution
available.  After benchmarking several of the options, he settled on
building his own highly procedural feed processor around libxml-ruby.
Soon after, however, he discovered that updating the database and search
index was a far bigger bottleneck, one he spent the next several months
addressing.  Feed parsing speed went completely by the wayside.

If you intend to do any sort of indexing of the feeds at all, you should
really focus on building a maintainable feed reader, as opposed to a fast
one.  The database and/or search index are going to be your bottleneck
anyway, so don't let the desire for speed trump things like correctness and
code clarity.  Feed processing is something that scales horizontally using a
queue and multiple feed reader processes, as opposed to databases and
search indexes, which generally don't scale quite as well.
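To make that concrete, here's a bare-bones sketch of the queue approach.
Everything in it is illustrative: the URLs, the worker count, and the
process stub are all made up, and a real deployment would use an external
queue (a jobs table, a message broker, etc.) so the workers can run as
separate processes on separate machines rather than threads in one:

    require 'thread'
    require 'open-uri'

    # Stub: parse the XML and update your database/search index here.
    def process(xml)
    end

    urls = Queue.new
    %w[http://example.com/a.rss http://example.com/b.rss].each { |u| urls << u }

    workers = 4.times.map do
      Thread.new do
        loop do
          url = urls.pop(true) rescue break  # non-blocking pop; stop when drained
          process(URI.open(url).read)        # fetch the raw feed, then process it
        end
      end
    end
    workers.each(&:join)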

Given that, I would suggest looking at existing solutions like feedtools and
feedzirra before trying to write your own, and if you do, go with Nokogiri.
It has a nice, clear, easy-to-use API and is relatively fast.
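To give a rough idea, here's a minimal sketch of pulling titles and links
out of an RSS 2.0 feed with Nokogiri.  The feed content is made up, and
real-world feeds vary (RSS 1.0 vs. 2.0 vs. Atom, namespaces, malformed
markup), which is a large part of what feedtools and feedzirra handle
for you:

    require 'nokogiri'

    rss = <<-XML
      <rss version="2.0">
        <channel>
          <title>Example</title>
          <item><title>First post</title><link>http://example.com/1</link></item>
        </channel>
      </rss>
    XML

    doc = Nokogiri::XML(rss)
    doc.xpath('//item').each do |item|
      puts "#{item.at_xpath('title').text} => #{item.at_xpath('link').text}"
    end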

-- 
Tony Arcieri
