Every browser cleans up invalid markup, and each one does it differently.
Firefox, for example, adds a <tbody> to every <table> that doesn't have
one. Firebug shows you the cleaned-up source, not the raw one the server sent.
I once had to download a website because it was so crappy, and I
searched for the table entry by hand. It had a path like
"\html\body\table\tr\td\tr\center\font\b\font". Quite annoying, but it
sped up the scraping.
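Here is a minimal sketch of why a path copied from Firebug can fail against the raw page. The snippet below is made-up data with no <tbody>, just like many raw sources, so the Firebug-style path that includes tbody matches nothing. It uses Ruby's bundled REXML library, which only accepts well-formed markup, so it works on a clean snippet like this one and not on a real broken page.

```ruby
require 'rexml/document'

# Hypothetical raw page source as the server sends it: no <tbody> inside <table>.
RAW_SOURCE = <<~HTML
  <html>
    <body>
      <table>
        <tr><td>price</td><td>42</td></tr>
      </table>
    </body>
  </html>
HTML

doc = REXML::Document.new(RAW_SOURCE)

# Path as shown in Firefox's cleaned-up DOM, including the inserted <tbody>.
dom_path = '/html/body/table/tbody/tr/td'
# Path that matches the raw source exactly.
raw_path = '/html/body/table/tr/td'
# Descendant axis: matches whether or not a <tbody> is present.
robust_path = '//table//tr/td'

# Return the text of every element matched by the XPath expression.
def cell_texts(doc, xpath)
  REXML::XPath.match(doc, xpath).map(&:text)
end

puts cell_texts(doc, dom_path).inspect    # => []
puts cell_texts(doc, raw_path).inspect    # => ["price", "42"]
puts cell_texts(doc, robust_path).inspect # => ["price", "42"]
```

Using `//` between table and tr is one way to write a path that survives the browser's tbody insertion.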

You could try the hpricot gem to get data from websites if the regexes
become too complex.
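As a rough illustration of the parser-instead-of-regex idea, here is a sketch that pulls rows out of a table by walking elements. It is not hpricot: it swaps in Ruby's bundled REXML so it runs without extra gems. REXML needs well-formed input, whereas hpricot was built to tolerate broken HTML, so for real, messy pages a forgiving HTML parser is the better fit. The page content and the `rows` helper are hypothetical.

```ruby
require 'rexml/document'

# Hypothetical, well-formed stand-in for a downloaded page with nested junk tags.
PAGE = <<~HTML
  <html><body>
    <table>
      <tr><td><center><font><b>Widget</b></font></center></td><td>9.99</td></tr>
      <tr><td><center><font><b>Gadget</b></font></center></td><td>4.50</td></tr>
    </table>
  </body></html>
HTML

# Collect [name, price] pairs by walking table rows with XPath instead of a regex.
def rows(html)
  doc = REXML::Document.new(html)
  REXML::XPath.match(doc, '//table//tr').map do |tr|
    cells = REXML::XPath.match(tr, 'td')
    # The name sits in a <b> somewhere inside the first cell, however deeply nested.
    name  = REXML::XPath.first(cells[0], './/b').text
    price = cells[1].text
    [name, price]
  end
end

p rows(PAGE) # => [["Widget", "9.99"], ["Gadget", "4.50"]]
```

The `.//b` lookup doesn't care how many center/font wrappers surround the value, which is exactly the kind of nesting that makes a regex brittle.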