This is my understanding of how redirects work:

The way it's supposed to work is that when you request a web page (by
sending the server a url) and the web page is not at that url, the
server sends back a response with a 3xx status code (301, 302, 303, 307
or 308) and a Location header. The Location header's value is the new
url where the web page is located. A browser automatically sends out a
request for the new url. A Ruby program using Net::HTTP, on the other
hand, has to do that manually: the program must extract the Location
header, get the url, and then send another request for that url (which
may be on a different server). That's the way it's supposed to work.
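Here is a minimal sketch of that manual approach using Ruby's standard Net::HTTP library. The method names location_target and fetch_following_redirects and the five-hop limit are my own choices for illustration, not part of any library. Because a Location value can be relative (e.g. "/new/page"), the sketch resolves it against the requested url with URI.join.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: returns the absolute url a redirect response points
# to, or nil if the response is not a 3xx redirect with a usable Location.
def location_target(response, base_url)
  return nil unless response.is_a?(Net::HTTPRedirection)
  location = response['location'].to_s.strip
  return nil if location.empty?
  # Location may be relative, so resolve it against the requested url.
  URI.join(base_url, location).to_s
end

# Hypothetical helper: requests url and follows Location redirects by hand,
# giving up after `limit` hops so a redirect loop can't run forever.
def fetch_following_redirects(url, limit = 5)
  limit.times do
    response = Net::HTTP.get_response(URI(url))
    target = location_target(response, url)
    return response unless target # not a redirect: this is the final page
    url = target
  end
  raise "too many redirects (limit #{limit})"
end
```

Usage would look like fetch_following_redirects('http://example.com/old').body, which returns the body of whatever page the redirect chain ends on.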

Unfortunately, people who run websites don't have to follow the rules;
they can do whatever they want. In particular, they don't have to
include a Location header in the response. The same goes for browser
makers; they aren't required to ship browsers that conform to the
standards. Many browsers recognize a Refresh header in a response, even
though no HTTP specification defines such a header. Therefore, a server
may send back a response that contains a Refresh header instead of a
Location header. Like the Location header, the Refresh header carries
the new url for the page you are seeking, usually preceded by a delay
in seconds, e.g. "Refresh: 5; url=http://example.com/".

But it gets even worse. Some server-side web programmers can't add
headers to the response, or don't know how. As a result, the response
won't contain a Refresh header (nor the standards-compliant Location
header) specifying the new url. To work around that, the server-side
programmer has another option: send back a skeleton html page that
contains a special html tag. The special html tag looks like this:

<meta http-equiv="refresh" content="2;url=http://webdesign.about.com">

That tells the browser to behave as if the response contained a Refresh
header with that value. The new url is specified after 'url=' inside the
content attribute. The number in front of the url is the number of
seconds the browser should wait before sending out a request for that
url.
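Since the Refresh header and the meta tag's content attribute share the same "delay;url=..." format, one small parser can handle both. This is a minimal sketch assuming the common form shown above; the name parse_refresh is hypothetical, and it does not handle values that omit the "url=" prefix.

```ruby
# Hypothetical helper: splits a refresh value such as
# "2;url=http://webdesign.about.com" into [delay_seconds, url].
# Returns nil when there is no url= part (a bare "30" just reloads the
# current page, so there is nothing to follow).
def parse_refresh(value)
  m = value.to_s.match(/\A\s*(\d+)\s*;\s*url\s*=\s*['"]?([^'"\s]+)/i)
  return nil unless m
  [m[1].to_i, m[2]] # delay in seconds, target url
end
```

The same call works on a header, e.g. parse_refresh(response['refresh']), or on the content attribute pulled out of a meta tag.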

To sum up, when the server wants you to go elsewhere to find the web 
page you requested, the server can send back a response with:

1) A Location header specifying the new url.
2) A Refresh header specifying the new url.
3) An html tag specifying the new url.

Of course, it would be a lot easier on client-side programmers (you) if
the only option were 1). Instead, you have to check for all three to
find the new url.
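A minimal sketch of checking all three, in that order. It assumes a response object that responds to code, [] (header lookup) and body, as Net::HTTPResponse does; the name find_redirect_url is hypothetical. The meta check looks at each meta tag separately, but like any regex approach it can be fooled by unusual HTML.

```ruby
require 'uri'

# Hypothetical helper: returns the absolute redirect url for a response,
# or nil if none of the three redirect mechanisms is present.
def find_redirect_url(response, base_url)
  # The "delay;url=..." format shared by the Refresh header and meta tag.
  delay_url = /\A\s*\d+\s*;\s*url\s*=\s*['"]?([^'"\s]+)/i

  # 1) Location header, only meaningful on a 3xx response
  target = response['location'] if response.code.to_s.start_with?('3')
  target = nil if target.to_s.strip.empty?

  # 2) Non-standard Refresh header
  target ||= response['refresh'].to_s[delay_url, 1]

  # 3) <meta http-equiv="refresh" content="..."> tag in the html body
  target ||= response.body.to_s.scan(/<meta\b[^>]*>/i).map { |tag|
    next unless tag =~ /http-equiv\s*=\s*["']?\s*refresh\b/i
    tag[/content\s*=\s*(["'])(.*?)\1/im, 2].to_s[delay_url, 1]
  }.compact.first

  # Relative urls are resolved against the url that was requested.
  target && URI.join(base_url, target.strip).to_s
end
```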


> redirectUrl=data.scan(/<META HTTP-EQUIV=\"REFRESH\" CONTENT=\"0; URL=(.*)\">/).to_s
> if redirectUrl!=nil then
>   puts 'Direct to: ' +redirectUrl
> end

That regex won't work for any of the following:

1) <meta http-equiv="refresh" content="2;url=http://webdesign.about.com">

2) <meta       http-equiv="refresh" content="2;url=http://webdesign.about.com">

3) <meta http-equiv = "refresh" content = "2;url=http://webdesign.about.com">

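So make sure you come up with a regex that can handle all of them. Also
note that scan() returns an Array, and Array#to_s always returns a
String, never nil, so the 'if redirectUrl!=nil' test is always true. On
Ruby 1.9 and later, to_s also gives you the inspected array (something
like [["http://..."]]) rather than the bare url.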
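One way to handle all three cases is to stop trying to match the whole tag with a single regex. This sketch (the method name meta_refresh_url is hypothetical) pulls out each meta tag, checks it for http-equiv="refresh" with any capitalization and spacing, and then parses the "delay;url=..." value from its content attribute. It still assumes sane quoting and no '>' inside attribute values; for messy real-world pages an HTML parser such as Nokogiri is the sturdier choice.

```ruby
# Hypothetical helper: returns the url from the first meta refresh tag in
# html, or nil. Case-insensitive, tolerant of extra whitespace and of line
# breaks inside the tag.
def meta_refresh_url(html)
  html.scan(/<meta\b[^>]*>/i).each do |tag|
    # Skip meta tags that aren't http-equiv="refresh".
    next unless tag =~ /http-equiv\s*=\s*["']?\s*refresh\b/i
    # Grab the content attribute's value, whichever quote style it uses.
    content = tag[/content\s*=\s*(["'])(.*?)\1/im, 2]
    next unless content
    # Pull the url out of "delay;url=...".
    url = content[/\A\s*\d+\s*;\s*url\s*=\s*['"]?([^'"\s]+)/i, 1]
    return url if url
  end
  nil
end
```

In the original snippet, redirectUrl = meta_refresh_url(data) followed by puts "Direct to: #{redirectUrl}" if redirectUrl would replace the scan/to_s combination.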


You might also want to check out a library such as Mechanize, which
handles some redirects automatically.



-- 
Posted via http://www.ruby-forum.com/.