On Wed, Nov 23, 2011 at 8:14 PM, rubix Rubix <aggouni2002 / yahoo.fr> wrote:
> I have a set of URLs that I want to normalize, but I can't find a regex
> to do that. Here is a sample URL:
> http://www.example.com/index.php?/topic/something/page__st__20__s__99590dc581fe8e7386051d6dfgdfg4eca4c/
> When I use a web browser, I find that this URL is equivalent to the
> following:
> http://www.example.com/index.php?/topic/something/page__st__20
> It is clear that the last part is a checksum, but how can I detect that
> automatically?

There are no generic semantics defined for the path and query of a URL.
Semantics are defined only for the leading parts (protocol, host, port,
etc.).  How do you expect any mechanism to know that the last part is
a checksum (of what, by the way?)?  I mean, quite independently of the
technical question of parsing: how would a piece of software recognize
a checksum just by looking at the URL?
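To make that concrete, here is a short sketch using Ruby's standard `uri` library on your sample URL. The generic parts come out as named components, while everything after the `?` is just one opaque string with nothing marking any piece of it as a checksum.

```ruby
require 'uri'

sample = 'http://www.example.com/index.php?/topic/something/page__st__20__s__99590dc581fe8e7386051d6dfgdfg4eca4c/'
uri = URI.parse(sample)

# These components have a generic, well-defined meaning:
puts uri.scheme # prints http
puts uri.host   # prints www.example.com
puts uri.port   # prints 80 (the default port for http)

# Path and query are plain strings as far as the generic syntax goes;
# how the server interprets them is entirely up to the application.
puts uri.path   # prints /index.php
puts uri.query  # prints /topic/something/page__st__20__s__99590dc581fe8e7386051d6dfgdfg4eca4c/
```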

For URLs with a specific, known format it's a different story (see Sam's suggestion).
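Purely as an illustration of what "knowing the format" means, here is a minimal sketch for this one forum layout. It is not Sam's suggestion and not any library's API; `normalize_forum_url` is a hypothetical helper. It assumes, from the single sample alone, that the last query segment is a name followed by `__`-separated key/value pairs, that the `s` pair is the volatile part, and that the trailing slash is insignificant. If any of those guesses is wrong for your site, the result will be wrong too.

```ruby
require 'uri'

# Hypothetical helper. Assumes the last query segment looks like
# "name__key1__value1__key2__value2" and drops the pairs whose key is in
# drop_keys. These rules are guessed from one sample URL, not documented.
def normalize_forum_url(url, drop_keys: ['s'])
  uri = URI.parse(url)
  query = uri.query
  return url if query.nil?

  # Assumption: a trailing "/" carries no meaning, so drop it.
  segments = query.chomp('/').split('/', -1)
  name, *pairs = segments.last.to_s.split('__')

  # Leave URLs alone that do not match the assumed key__value layout.
  return url if name.nil? || pairs.empty? || pairs.length.odd?

  # Keep every key/value pair except the ones we consider volatile.
  kept = pairs.each_slice(2).reject { |key, _value| drop_keys.include?(key) }
  segments[-1] = [name, *kept.flatten].join('__')
  uri.query = segments.join('/')
  uri.to_s
end

sample = 'http://www.example.com/index.php?/topic/something/page__st__20__s__99590dc581fe8e7386051d6dfgdfg4eca4c/'
puts normalize_forum_url(sample)
# prints http://www.example.com/index.php?/topic/something/page__st__20
```

The point of the sketch is that every rule in it (the `__` separator, the `s` key, the trailing slash) comes from knowledge about this particular application, not from the URL syntax itself.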

Kind regards

robert


-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/