On Wed, Nov 23, 2011 at 8:14 PM, rubix Rubix <aggouni2002 / yahoo.fr> wrote:
> I have a set of urls that I want to normalize but I can't find a regex
> to do that, this is an url sample:
> http://www.example.com/index.php?/topic/something/page__st__20__s__99590dc581fe8e7386051d6dfgdfg4eca4c/
> when I use a web browser I find that this url is equivalent of the
> following:
> http://www.example.com/index.php?/topic/something/page__st__20
> It is clear that the last part is a checksum but how can I detect that
> automatically

There are no generally defined semantics for the path and query parts of a URL. Semantics are only defined for the leading parts (protocol, host, port, etc.); everything after that means whatever the particular server decides it means. How do you expect any mechanism to know that the last part is a checksum (of what, btw?)? I mean, completely independent of the technical question of parsing: how would a piece of software detect the checksum just from looking at the URL?

For URLs in a specific, known format it's a different story (see Sam's suggestion).

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
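For the one URL format shown in the question, a site-specific rule is possible. The following is a minimal Ruby sketch, assuming (as the sample URL suggests, but nothing guarantees) that the extra part always starts with `__s__`, is followed by an alphanumeric token, and sits at the very end of the URL. The constant `SESSION_SEGMENT` and the method `normalize_forum_url` are hypothetical names, and this is not necessarily what Sam suggested.

```ruby
# Hypothetical pattern for this one site's URL scheme: "__s__", then an
# alphanumeric token, then an optional trailing "/", anchored at the end
# of the string (\z). Case-insensitive so upper-case tokens also match.
SESSION_SEGMENT = %r{__s__[0-9a-z]+/?\z}i

# Returns the URL with the trailing "__s__<token>" segment removed;
# URLs without such a segment are returned unchanged.
def normalize_forum_url(url)
  url.sub(SESSION_SEGMENT, "")
end

puts normalize_forum_url(
  "http://www.example.com/index.php?/topic/something/page__st__20__s__99590dc581fe8e7386051d6dfgdfg4eca4c/"
)
```

Whether the shortened URL really points to the same page still has to be confirmed against the site itself; the regex only encodes an assumption about this particular URL scheme and says nothing about URLs in general.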