Peter Skovgaard wrote:
> In one of my Rails applications I have a system for up/downloading files.
> Since I don't want people to browse all the files, but anyone should
> be able to link to his/her files, each file has its own unique URL -
> all pretty standard.
>
> Right now the URL is just /download/sha-sum-of-the-file (since I have
> the SHA anyway), but the length of the URL is bugging me.
>
> Now - the problem: the SHA1-sum is, as usual, just represented in
> base16, which makes it 40 chars long. However, in a URL we have at
> least a-zA-Z0-9 (and also some special characters), which gives us the
> opportunity to represent the SHA-sum in at least base62, which should
> make it about half the size.
>
> I know that in this case I could just store an extra little random
> string in my database and link to /download/little-string, but that's
> not the point here :) I would just like to hear from you: how few
> (printable) characters can you shorten a SHA-sum down to?
>   

Modified Base64 for URLs is probably your best and easiest route.
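Here are the actual numbers. A SHA-1 digest is 160 bits, and each character drawn from an alphabet of b symbols carries log2(b) bits, so the shortest possible encoding is:

$$
n(b) = \left\lceil \frac{160}{\log_2 b} \right\rceil
\qquad
n(16) = \left\lceil \frac{160}{4} \right\rceil = 40,\quad
n(62) = \left\lceil \frac{160}{5.954\ldots} \right\rceil = \lceil 26.87\ldots \rceil = 27,\quad
n(64) = \left\lceil \frac{160}{6} \right\rceil = \lceil 26.67\ldots \rceil = 27
$$

Base62 and base64 both bottom out at 27 characters for this digest. Base64 reaches that length by working on byte groups, with no big-number arithmetic, which is what makes it the easier route.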

Wikipedia (http://en.wikipedia.org/wiki/Base64) explains it well:

Base64 encoding can be helpful when fairly lengthy identifying
information is used in an HTTP environment. Hibernate, a database
persistence framework for Java objects, uses Base64 encoding to encode
a relatively large unique id (generally 128-bit UUIDs) into a string
for use as an HTTP parameter in HTTP forms or HTTP GET URLs. Also, many
applications need to encode binary data in a way that is convenient for
inclusion in URLs, including in hidden web form fields, and Base64 is a
convenient encoding to render them in not only a compact way, but in a
relatively unreadable one when trying to obscure the nature of data
from a casual human observer.
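Here is a quick way to see the size difference in Ruby. This is a sketch, and `data` is a placeholder for the uploaded file's bytes. The important detail is to Base64-encode the raw 20-byte digest rather than the 40-character hex string. Encoding the hex string would produce a longer result (56 characters), not a shorter one.

```ruby
require 'digest/sha1'
require 'base64'

data = "example file contents" # placeholder for the uploaded file's bytes

hex = Digest::SHA1.hexdigest(data)   # 40 hex characters (what the URL uses now)
raw = Digest::SHA1.digest(data)      # the same digest as 20 raw bytes
b64 = Base64.strict_encode64(raw)    # standard Base64: 28 chars, incl. one trailing '='

puts hex.length    # prints 40
puts raw.bytesize  # prints 20
puts b64.length    # prints 28
```

Standard Base64 may still contain '+', '/' and the '=' padding, which is where the URL trouble starts.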

Using a URL encoder on standard Base64, however, is inconvenient, as it
translates the '+' and '/' characters into special '%XX' hexadecimal
sequences ('+' = '%2B' and '/' = '%2F'). When the encoded value is later
used in database storage or passed across heterogeneous systems, those
systems may in turn choke on the '%' character generated by URL
encoders (because '%' is also the wildcard character in ANSI SQL LIKE
patterns).
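The following sketch shows the escaping cost, using `URI.encode_www_form_component` from Ruby's `uri` library as the URL encoder. The two input bytes are chosen only because their standard Base64 form contains both '+' and '/'. Note that the '=' padding gets escaped as well.

```ruby
require 'base64'
require 'uri'

# Two bytes picked so that standard Base64 emits both '+' and '/'.
raw = "\xFB\xFF".b
std = Base64.strict_encode64(raw)            # "+/8="
escaped = URI.encode_www_form_component(std) # "%2B%2F8%3D": '+', '/' and '=' all escaped

puts std
puts escaped
puts escaped.length # 10 characters where the encoded value had 4
```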

For this reason, a *modified Base64 for URL* variant exists, where /no/ 
padding '=' will be used, and the '+' and '/' characters of standard 
Base64 are respectively replaced by '*' and '-', so that using URL 
encoders/decoders is no longer necessary and has no impact on the length 
of the encoded value, leaving the same encoded form intact for use in 
relational databases, web forms, and object identifiers in general.
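Ruby's standard library already includes a URL-safe variant. Note that `Base64.urlsafe_encode64` follows the RFC 4648 "base64url" alphabet, which replaces '+' and '/' with '-' and '_' (not '*' and '-' as in the passage above). Both '-' and '_' are unreserved characters in URLs, so the token can go straight into a path.

The sketch below assumes downloads are keyed by the SHA-1 of the file contents. `url_token` and `hex_from_token` are hypothetical helper names. The padding is stripped and restored by hand, so the code does not depend on the `padding:` option that only newer Rubies have.

```ruby
require 'digest/sha1'
require 'base64'

# Hypothetical helper: file data -> 27-char URL-safe token of its SHA-1.
# urlsafe_encode64 uses '-' and '_' instead of '+' and '/'; the single
# trailing '=' pad (20 bytes -> 28 chars) is stripped here.
def url_token(data)
  Base64.urlsafe_encode64(Digest::SHA1.digest(data)).delete('=')
end

# Hypothetical inverse: restore the '=' padding, decode to the raw 20-byte
# digest, and return it as the usual 40-char hex SHA for database lookups.
def hex_from_token(token)
  padded = token + '=' * ((4 - token.length % 4) % 4)
  Base64.urlsafe_decode64(padded).unpack('H*').first
end

token = url_token("example file contents")
puts token.length # prints 27
puts hex_from_token(token) == Digest::SHA1.hexdigest("example file contents") # prints true
```

Decoding back to the hex digest means the SHA you already store can stay the lookup key. Only the URL gets shorter.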

Tom

-- 
* Libraries:
    Chronic (chronic.rubyforge.org)
    God (god.rubyforge.org)
* Site:
    rubyisawesome.com