The difficulty that you'll run into is your need for the new, shorter
value to be unique. Hashes are not, and cannot be, designed to be
unique. It's all in the numbers. If you have a 100-character string of
8-bit characters (assuming ASCII, not Unicode), then you have 800 bits
of information. You could take advantage of the fact that not all 256
values of a byte are valid for your string to reduce its size somewhat.
If you limit yourself to 7-bit ASCII, then there's 1 bit per byte that
could be "reclaimed". All of these factors are taken into account in
compression algorithms, so compression is the direction you need to
look. Be careful, because many compression algorithms give longer
results than their input if the input is particularly short (I seem to
recall that some have fall-back approaches to account for this).
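You can see that short-input expansion for yourself with Ruby's standard Zlib (the exact byte counts will vary by zlib version, so I've only hedged the direction of the comparison, not the numbers):

```ruby
require 'zlib'

short = "hi"
long  = "the quick brown fox jumps over the lazy dog " * 50

# Deflate output carries a fixed header and checksum, so a tiny input
# actually grows, while a long, repetitive input shrinks dramatically.
puts Zlib::Deflate.deflate(short).bytesize  # bigger than the 2-byte input
puts Zlib::Deflate.deflate(long).bytesize   # far smaller than the input
```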

Hashes (either the built-in hash() method you've already discovered the
issues with, or cryptographic hashes like MD5 or SHA1) are designed to
statistically minimize the number of hash collisions. You can take a
reasonably long input and the odds of any two different strings
producing the same value are VERY low, but it's not guaranteed. Two
inputs producing the same hash is referred to as a hash collision.
Cryptographic hashes are designed to minimize collisions, but, since
they are of a fixed size, there are only so many possible result
values, and that won't be enough to guarantee unique results for
strings. If your value space (i.e. the number of strings you're trying
to ensure uniqueness for) is not in the millions or billions, and you
can live with your compare basically being a statement that, if you get
the same value, there's a 1 in XXXXXXXXX chance of them actually being
different strings, then a cryptographic hash might be sufficient for
you. Just be aware that two strings with the same hash might be VERY
VERY likely to be the same string, but it's at least remotely possible
that they are two different strings producing the same hash.
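If you go that route, Ruby's standard Digest library gives you a hash that is stable across processes (unlike String#hash, which is seeded per run). The JSON strings here are just made-up examples:

```ruby
require 'digest'

a = '{"name":"Rod","age":42}'
b = '{"name":"Rod","age":43}'

# Deterministic: the same input always yields the same 64-hex-char digest,
# in any process, on any machine -- unlike Ruby's built-in String#hash.
puts Digest::SHA256.hexdigest(a)
puts Digest::SHA256.hexdigest(a) == Digest::SHA256.hexdigest(b)  # false
```

Storing that hex digest in an indexed column and checking it before insert would cover the "same data in the last 15 minutes" case, with the (astronomically small) collision caveat above.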

Look into compression methods first. Compression is what you've
described. If your strings are sufficiently long, then off-the-shelf
compression could easily be your answer. If they're short and you have
special knowledge of the allowed input values (ex: you're using ASCII,
and only allow a-z, A-Z, space, comma, period, …) you may find that
there are only, say, 100 valid values per character (or anything less
than 128), in which case you could compress them to 7/8ths of their
original size (using very simplistic compression). Take a look at
simple zip compression and others like it. Their purpose is to do what
you're asking… provide a shorter value which must be unique for every
unique input value (since it must be able to decompress).
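A quick sketch with Ruby's standard Zlib, on a made-up JSON-ish string like the ones you described. The key property is the round trip: because inflate recovers the input exactly, two different inputs can never compress to the same output.

```ruby
require 'zlib'

# Hypothetical stand-in for one of your JSON payloads.
json   = '{"fields":[' +
         (1..50).map { |i| %({"id":#{i},"value":"row #{i}"}) }.join(",") +
         ']}'
packed = Zlib::Deflate.deflate(json)

puts json.bytesize
puts packed.bytesize                        # smaller, for input like this
puts Zlib::Inflate.inflate(packed) == json  # lossless: round trip is exact
```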

Using the theoretical 100-values-per-character scenario I just gave,
the number of possible values of a string is 100^n (where n is the
number of characters).
So, for 20 characters…
possible values = 100^20 => 1e40
number of bits = log2(possible values) => 132.8771
bytes = number of bits / 8 => 16.6096
So, in theory, you can get 20-character strings down to 17 bytes.

If you go up to 200 characters… 167 bytes
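That arithmetic is easy to check in Ruby (min_bytes is just a throwaway name for the calculation above):

```ruby
# Information-theoretic minimum size for n characters, each drawn from
# an alphabet of `values` possibilities: n * log2(values) bits, rounded
# up to whole bytes.
def min_bytes(n, values = 100)
  bits = n * Math.log2(values)
  (bits / 8.0).ceil
end

puts min_bytes(20)   # => 17
puts min_bytes(200)  # => 167
```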

Encryption, as you've seen, has no goal of producing shorter output
than the input, so it's not going to provide your solution.

(OK, I've started rambling.. probably more detail than you needed…
look for compression routines)

On Jan 1, 2014, at 10:56 PM, Rodrigo Lueneberg <lists / ruby-forum.com> wrote:

> I am trying to generate a unique number for a string and would like
> some suggestions.
> 
> So far, I've researched the hash, but it seems not consistent on the
> values generated. It is not reliable since it does not generate the
> same value all the time.
> 
> My next idea was to use any easy encrypt method, but that would
> generate a large string and would be resource expensive.
> 
> My next idea is convert the string to hex. Asp.net uses this approach
> a lot. Is this is a good idea? What are the drawbacks?
> 
> The reason I need this unique "Hashcode" is because I want to save
> this value on the database and compare it with new inserts in order
> to avoid duplicate values. There are hundreds of Json objects that
> corresponds to fields in the db.
> 
> So I thought of instead of comparing values with each
> field maybe the easier method was to find a way to get the unique
> Json string value/code. With the hashcode I can check in the last 15
> minutes if the user is trying to submit the same data to db and
> prevent it from being submitted again.
> 
> I also think that this method uses very little memory compared to
> using session, but I don't want to discard the session idea yet. If
> there is a good approach, I may use it.
> 
> 
> Well, I hope I made it clear enough to get some feedbacks.
> 
> Thanks
> 
> Rod
> 
> -- 
> Posted via http://www.ruby-forum.com/.