Tomislav Kralj wrote:
> Hello,
> 
> I would be so gratefull to ANYONE who can help me!
> 
> I'm writting a program in C++.
> A part of my program is to compare two strings and as a result I have to
> get a number (range:0-1) which represents a similarity beetwen those two
> strings.
> 
> Which algorythm to use?
> I searched Web, but there's a million of them, and I don't know which to
> use.
> I don't need a solution in C++, just a hint which algorythm to use to
> implement this concept.
> 
> Thanx!

That sounds like a fun problem. As someone already said in a reply, the 
algorithm depends on your requirements, and what type of similarity you 
want. For example, if you wanted to use this information to attempt a 
new type of sorting, the algorithm I'm about to suggest would be useless.

That being said, here's what I would do--it's conceptually very simple:
1. Find the "longest common subsequence". What distinguishes this from 
"longest common substring" (and makes it harder) is that the matching 
letters don't need to be adjacent. For example, the longest common 
subsequence of "aaaacccceeee" and "aaaabbbbccc" is "aaaacccc". This is 
best calculated with dynamic programming, but you can probably find 
guidance on that on the internet.

2. Compare the length of this substring with the lengths of the two 
original strings. Perhaps something simple like "Percent similarity = 
(length of common subsequence) / (average length of two original strings)".

Hope this helps,
Dan