Tomislav Kralj wrote: > Hello, > > I would be so gratefull to ANYONE who can help me! > > I'm writting a program in C++. > A part of my program is to compare two strings and as a result I have to > get a number (range:0-1) which represents a similarity beetwen those two > strings. > > Which algorythm to use? > I searched Web, but there's a million of them, and I don't know which to > use. > I don't need a solution in C++, just a hint which algorythm to use to > implement this concept. > > Thanx! That sounds like a fun problem. As someone already said in a reply, the algorithm depends on your requirements, and what type of similarity you want. For example, if you wanted to use this information to attempt a new type of sorting, the algorithm I'm about to suggest would be useless. That being said, here's what I would do--it's conceptually very simple: 1. Find the "longest common subsequence". What distinguishes this from "longest common substring" (and makes it harder) is that the matching letters don't need to be adjacent. For example, the longest common subsequence of "aaaacccceeee" and "aaaabbbbccc" is "aaaacccc". This is best calculated with dynamic programming, but you can probably find guidance on that on the internet. 2. Compare the length of this substring with the lengths of the two original strings. Perhaps something simple like "Percent similarity = (length of common subsequence) / (average length of two original strings)". Hope this helps, Dan