On Jan 20, 2:23 pm, "Eric I." <rubytrain... / gmail.com> wrote:
> On Jan 20, 12:04 pm, Dave Thomas <d... / pragprog.com> wrote:
>
> > I wouldn't be surprised if the idea of searching only 1/2
> > of the second string to prevent overlaps is wrong.. :)
>
> I think you're right in that it's wrong.  ;)

...snip

> I'll post my solution in a reply, which is very similar to your
> except in the overlap prevention code, which, I have to admit, is
> pretty ugly.  And I'm not even convinced that I got it right!

Dave's code can be corrected by noting that all suffix strings end at
the same place (the end of the string), so of the two adjacent strings
being tested, one is always a suffix of the other.
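A quick way to check this claim, as a small Ruby sketch (the sample string and variable names are just for illustration): build every suffix of a string and confirm that, for every pair, the longer one ends with the shorter one.

```ruby
# Build all suffixes of a sample string and check that, for any two of them,
# the shorter one is a suffix of the longer one.
str = "ababab"
suffixes = (0...str.length).map { |i| str[i..-1] }

all_nested = suffixes.combination(2).all? do |a, b|
  shorter, longer = [a, b].sort_by(&:length)
  longer.end_with?(shorter)
end

puts all_nested  # prints "true"
```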

This means that to detect overlap, the following test can be used:

  if prefix.length + s1.length > s2.length then
    # Overlap
  end

where "prefix" is the current prefix being checked in the two adjacent
suffix strings, s1 is the shorter of the two, and s2 is the longer.
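Sorted order doesn't say which of two neighbouring suffixes is the shorter one, so a helper might pick the shorter one first before applying the length test. A minimal sketch; `overlaps?` is a hypothetical name, not something from Dave's code:

```ruby
# Returns true if the two occurrences of a prefix of length prefix_len,
# taken at the start of suffixes a and b of the same string, overlap.
# Whichever argument is shorter plays the role of s1 in the test.
def overlaps?(prefix_len, a, b)
  shorter, longer = a.length <= b.length ? [a, b] : [b, a]
  prefix_len + shorter.length > longer.length
end

overlaps?(3, "abab", "ababab")  # => true  ("aba" at 0 and at 2 overlap)
overlaps?(2, "abab", "ababab")  # => false ("ab" at 0 and at 2 do not)
```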

Here is a picture. Suppose that, in the "ababab" case, we are checking
the adjacent strings "abab" and "ababab". Since one is a suffix of the
other, they can be lined up (in your mind) as they appeared in the
original string:

   abab
 ababab

Now, the prefix being checked might be "aba". It matches the start of
both strings, but "aba".length + s1.length is 7, which is too long to
fit in s2.length (6). In other words, they line up like this:

  ababab  # s2
    abab  # s1
  aba     # prefix, lined up with s2
    ^
    `---- # overlap exists because the prefix
          # as lined up with s2 overlaps with s1
          # when s1 is lined up with s2 as they
          # appear in the original string. In other
          # words, the "aba" in s2 goes past the
          # beginning of the "aba" in s1.
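The same picture in numbers, as a sketch: a suffix of length k starts at index str.length - k in the original string, so the copy of the prefix at the start of s2 runs into the one at the start of s1 exactly when start2 + prefix.length > start1. Substituting the start positions, that rearranges to prefix.length + s1.length > s2.length.

```ruby
# The picture above in numbers: each suffix starts at
# str.length - suffix.length in the original string.
str    = "ababab"
s1     = "abab"
s2     = "ababab"
prefix = "aba"

start1 = str.length - s1.length   # 2
start2 = str.length - s2.length   # 0

# The copy of the prefix at start2 covers indices 0, 1 and 2,
# so it runs past start1 and the two copies overlap.
overlap      = start2 + prefix.length > start1
length_check = prefix.length + s1.length > s2.length

puts overlap       # prints "true"
puts length_check  # prints "true"
```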

Adding this test (instead of the s2.length / 2 test), and also testing
the further neighbouring strings that start with the prefix currently
being searched (to find later matches if earlier ones overlap), would
correct Dave's solution and shouldn't make it much more complicated.
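Putting both fixes together, here is one possible sketch of the corrected approach. It is not Dave's code (which isn't quoted here); the method name `longest_nonoverlapping_repeat` is hypothetical, and it favours clarity over speed. For each suffix it keeps scanning later suffixes that still share a prefix with it, since the best non-overlapping partner is not always the adjacent one.

```ruby
# Longest repeated, non-overlapping substring via sorted suffixes.
def longest_nonoverlapping_repeat(str)
  suffixes = (0...str.length).map { |i| str[i..-1] }.sort
  best = ""

  suffixes.each_with_index do |s1, i|
    ((i + 1)...suffixes.length).each do |j|
      s2 = suffixes[j]

      # Length of the common prefix of s1 and s2.
      common = 0
      while common < s1.length && common < s2.length && s1[common] == s2[common]
        common += 1
      end

      # In sorted order the common prefix with s1 can only shrink as j grows,
      # so once it is no longer than the best so far, stop scanning.
      break if common <= best.length

      # Longest prefix that fits without overlap: its length plus the
      # shorter suffix's length must not exceed the longer suffix's length.
      shorter, longer = s1.length <= s2.length ? [s1, s2] : [s2, s1]
      len = [common, longer.length - shorter.length].min
      best = s1[0, len] if len > best.length
    end
  end

  best
end

longest_nonoverlapping_repeat("banana")  # => "an"
longest_nonoverlapping_repeat("aaaa")    # => "aa"
```

In the "aaaa" case the two usable occurrences come from the suffixes "aa" and "aaaa", which are not adjacent once sorted ("aaa" sits between them), so checking only adjacent pairs would stop at "a".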

-JJ