> How will it find similar code? One simple issue is that people will  
> name their variables and methods differently, so you'll want to  
> somehow see the structure of a section of code and ignore a lot of  
> details. But you can't ignore the details too much. Maybe (trivial  
> example) someone wrote a "max" function and someone else in-lined  
> it, and otherwise their code blocks are the same.

I've already been working on this. Right now, I'm writing a simple
algorithm that works on arbitrary text and returns a number
reflecting how similar two strings are. On its own it has been
giving fairly good results on code, including code that was written
rather differently, but my plan is to use it to compare symbols
and literals. A similar algorithm, working on a slightly larger
scale, would compare entire lines of code for similar syntax,
augmented by data from the first algorithm.
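
A minimal sketch of a function with that shape (two strings in, one
number out). Which algorithm is actually in use isn't stated here, so
the bigram Dice coefficient below is only an assumed stand-in, and the
names `bigrams` and `similarity` are hypothetical:

```ruby
# Assumed stand-in, not the actual algorithm: Dice coefficient over
# character bigrams. Returns a Float between 0.0 and 1.0.

# Split a string into its overlapping two-character pieces, ignoring case.
def bigrams(str)
  str.downcase.each_char.each_cons(2).map(&:join)
end

def similarity(a, b)
  return 1.0 if a == b

  pairs_a = bigrams(a)
  pairs_b = bigrams(b)
  return 0.0 if pairs_a.empty? || pairs_b.empty?

  # Count shared bigrams, removing each match from `remaining` so that
  # repeated bigrams are not counted more often than they occur.
  remaining = pairs_b.dup
  shared = pairs_a.count do |pair|
    index = remaining.index(pair)
    remaining.delete_at(index) if index
  end

  2.0 * shared / (pairs_a.size + pairs_b.size)
end
```

With this stand-in, `similarity("total_count", "totalCount")` comes
out at about 0.84, while `similarity("night", "nacht")` is 0.25, so
differently styled names for the same thing still score high.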

I'm still thinking about this. Suggestions, anybody?

> I don't think all code is simple to refactor like that. But maybe  
> enough is for this to be useful. Maybe most is? I don't know.

Far from all of it is, but all I meant is that there is no need to
mess with the system to do it.

> I don't have much experience with unit tests. How well can they  
> usually withstand arbitrary changes to code with subtle bugs?

Well-tested code will not break unless a test was missed, and if a  
bug is found, writing a test to cover it will practically squish that  
particular bug permanently.
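
As an illustration of pinning a bug down with a test, here is a small
sketch using Minitest. The `max_of` helper and the bug it once had are
both invented for the example:

```ruby
require "minitest/autorun"

# Hypothetical helper. Imagine an earlier version started its running
# maximum at 0, so it returned 0 for input containing only negatives.
def max_of(values)
  values.reduce { |best, value| value > best ? value : best }
end

class MaxOfRegressionTest < Minitest::Test
  # Added when the (imagined) bug was found; it fails again if the
  # bug is ever reintroduced.
  def test_all_negative_values
    assert_equal(-2, max_of([-5, -2, -9]))
  end

  # Documents the behavior for empty input: reduce without an initial
  # value returns nil.
  def test_empty_input_returns_nil
    assert_nil max_of([])
  end
end
```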

> It's a bit off-topic, but I'm not sure how good an idea wikis are.  
> Wikipedia gets a lot of vandalism. But worse: what happens when  
> people have a legitimate disagreement about how some code should be  
> written? "anyone can post anything" doesn't provide a way to  
> resolve disagreement.
>
> There could also be a risk of malicious code that people
> auto-update.

Disagreements could be resolved by simply forking off another
project, and everybody is happy. Besides, if everybody agrees on the
tests, and those tests pass, everybody should be happy anyway.

Well-tested projects should not be affected by malicious code because
the system would see that tests fail and revert to the last
working version.
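
A rough sketch of that revert step, under the assumption that the
system keeps versions ordered from oldest to newest and can run the
tests against any of them. The name `last_working_version` and the
`passes_tests` callable are hypothetical:

```ruby
# Hypothetical model: `versions` is ordered oldest to newest, and
# `passes_tests` is any callable returning true when the tests pass.
# Returns the newest passing version, or nil if none passes.
def last_working_version(versions, passes_tests)
  versions.reverse_each.find { |version| passes_tests.call(version) }
end

# Example: pretend the newest version, "v3", breaks the tests.
versions = ["v1", "v2", "v3"]
tests_pass = ->(version) { version != "v3" }

current = last_working_version(versions, tests_pass)
```

Here `current` ends up as "v2", the newest version that still passes.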

> I wonder how well the code-similarity algorithm would work for non- 
> Ruby code. Just curious how Ruby-specific the tests would be vs how  
> general.

The algorithm I'm currently working on is language-agnostic, but it
doesn't yet benefit from syntax parsing and the like, as my plans call
for.

- Jake McArthur