On Apr 27, 2006, at 12:46 PM, Jake McArthur wrote:

>> My idea is to create an open source code repository, web site, and  
>> set of tools designed to help people to automate the process of  
>> factoring code out of their projects which they can all share.  
>> First, it helps them to find instances of code that need to be  
>> DRYed or DROPed by comparing lines of code across the entire code  
>> base in the repository and pointing out lines that are similar to  
>> things that have already been done before.

Your idea sounds good to me. It'd have the additional benefit of  
helping people notice that they are writing bad code. Many people who  
understand the DRY principle in the abstract haven't made the connections
to realise all the situations it applies to. And it's easy to repeat
other people's code, and hard to know that you are doing so.

I'm new here, but I'll try to give feedback.


How will it find similar code? One simple issue is that people will  
name their variables and methods differently, so you'll want to  
somehow see the structure of a section of code and ignore a lot of  
details. But you can't ignore the details too much. Maybe (trivial  
example) someone wrote a "max" function and someone else in-lined it,  
and otherwise their code blocks are the same.
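
Here's a rough sketch of what I mean by comparing structure while ignoring names. It is only an illustration under my own assumptions, not how the tool would have to work: `shape` and `same_shape?` are names I made up, and I'm using Ripper (the lexer that ships with Ruby) to drop whitespace and comments and to number identifiers by order of first appearance.

```ruby
require 'ripper'

# Token types that carry no structure: whitespace, comments, line breaks.
SKIPPED = [:on_sp, :on_comment, :on_nl, :on_ignored_nl].freeze

# Hypothetical helper: reduce source text to a list of tokens in which
# every identifier is replaced by a placeholder (id1, id2, ...) assigned
# in order of first appearance, so consistent renaming does not matter.
def shape(source)
  names = {}
  tokens = Ripper.lex(source).reject { |_pos, type, _text| SKIPPED.include?(type) }
  tokens.map do |_pos, type, text|
    type == :on_ident ? (names[text] ||= "id#{names.size + 1}") : text
  end
end

# Hypothetical helper: two snippets count as "similar" only if their
# shapes are identical. A real tool would need something fuzzier.
def same_shape?(a, b)
  shape(a) == shape(b)
end

mine    = "def biggest(x, y)\n  x > y ? x : y\nend\n"
theirs  = "def larger(first, second)\n  first > second ? first : second  # pick one\nend\n"
inlined = "def biggest(x, y)\n  [x, y].max\nend\n"

puts same_shape?(mine, theirs)   # renamed variables still match
puts same_shape?(mine, inlined)  # calling max instead does not
```

The last comparison is the "max" problem: the two methods do the same thing, but a purely structural comparison like this one can't see that.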

>> If the programmer finds things within his program which he  
>> repeated, then it should be a simple matter of factoring out to  
>> another function or class within his code to DRY it.

I don't think all code is simple to refactor like that. But maybe  
enough is for this to be useful. Maybe most is? I don't know.

>> If he finds that somebody else has similar code, he can factor it  
>> out into a separate "project" in the repository to DROP it. People  
>> with similar code in the repository are notified so that they can  
>> update their individual projects accordingly if they desire to do so.
>>
>> Using code that has been factored out into these external projects  
>> should be both easy to integrate and easy to keep up to date in  
>> each project. Though I'm not quite sure of the mechanics of how  
>> that would be done yet, I'm envisioning a script programmers can  
>> run that will bring all functions and classes they are using from  
>> external projects up to date in their own program. As it does  
>> this, it runs all the programmer's tests to make sure that it  
>> doesn't break something and pulls back to a previous revision if  
>> necessary. (As such, it would practically be a requirement that  
>> all code that takes advantage of this be unit tested.)

I don't have much experience with unit tests. How well do they
usually catch subtle bugs introduced by arbitrary changes to the code?
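
For what it's worth, the update-then-test loop described above could be sketched roughly like this. Everything here is an assumption on my part: the project is just a Hash of name => source standing in for files, `update_with_rollback` is a made-up name, and `tests` is any callable that returns true when the programmer's tests pass.

```ruby
# Hypothetical sketch: apply an update from an external project, run the
# programmer's tests, and pull back to the previous revision on failure.
#
# project - Hash of name => source (a stand-in for files on disk)
# update  - Hash of name => new source from the shared repository
# tests   - callable taking the project, true when all tests pass
def update_with_rollback(project, update, tests)
  snapshot = project.dup            # remember the previous revision
  project.merge!(update)            # bring the shared code up to date

  passed = begin
    tests.call(project)
  rescue StandardError
    false                           # a crashing test run counts as a failure
  end

  return :updated if passed

  project.replace(snapshot)         # restore the previous revision
  :rolled_back
end

project = { "max" => "version 1" }
broken  = { "max" => "version 2" }

# Pretend the programmer's tests only pass with version 1.
tests = ->(code) { code["max"] == "version 1" }

result = update_with_rollback(project, broken, tests)
puts result
puts project["max"]
```

This only protects you as far as the tests do, which is why I'm asking how much they catch.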

>> This would also provide the benefit that factored out projects can  
>> be edited by anyone, like a wiki,

It's a bit off-topic, but I'm not sure how good an idea wikis are.  
Wikipedia gets a lot of vandalism. But worse: what happens when  
people have a legitimate disagreement about how some code should be  
written? "anyone can post anything" doesn't provide a way to resolve  
disagreement.

There could also be a risk of malicious code getting into projects
that people auto-update.

>> without screwing everything up; any time something gets messed up  
>> or is incompatible with some projects, somebody will see when they  
>> try to update and can fix it themselves.
>>
>> The web site would show the projects in the repository, provide a  
>> method of discussion around the various bits of code, and give  
>> downloads and instructions for using the resource for yourself.
>>
>> My hope is that this would be a tool that could speed up  
>> development, simplify and stabilize Ruby programs, and bring a  
>> collaborative atmosphere even to individual projects.

I wonder how well the code-similarity algorithm would work for
non-Ruby code. Just curious how Ruby-specific the tests would be vs. how
general.

>
> I'm making a thread for it because I'm looking for input (ideas,  
> suggestions, etc.).

Hope that helped :)

-- Elliot Temple
http://www.curi.us/blog/