I've been thinking for a day or so about
signal vs. noise... in source code and
perhaps in other contexts (ahem!!).

This is just a crazy theory of mine, so
flame away if you like.

I personally find that the more "littered"
a program is with punctuation and symbols, 
the less I like to look at it. (Yes, it's
possible to have *too little* of that, but
that's rare in programming languages.)

For example, the parentheses in a C "if" 
statement annoy me. The terminating semicolon
in many languages is slightly annoying. The
frequent colons in Python bother me. And let's
not even get into Perl.

As a very crude way of measuring this, I decided
to count alphanumerics vs. symbols in code 
(counting whitespace as neither -- an arbitrary
decision on my part).
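
(For instance, a toy line like  if (x > 0) y = 1;  counts as
6 alphanumerics and 5 symbols -- a ratio of only 1.2.)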

I cranked out a quick bit of ugly code (see
below). Obviously it's crude -- e.g., it doesn't 
take note of strings or comments (and it's not
clear what it should do if it did).

I'd be curious to see people's results on a large
corpus of code (Ruby, Perl, etc.).
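
In case anyone wants to try: here's a quick (untested) sketch
that assumes the script below is saved as, say, noise.rb, and
just runs it over every .rb file under the current directory:

  Dir.glob("**/*.rb") do |file|
    print "#{file}: "
    system("ruby noise.rb < #{file}")   # feed each file to the counter below
  end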

So far I haven't tried it much, as I just wrote it
half an hour ago.

I have noticed an odd effect already, though. The
signal/noise ratio (alphanumerics per symbol, as the
script prints it) is fairly low (1-2) for smaller
programs and higher (4-6) for larger ones. I've
tried it on sources ranging from 10 lines to 2000
lines.

Cheers,
Hal

--
Hal Fulton
hal9000 / hypermetrics.com

  # Crude signal/noise counter: alphanumerics count as "signal",
  # punctuation counts as "noise", and whitespace is ignored.
  # Usage (assuming it's saved as, say, noise.rb):
  #   ruby noise.rb < some_source_file
  noise = 0
  alpha = 0
  # Note that underscore is lumped in with the noise here.
  punc = "'" + ',./`-=[]\;~!@#$%^&*()_+{}|:"<>?'
  punc = punc.split ""
  white = " \t\r\n".split ""   # treat CR as whitespace too
  $stdin.each_byte do |x|
    case x.chr
      when *punc
        noise += 1
      when *white
        # ignore
      else
        alpha += 1
    end
  end

  # (A file with no symbols at all would divide by zero here.)
  ratio = alpha/noise.to_f
  puts "Signal/noise: #{alpha}/#{noise} = #{'%4.1f' % ratio}"