On Tuesday 08 December 2009 02:11:53 pm Steve Wilhelm wrote:
> David Masover wrote:

> > Not if the cities are changing every day. Remember, symbols are never
> > garbage collected -- they should _only_ be used when there is a finite
> > and predictable
> 
> I would have thought number of cities over time would be finite and
> predictable. Granted, the number of cities is probably in the tens or
> hundreds of thousands.
> 
> So symbols would be appropriate if instead of cities, adad was reading a
> text file of state names?

Nope. A typo or an error in the file, and you've got a problem again. It's 
similar to when you've got any sort of external input which you want to 
compare to a finite list of values. It might be tempting to do this:

Values = [:one, :two, :three, :four]
...
if Values.include? input_value.to_sym

What you should be doing is this:

Values = ['one','two','three','four'].map(&:freeze).freeze
...
if Values.include? input_value

If it's still not efficient enough (if there are hundreds of values), put them 
in a Set or a Hash, but that's even more reason not to convert input to syms.
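
For the larger-list case, here's a minimal sketch using Set from the standard 
library. ALLOWED_VALUES and valid_value? are names I made up for illustration; 
the point is only that the input stays a string the whole way through:

```ruby
require 'set'

# Hypothetical whitelist. The strings and the set are frozen so nothing
# mutates them later.
ALLOWED_VALUES = Set.new(%w[one two three four].map(&:freeze)).freeze

# Hypothetical helper: compares the raw string and never calls to_sym
# on external input.
def valid_value?(input_value)
  ALLOWED_VALUES.include?(input_value)
end

valid_value?('two')    # a known value
valid_value?('twoo')   # a typo: rejected, and no new symbol was created
```

Set lookups are hash-based, so this should stay fast even with thousands of 
values.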

I'd only do it the way you're suggesting if the file in question was part of 
the source distribution, but it sounds like it's coming from an external 
service.

> How about country names (currently under 200)?
> I ask, in an attempt to gauge what is typically considered the accepted
> threshold for using symbols.

In my opinion, the threshold of using symbols is whenever it's a finite number 
that's generated only from trusted sources -- generally, stuff inside your 
application source code.

It also matters how it's being used -- as David Black and Robert Klemme point 
out, symbols are generally for labels. They're what's used to refer to 
functions and variables by "name" in Ruby. Their other major use is for hashes 
of options passed around -- essentially, keyword arguments.
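
A rough illustration of both uses. describe_open is a hypothetical method, not 
anything from the standard library:

```ruby
# Hypothetical method taking an options hash, the usual stand-in for
# keyword arguments.
def describe_open(path, options = {})
  mode = options.fetch(:mode, :read)   # :read is the default label
  "#{path} opened for #{mode}"
end

describe_open('foo', :mode => :write)
describe_open('foo')

# Symbols referring to methods by "name":
'hello'.respond_to?(:upcase)
'hello'.send(:upcase)
```

Every symbol here is written out in the source code, so the set of them is 
fixed no matter what input the program sees.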



It might also help to think of Symbols as Enum values. Let me put it this way 
-- in other languages, like C and Java, you might have a fixed number of 
values you might want to work with. For example, suppose I want to open a file 
for reading, writing, or both. It's inefficient to actually pass the strings "read", 
"write", or "read/write" with every file open, so instead, I might pass an 
integer, 1, 2, or 3. But that's annoying to work with, so instead, I'd define 
a constant:

#define READ 1
#define WRITE 2
#define READWRITE 3

Now I can do something like:

open("foo", READ)

Then, inside the open function, you'd have something like:

case mode
  when READ
  when WRITE
...

All of which is just shorthand for:

open("foo", 1)

and

case mode
  when 1
  when 2
...

This is vastly oversimplified, and not how it's actually done, but it works. 
This is also such a common pattern that languages have shortcuts for it. I'm 
working from memory here, so the syntax is probably wrong, but the idea holds:

enum { READ, WRITE, READWRITE }

open("foo", READ)

The enum will automatically assign a unique integer value to each of READ, 
WRITE, and READWRITE. As long as that same enum is visible to the code of the 
open function, and to the code calling it, the number assigned to READ, WRITE, 
and READWRITE will be the same each time.

Note that at this point, you really don't have to care what number it is, just 
that it's unique, and that doing it this way is just as efficient as manually 
specifying a number.

And since it doesn't matter, there's no reason passing 1 should be more 
efficient than passing 3085, or anything, as long as it's still a 32-bit 
integer. (Or 64-bit, if you're on a 64-bit platform.) If you were really 
strapped for space, you could use a single byte value, but there is actually a 
real possibility that won't be enough, depending on your application, and you 
want to be backwards compatible. So an int makes sense, and besides, enum is 
doing all the work for you.

So, symbols -- a concept from Lisp, actually -- take this just a step farther. 
Rather than assigning a number that's just unique for that function (READ=1, 
WRITE-2, etc), you get a number that's globally unique. When you type

:foo

...what you're really doing is getting a unique integer, which Ruby will 
replace any occurrence of the symbol :foo with, anywhere in your source code. 
Again, it's oversimplifying -- it's probably implemented as an integer, but 
you'll see it as a Symbol object. The entire point of this is so that you can 
guarantee that the following two things will be true:

:foo == :foo
:foo != :bar

And of course, you can do case structures using symbols, you can use them as 
hash keys and values, and so on.
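
Here's the open() example again as a sketch with symbols in place of the 
integer constants. mode_flags and its return values are made up for 
illustration:

```ruby
# Hypothetical helper, mirroring the case statement from the enum example.
def mode_flags(mode)
  case mode
  when :read       then 'r'
  when :write      then 'w'
  when :read_write then 'r+'
  else raise ArgumentError, "unknown mode: #{mode.inspect}"
  end
end

mode_flags(:read)

# Every mention of :foo refers to the very same object.
:foo.equal?(:foo)
:foo == :bar
```

No #define and no enum needed: writing :read is enough to get the same unique 
value everywhere.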

And because Ruby is reflective, you can do things like:

"foo".to_sym

But scroll back up and look at the "enum" example. This kind of monkeying is 
properly metaprogramming -- it would be like writing a program that generates 
enum statements for a header.

Sometimes, it might actually be appropriate. An obvious example is an ORM -- 
an intelligent ORM can read the database schema and create methods named after 
database columns. You'll get those column names as strings, and you'll turn 
them into symbols. You might even be fancy, like Rails, and do some string 
manipulation (pluralize them, etc) and create some more methods.
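
A toy version of the idea, nowhere near what a real ORM does. COLUMN_NAMES 
stands in for whatever the schema query would return, and Record is a 
hypothetical class:

```ruby
# Hypothetical stand-in for column names read from a database schema.
COLUMN_NAMES = %w[id name email].freeze

class Record
  def initialize(attributes)
    @attributes = attributes
  end

  # The strings become symbols here, once per column, while the class
  # body is being evaluated.
  COLUMN_NAMES.each do |column|
    define_method(column.to_sym) { @attributes[column] }
  end
end

record = Record.new('id' => 1, 'name' => 'Ada', 'email' => 'ada@example.com')
record.name
```

The number of symbols created is bounded by the schema, not by the rows or by 
anything a user types in.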

That's not entirely a new idea, either. If this were a compiled language, you'd 
probably have a tool that took some specification (maybe XML... ugh) and 
converted it into both SQL statements to create that database, and source code 
to access it. The main difference is that Ruby is dynamic enough to do this at 
runtime, just-in-time, rather than having to actually generate source code.

But the concepts are the same.

Here's a quick rule of thumb:

 - Am I metaprogramming?
 - Are these keyword arguments, or some sort of options hash?
 - Are the symbols created with the colon notation (:foo)?

If you answered yes to any of those, symbols are fine. If you answered no to 
all of them, probably not.



Phew! I think I need a blog.