A few months back I needed a lexer written in Ruby.  All I found on the 
RAA was ruby-lex, which is apparently an extension dependent on flex.

So here's what I came up with.  I'm posting it here as a snippet (since we 
don't have RubyCookbook.org anymore :-( ); there's probably also a 
snippets section on the wiki, and I'll put it there as well.  Basically 
you just define a hash of tokens where a regex is the key and a token 
type is the value (see the bottom of the listing for a usage example).  
Feel free to offer ideas for improvement; that's why I'm posting it.
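In case the matching trick isn't obvious from the listing: the lexer leans on the fact that String#index with a Regexp sets the match globals as a side effect, so the text captured by the group is sitting in $1 right after the call.  A quick standalone check (the variable names here are just for illustration):

```ruby
# String#index with a Regexp sets $~/$1 as a side effect,
# so we can slice the matched token off the front of the input.
input = "(foo)"
index = input.index(/(\()/)              # position of the match
token = $1                               # text captured by the group
rest  = input[index + token.length..-1]  # input with the token consumed
puts "matched #{token.inspect} at #{index}, remaining: #{rest.inspect}"
```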

###################Lexer.rb##################
class Lexer
  def initialize(string,tokenHash)
    @string   = string 
    @tokenHsh = tokenHash
  end

  def each_token
    until @string.empty?
      #skip whitespace
      if @string[0..0] =~ /\s/
        @string = @string[1..-1]
        next
      end
      beforeLen = @string.length
      @tokenHsh.each { |re,prc|
        # String#index with a Regexp sets the match globals, so a
        # successful match leaves the captured text in $1.  Only a
        # match at the very start of the remaining input counts.
        index = @string.index(re)
        next unless index == 0
        tokenType   = prc
        tokenString = $1
        puts "tokenType is: #{tokenType}"     if $DEBUG
        puts "tokenString is: #{tokenString}" if $DEBUG
        @string = @string[tokenString.length .. -1]
        yield tokenType,tokenString
      }
      if beforeLen == @string.length
        # nothing matched at the current position, so bail out
        raise "unknown token-> #@string"
      end
    end #until
  end #each_token
end

class Token
  # Object#type is gone in modern Ruby; use the class name instead
  def to_s
    self.class.name
  end
end

#example:
if $0 == __FILE__
  #define some Token classes:
  class OpenParen < Token
  end

  class CloseParen < Token
  end

  class Word < Token
  end

  class Str < Token
  end

  class Number < Token
  end

  class Comma < Token
  end

  #define token hash:
  tokens = {
    /(\()/                         =>  OpenParen,
    /(\))/                         =>  CloseParen,
    /(-?[[:digit:]]+)/             =>  Number,
    /([A-Za-z][0-9A-Za-z_]*)/      =>  Word,
    /(\"[-.0-9A-Za-z_\s+:]+\")/    =>  Str,
    /(\,)/                         =>  Comma
  }

  string = '(comma, separated, list (999)("string")(((,)))'

  lexer = Lexer.new(string,tokens)
  puts "Tokenize: #{string}"
  lexer.each_token {|token,str|
    puts "#{token} : #{str}"
  }
  
end
##################################################################
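If you'd rather not manage the string slicing by hand, the same hash-of-tokens idea maps nicely onto StringScanner from the standard library, which anchors each match at the current scan position for you.  Here's a rough sketch along those lines (not a drop-in replacement for the class above; the symbol token types and names are just for the demo):

```ruby
require 'strscan'

# A sketch of the token-hash idea on top of StringScanner.  scan(re)
# only ever matches at the current position, so no index-zero check
# is needed, and the scanner tracks the remaining input itself.
def each_token(string, tokenHash)
  scanner = StringScanner.new(string)
  until scanner.eos?
    next if scanner.skip(/\s+/)   # skip whitespace
    matched = tokenHash.find { |re, type|
      str = scanner.scan(re) and break [type, str]
    }
    raise "unknown token-> #{scanner.rest}" unless matched
    yield matched
  end
end

tokens = { /\(/ => :open, /\)/ => :close, /[A-Za-z_]\w*/ => :word }
each_token("(abc)", tokens) { |type, str| puts "#{type} : #{str}" }
```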