--HAv5+T9jbwMPl6Kw
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

* Mark Ericson (mark.ericson / gmail.com) wrote:
> Excellent!  The only thing remaining is an efficient algorithm for a search
> for all zipcodes within a given radius.

Using Ruby and SQLite3:

  pabs@halcyon:~/proj/zip> ./import.rb zipcode.{csv,db}
  pabs@halcyon:~/proj/zip> ./find.rb zipcode.db 22003 3
  "city","state","zip","distance (mi)"
  "Annandale","VA","22003","0.0"
  "Springfield","VA","22161","1.62363604423677"
  "Springfield","VA","22151","1.87190097838136"
  "Falls Church","VA","22042","2.97362028549975"

Here's the code for each piece (also available at the URL 
http://pablotron.org/files/zipfind.tar.gz):

  ---- import.rb ----
  #!/usr/bin/env ruby
  
  # load libraries
  require 'rubygems' rescue nil
  require 'sqlite3'
  
  # constants
  SCAN_RE = /"(\d{5})","([^"]+)","(..)","([\d.-]+)","([\d.-]+)","([\d-]+)","(\d)"/
  SQL = "INSERT INTO zips(zip, city, state, lat, long, timezone, dst) 
                VALUES (?, ?, ?, ?, ?, ?, ?)"
  TABLE_SCHEMA = "CREATE TABLE zips (
    id        INTEGER     NOT NULL PRIMARY KEY,
  
    zip       VARCHAR(5)  NOT NULL,
    city      TEXT        NOT NULL,
    state     VARCHAR(2)  NOT NULL,
    lat       FLOAT       NOT NULL,
    long      FLOAT       NOT NULL,
    timezone  INTEGER     NOT NULL,
    dst       BOOLEAN     NOT NULL
  );"
  
  
  # handle command-line arguments
  unless ARGV.size == 2
    $stderr.puts "Usage: #$0 <csv> <db>"
    exit -1
  end
  csv_path, db_path = ARGV
  
  # load database, create zip table and prepared statement
  db = SQLite3::Database.new(db_path)
  db.query(TABLE_SCHEMA)
  st = db.prepare(SQL)
  
  # parse CSV and add each line to the database
  db.transaction {
    File.read(csv_path).scan(SCAN_RE).each { |row| st.execute(*row) }
  }
  ----------

  ---- find.rb ----
  #!/usr/bin/env ruby
  
  require 'rubygems'
  require 'sqlite3'
  
  MI_R = 1.15
  
  # grab base zip code
  unless ARGV.size > 1
    $stderr.puts "Usage: #$0 <db> <zipcode> [radius]"
    exit -1
  end
  db_path, src_zip, radius = ARGV
  radius = (radius || 50).to_i
  
  # open database
  db = SQLite3::Database.new(db_path)
  
  # get lat/long for specified zip code
  sql = "SELECT lat, long FROM zips WHERE zip = ?"
  src_lat, src_long = db.get_first_row(sql, src_zip).map { |v| v.to_f }
  
  unless src_lat && src_long
    $stderr.puts "Unknown zip code '#{src_zip}'"
    exit -1
  end
  
  # calculate min/max lat/long
  ret, range = [], radius / 69.0
  
  # get all codes within given rectangle
  sql = "SELECT lat, long, city, state, zip
           FROM zips 
          WHERE lat > ? AND lat < ?
            AND long > ? AND long < ?"
  args = [src_lat - range, src_lat + range, 
          src_long - range, src_long + range]
  
  db.prepare(sql).execute(*args).each do |row|
    # get row values, convert lat/long to floats
    dst_lat, dst_long, dst_zip, dst_city, dist_st  = row
    dst_lat, dst_long = dst_lat.to_f, dst_long.to_f
    
    # calculate distance between zip codes.  if dst_zip is within the
    # specified radius, then add it to the list of results
    d = Math.sqrt((dst_lat - src_lat) ** 2 + (dst_long - src_long) ** 2)
    ret << [dst_zip, dst_city, dist_st, d * 69.0] if d <= range
  end
  
  # sort results by distance
  ret = ret.sort { |a, b| a[-1] <=> b[-1] }
  
  # print out results as a CSV
  puts '"city","state","zip","distance (mi)"', 
       ret.map { |row| '"' << row.join('","') << '"' }
  ----

> I suppose one technique might be to first narrow the databse search within a
> given a given square latitude/longitude range and then filter those results
> by testing that they are within the given circle radius

That's all the code above does.  There's some room for optimization
there; for example, you could create a region field, then calculate list
of regions that intersect with the search radius.  If you index on the
region field, then the query becomes essentially an index lookup instead
of a lat/long comparison (you still have to do the second distance
calculation, of course).

Anyway, I didn't do that because the code above runs pretty quickly on
my machine.  



-- 
Paul Duncan <pabs / pablotron.org>        pabs in #ruby-lang (OPN IRC)
http://www.pablotron.org/               OpenPGP Key ID: 0x82C29562

--HAv5+T9jbwMPl6Kw
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: Digital signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.0 (GNU/Linux)

iD8DBQFDoOcjzdlT34LClWIRAh74AJ0Y3trUYm6otAH8s5uAoP6W37Uq7ACbB2cJ
YiyjVF67Q3A5eo/bPnwL5cIs1
-----END PGP SIGNATURE-----

--HAv5+T9jbwMPl6Kw--