--HAv5+T9jbwMPl6Kw
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

* Mark Ericson (mark.ericson / gmail.com) wrote:
> Excellent!  The only thing remaining is an efficient algorithm for a sear=
ch
> for all zipcodes within a given radius.

Using Ruby and SQLite3:

  pabs@halcyon:~/proj/zip> ./import.rb zipcode.{csv,db}
  pabs@halcyon:~/proj/zip> ./find.rb zipcode.db 22003 3
  "city","state","zip","distance (mi)"
  "Annandale","VA","22003","0.0"
  "Springfield","VA","22161","1.62363604423677"
  "Springfield","VA","22151","1.87190097838136"
  "Falls Church","VA","22042","2.97362028549975"

Here's the code for each piece (also available at the URL=20
http://pablotron.org/files/zipfind.tar.gz):

  ---- import.rb ----
  #!/usr/bin/env ruby
 =20
  # load libraries
  require 'rubygems' rescue nil
  require 'sqlite3'
 =20
  # constants
  SCAN_RE =3D /"(\d{5})","([^"]+)","(..)","([\d.-]+)","([\d.-]+)","([\d-]+)=
","(\d)"/
  SQL =3D "INSERT INTO zips(zip, city, state, lat, long, timezone, dst)=20
                VALUES (?, ?, ?, ?, ?, ?, ?)"
  TABLE_SCHEMA =3D "CREATE TABLE zips (
    id        INTEGER     NOT NULL PRIMARY KEY,
 =20
    zip       VARCHAR(5)  NOT NULL,
    city      TEXT        NOT NULL,
    state     VARCHAR(2)  NOT NULL,
    lat       FLOAT       NOT NULL,
    long      FLOAT       NOT NULL,
    timezone  INTEGER     NOT NULL,
    dst       BOOLEAN     NOT NULL
  );"
 =20
 =20
  # handle command-line arguments
  unless ARGV.size =3D=3D 2
    $stderr.puts "Usage: #$0 <csv> <db>"
    exit -1
  end
  csv_path, db_path =3D ARGV
 =20
  # load database, create zip table and prepared statement
  db =3D SQLite3::Database.new(db_path)
  db.query(TABLE_SCHEMA)
  st =3D db.prepare(SQL)
 =20
  # parse CSV and add each line to the database
  db.transaction {
    File.read(csv_path).scan(SCAN_RE).each { |row| st.execute(*row) }
  }
  ----------

  ---- find.rb ----
  #!/usr/bin/env ruby
 =20
  require 'rubygems'
  require 'sqlite3'
 =20
  MI_R =3D 1.15
 =20
  # grab base zip code
  unless ARGV.size > 1
    $stderr.puts "Usage: #$0 <db> <zipcode> [radius]"
    exit -1
  end
  db_path, src_zip, radius =3D ARGV
  radius =3D (radius || 50).to_i
 =20
  # open database
  db =3D SQLite3::Database.new(db_path)
 =20
  # get lat/long for specified zip code
  sql =3D "SELECT lat, long FROM zips WHERE zip =3D ?"
  src_lat, src_long =3D db.get_first_row(sql, src_zip).map { |v| v.to_f }
 =20
  unless src_lat && src_long
    $stderr.puts "Unknown zip code '#{src_zip}'"
    exit -1
  end
 =20
  # calculate min/max lat/long
  ret, range =3D [], radius / 69.0
 =20
  # get all codes within given rectangle
  sql =3D "SELECT lat, long, city, state, zip
           FROM zips=20
          WHERE lat > ? AND lat < ?
            AND long > ? AND long < ?"
  args =3D [src_lat - range, src_lat + range,=20
          src_long - range, src_long + range]
 =20
  db.prepare(sql).execute(*args).each do |row|
    # get row values, convert lat/long to floats
    dst_lat, dst_long, dst_zip, dst_city, dist_st  =3D row
    dst_lat, dst_long =3D dst_lat.to_f, dst_long.to_f
   =20
    # calculate distance between zip codes.  if dst_zip is within the
    # specified radius, then add it to the list of results
    d =3D Math.sqrt((dst_lat - src_lat) ** 2 + (dst_long - src_long) ** 2)
    ret << [dst_zip, dst_city, dist_st, d * 69.0] if d <=3D range
  end
 =20
  # sort results by distance
  ret =3D ret.sort { |a, b| a[-1] <=3D> b[-1] }
 =20
  # print out results as a CSV
  puts '"city","state","zip","distance (mi)"',=20
       ret.map { |row| '"' << row.join('","') << '"' }
  ----

> I suppose one technique might be to first narrow the databse search withi=
n a
> given a given square latitude/longitude range and then filter those resul=
ts
> by testing that they are within the given circle radius

That's all the code above does.  There's some room for optimization
there; for example, you could create a region field, then calculate list
of regions that intersect with the search radius.  If you index on the
region field, then the query becomes essentially an index lookup instead
of a lat/long comparison (you still have to do the second distance
calculation, of course).

Anyway, I didn't do that because the code above runs pretty quickly on
my machine. =20



--=20
Paul Duncan <pabs / pablotron.org>        pabs in #ruby-lang (OPN IRC)
http://www.pablotron.org/               OpenPGP Key ID: 0x82C29562

--HAv5+T9jbwMPl6Kw
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: Digital signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.0 (GNU/Linux)

iD8DBQFDoOcjzdlT34LClWIRAh74AJ0Y3trUYm6otAH8s5uAoP6W37Uq7ACbB2cJ
YiyjVF67Q3A5eo/bPnwL5cI=
=q2s1
-----END PGP SIGNATURE-----

--HAv5+T9jbwMPl6Kw--