On Sat, 5 Jan 2008 08:09:22 +0900, Philip Hallstrom <ruby / philip.pjkh.com> wrote:
>> Can this get any more off topic?
> 
> Yes.  Unless we make it the next ruby quiz to query imdb.com :)

Well, it shouldn't be too hard in principle -- they have complete pages
of just movie titles broken up by year and initial letter.

http://www.imdb.com/TitlesByYear?year=#{year}&start=#{initial}&nav=/Sections/Years/#{year}/include-titles

(Where initial is one of 'A'..'Z' or '*')
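
As a rough sketch, the full set of listing URLs for one year could be generated like this. The method name title_list_url and the INITIALS constant are made up for illustration, and nothing here actually fetches a page:

```ruby
# Every value the `start` parameter can take: 'A'..'Z' plus '*'.
INITIALS = ('A'..'Z').to_a + ['*']

# Fill in the URL template quoted above for one year and one initial.
def title_list_url(year, initial)
  unless INITIALS.include?(initial)
    raise ArgumentError, "bad initial: #{initial.inspect}"
  end
  "http://www.imdb.com/TitlesByYear?year=#{year}&start=#{initial}" \
    "&nav=/Sections/Years/#{year}/include-titles"
end

# One URL per initial for a given year (27 in total).
urls = INITIALS.map { |initial| title_list_url(1999, initial) }
```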

The main difficulty is normalizing titles whose initial articles have
been moved to the end (e.g. "Matrix, The"), and doing so in a way that
works regardless of the title's language.
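
A naive first pass at that normalization might look like the following. It assumes the trailing article is separated by ", " and that any year suffix has already been stripped. The ARTICLES list is a small hand-picked sample, not a complete one, and normalize_title is a made-up name:

```ruby
# A deliberately incomplete sample of articles from a few languages
# (an assumption for illustration; a real list would be much longer,
# and some of these words are ambiguous across languages).
ARTICLES = %w[The A An Le La Les L' Un Une Der Die Das El Los Las Il Lo Gli].freeze

# Move a trailing article back to the front: "Matrix, The" -> "The Matrix".
# Titles without a recognized trailing article are returned unchanged.
def normalize_title(title)
  head, comma, tail = title.rpartition(', ')
  return title if comma.empty?
  return title unless ARTICLES.any? { |a| a.downcase == tail.downcase }
  # Elided articles such as "L'" attach to the next word without a space.
  tail.end_with?("'") ? "#{tail}#{head}" : "#{tail} #{head}"
end

normalize_title("Matrix, The")  # => "The Matrix"
```

This only looks at the text after the last ", ", so a title like "Crouching Tiger, Hidden Dragon" is left alone. It will still guess wrong whenever a title legitimately ends in a word that happens to be an article in some other language, which is exactly the hard part.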

However, I think the nice people at imdb.com would frown on someone (let
alone lots of someones) mining hundreds of thousands of movie titles this
way, so I was wondering if there was a reasonably large corpus of titles
precompiled somewhere which we could use instead.

Of course, we could always just write the scripts anyway and pretend
we had a more accessible database of titles to work from.

-mental