Tuesday, June 22, 2004

A man, a plan, a pointless (?) program

As the Google engineering department's director of search quality, I (along with my team) am responsible for maintaining the ranking technology that decides what order your results show up in when you do a Google search. It's an important job and an exciting one. I can't tell you all the secrets of what my group does, but I can tell you a non-Google story that will give you a taste of what it's like to work with large amounts of text data and computing resources.



On the last palindromic date, 20:02 02/20 2002, I was, like any good computer geek, reminded of the palindrome that appears on page 170 of the computer manual Common Lisp, the Language (2nd ed):

A man, a plan, a canoe, pasta, heros, rajahs, a coloratura, maps, snipe, percale, macaroni, a gag, a banana bag, a tan, a tag, a banana bag again (or a camel), a crepe, pins, Spam, a rut, a Rolo, cash, a jar, sore hats, a peon, a canal -- Panama!

A quick search reminded me that the record for such a palindrome, established in 1984 by Dan Hoey, was only 543 words. I immediately thought I could (and therefore should) write a program to beat that. I wrote an algorithm that searches a dictionary and figured out how to put the words together in a sentence that starts with "A man, a plan" and ends with "a canal, Panama." It took me until 1:00 a.m. that night of 02/20 (and some minor bug-bashing the next day) to produce this result -- to my knowledge, still the longest palindromic sentence ever created.

So what, you may ask? Good question. I readily admit that my accomplishment has no practical social purpose or business application. But as a story that spans 18 years from Hoey's palindrome to mine, it has a moral about how it is becoming easier to do big things. Hoey is an excellent computer scientist, but he said he spent days writing a disk-based B-tree package for his program. I was saved all this, because a dictionary now fits in main memory and I could use straightforward binary search. Thank you, Moore's Law.

Also, I was saved from having to fiddle with the dictionary because of the public domain Moby Dictionary. Thank you, Internet (and Grady Ward). The advances over the years let me combine a 100,000-word dictionary and a year-old laptop to break an 18-year old record. If you're a programmer, you could do it too: beat my record, or invent something new -- for example, can you invent a double-entendre law firm that is longer than Dewey, Cheatham, and Howe? With the resources available to you, you can accomplish a lot. Let me know what you come up with.

Now if you'll excuse me, I have to get back to work -- I have some ideas that can only be tackled with a few terabytes of text and a few thousand computers.

-- Peter Norvig
director of search quality

No comments:

Post a Comment