Sunday, August 13, 2006

Searching the AOL search logs - the implications

Greg Pass and Abdur Chowdhury of AOL and Cayley Torgeson of Raybeam must have been proud. In June they presented a paper, "A Picture of Search," at an international conference in Hong Kong. Then they posted at research.aol.com the data used in their research: detailed log files covering 36,389,567 searches performed by AOL members between March 1 and May 31, 2006.

The researchers goofed. Although the log files didn't have AOL screen names, they did have a unique identifying number for each user. It takes only a few passes through one user's searches to learn a lot about an individual -- in some cases, including their identity. AOL yanked the dataset, but it was already mirrored worldwide.

An enterprising Daniel Zhao of Mount Pleasant, Michigan, who tells me he is an 18-year-old about to enter his sophomore year at Penn, quickly registered aolsearchdatabase.com on August 7, and soon thereafter he put a searchable database of the AOL search logs on the Web.

So here's what you do: search for something unsavory. For instance:


[Screen shot of the search results]

We find that 43,206 people searched for "child porn." Now, do a new search, filtering only by a single user number. You'll see all the searches that person did over a three-month period. If you see enough disturbing searches, you'll conclude the searcher is more than just unsavory.
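For readers who want to see why a bare user number gives so much away, here is a minimal sketch of the two steps just described: find the user numbers behind a search term, then pull one user's full history. The tab-separated layout, the column names (AnonID, Query, QueryTime), the function names, and every row of sample data are assumptions made up for illustration; this is not Zhao's code and not real log data.

```python
import csv
import io

# Hypothetical sample rows in an ASSUMED tab-separated layout:
# user number, query text, timestamp. None of this is real log data.
SAMPLE_LOG = (
    "AnonID\tQuery\tQueryTime\n"
    "1001\tused trucks tulsa\t2006-03-02 10:15:00\n"
    "1002\tweather chicago\t2006-03-02 10:16:00\n"
    "1001\ttruck dealers oklahoma\t2006-03-05 18:40:00\n"
    "1003\tused trucks tulsa\t2006-04-11 09:00:00\n"
    "1001\tcheap flights\t2006-05-20 21:05:00\n"
)


def load_rows(text):
    """Parse tab-separated log text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def users_who_searched(rows, term):
    """Step 1: return the sorted user numbers whose queries contain the term."""
    term = term.lower()
    return sorted({row["AnonID"] for row in rows if term in row["Query"].lower()})


def history_for(rows, user_id):
    """Step 2: return every (timestamp, query) pair for one user, oldest first."""
    matches = [row for row in rows if row["AnonID"] == user_id]
    matches.sort(key=lambda row: row["QueryTime"])
    return [(row["QueryTime"], row["Query"]) for row in matches]


if __name__ == "__main__":
    rows = load_rows(SAMPLE_LOG)
    for user_id in users_who_searched(rows, "used trucks"):
        print("user", user_id)
        for when, query in history_for(rows, user_id):
            print("   ", when, query)
```

Nothing here is clever, and that is the point: once every query carries the same number, one filter turns a pile of anonymous searches into a single person's three-month diary.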

Here's what's going to happen: law enforcement officers at every level are mining this data right now for unsavory searches. When they find a pattern of worrisome searches -- user 2150654 seems very interested in how to make meth -- they'll search for clues to the identities of these searchers. (User 2150654 wants to buy a truck in Oklahoma.) If they can't find a person's identity in the search logs, they'll pursue a subpoena to make AOL cough up the screen name, using the disturbing search terms as probable cause.

In many cases, this will lead to arrests, maybe even successful prosecutions.

And then, watch law enforcement at all levels, from the Justice Department to your local sheriff, demand the ability to fish through search logs indiscriminately.

See http://www.aolsearchdatabase.com