Methods Matter for Protecting Privacy Online

Mike Hintze, Microsoft Associate General Counsel, blogs from today's State of the Net Conference in Washington, D.C., where he spoke on a panel. Posted on the new Microsoft on the Issues blog:

There’s been much discussion recently about “data anonymization” to protect privacy online. Data anonymization is an especially hot topic when it comes to safeguarding data collected via online advertising or from Internet searches.

Today, I’ve been invited to talk about data anonymization at the “State of the Net” conference in Washington, D.C., put on by the Congressional Internet Caucus.  

Microsoft is a leader in the use of anonymization. Our aim is to disconnect a person’s identity from data that is collected about what he or she does online.  

Here’s how it works: When you type a few words into Live Search, we use those terms to identify Web pages that match what you’re looking for. We also keep a record of those terms to improve the accuracy of Live Search, to protect the security of the service and to increase the relevance of the ads we display. But we do not keep the search terms in a way that identifies you as the person who entered them.

First, we separate the search terms from any personal information we may have about you (such as your name or e-mail address, which you may have provided when you signed up for Hotmail or another Microsoft service).   

Next, once our saved search data is more than 18 months old, we apply a stronger anonymization method that scrubs the information more thoroughly than anything other search engines use. We strip out the entire IP address (which is associated with a specific computer) tied to search queries. We also strip out any other so-called “cross-session identifiers” (such as persistent ‘cookies’ that reside on individual PCs) which can connect search queries to a specific person.
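To make the age-based step concrete, here is a minimal sketch in Python. It is an illustration only, not Microsoft's actual implementation: the `SearchRecord` type, its field names, the `anonymize_if_old` function and the 548-day approximation of "18 months" are all assumptions made for this example.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta
from typing import Optional

# Assumption: "18 months" approximated as 548 days for illustration.
RETENTION = timedelta(days=548)


@dataclass(frozen=True)
class SearchRecord:
    """Hypothetical stored search log entry, already separated from account data."""
    query: str
    timestamp: datetime
    ip_address: Optional[str]   # full IP address, tied to a specific computer
    cookie_id: Optional[str]    # a cross-session identifier such as a persistent cookie


def anonymize_if_old(record: SearchRecord, now: datetime) -> SearchRecord:
    """Strip the entire IP address and cross-session identifier from old records."""
    if now - record.timestamp > RETENTION:
        # Remove the whole value, not just part of it.
        return replace(record, ip_address=None, cookie_id=None)
    return record


now = datetime(2009, 1, 14)
old = SearchRecord("hiking boots", datetime(2007, 1, 1), "203.0.113.42", "abc123")
recent = SearchRecord("hiking boots", datetime(2008, 12, 1), "203.0.113.42", "abc123")

old_clean = anonymize_if_old(old, now)
recent_clean = anonymize_if_old(recent, now)
```

In this sketch the search terms survive, so they can still be used to improve the service, but the older record no longer carries anything that ties separate sessions to the same computer.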

Other search engines use more complicated, but less comprehensive, anonymization techniques. Many delete only part of an IP address while retaining some or most of the identifying material, potentially for years to come. These weaker anonymization methods raise the risk that search data could be linked to an individual months or years down the road.
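A rough sketch of why partial deletion is weaker, again in Python. It assumes, purely for illustration, that "part of an IP address" means zeroing the last octet of an IPv4 address; the post does not say which part other search engines remove, and the function names here are hypothetical.

```python
import ipaddress


def drop_last_octet(ip: str) -> str:
    """Assumed partial anonymization: zero the final octet of an IPv4 address."""
    network = ipaddress.ip_network(f"{ip}/24", strict=False)
    return str(network.network_address)


def candidates_remaining(truncated_ip: str) -> int:
    """Count how many addresses could have produced the truncated value."""
    return ipaddress.ip_network(f"{truncated_ip}/24").num_addresses


truncated = drop_last_octet("203.0.113.42")
remaining = candidates_remaining(truncated)
```

Under that assumption, the truncated address still narrows the source to one block of 256 addresses, and any cookie kept alongside it would continue to link queries across sessions. Removing the whole IP address and the cross-session identifiers leaves nothing comparable to work from.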

There's more...