Methods Matter for Protecting Privacy Online


Posted by Mike Hintze 
Associate General Counsel

There’s been much discussion recently about “data anonymization” to protect privacy online.  Data anonymization is an especially hot topic when it comes to safeguarding data collected via online advertising or from Internet searches.

I’ve been invited to speak today about data anonymization at the “State of the Net” conference in Washington, D.C., hosted by the Congressional Internet Caucus.

Microsoft is a leader in the use of anonymization.  Our aim is to disconnect a person’s identity from data that is collected about what he or she does online. 

Here’s how it works:  When you type a few words into Live Search, we use those terms to identify Web pages that match what you’re looking for.  We also keep a record of those terms to improve the accuracy of Live Search, to protect the security of the service and to increase the relevance of the ads we display.  But we do not keep the search terms in a way that identifies you as the person who entered them.  

First, we separate the search terms from any personal information we may have about you (such as your name or e-mail address, which you may have provided when you signed up for Hotmail or another Microsoft service).  

Next, once our saved search data is more than 18 months old, we apply a stronger anonymization method that scrubs the information more thoroughly than the methods other search engines use. We strip out the entire IP address (which is associated with a specific computer) from the search queries it is tied to. We also strip out any other so-called “cross-session identifiers” (such as persistent ‘cookies’ that reside on individual PCs) that can connect search queries to a specific person.
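To make the two steps concrete, here is a minimal sketch of what such a retention policy could look like. It is illustrative only and does not describe Microsoft's actual systems: the `SearchRecord` type, its fields, the `anonymize` and `apply_retention_policy` functions, and the 548-day stand-in for "18 months" are all hypothetical. The first step (keeping names and e-mail addresses out of the search log) is modeled by the record simply having no such fields. The second step removes the IP address and the cookie identifier outright rather than trimming them.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical threshold: "18 months" approximated as 548 days.
RETENTION_WINDOW = timedelta(days=548)


@dataclass(frozen=True)
class SearchRecord:
    """Hypothetical search-log entry. It deliberately holds no name or
    e-mail address; account data is assumed to live in a separate store."""
    query: str
    timestamp: datetime
    ip_address: Optional[str]  # associated with a specific computer
    cookie_id: Optional[str]   # a cross-session identifier


def anonymize(record: SearchRecord) -> SearchRecord:
    """Drop the entire IP address and the cross-session identifier,
    keeping only the search terms and the time of the search."""
    return replace(record, ip_address=None, cookie_id=None)


def apply_retention_policy(records, now: datetime):
    """Fully anonymize records older than the retention window;
    newer records are returned unchanged."""
    return [
        anonymize(r) if now - r.timestamp > RETENTION_WINDOW else r
        for r in records
    ]


# Example usage with documentation-range IP addresses (RFC 5737).
now = datetime(2009, 1, 1)
logs = [
    SearchRecord("weather seattle", datetime(2007, 1, 15), "203.0.113.42", "c-123"),
    SearchRecord("pizza delivery", datetime(2008, 12, 1), "198.51.100.7", "c-456"),
]
cleaned = apply_retention_policy(logs, now)
```

In this sketch the older record keeps its query text but loses both identifiers, while the month-old record is untouched. Clearing the fields, rather than hashing or shortening them, is what makes the removal complete: nothing derived from the address survives.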

Other search engines use more complicated, but less comprehensive, anonymization techniques.  Many delete only part of an IP address, while retaining some or most of the identifying material — potentially for years to come.  These weaker anonymization methods raise the risk that search data could be linked to an individual months or years down the road.
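The difference between partial and complete removal shows up clearly in a simple IPv4 calculation. The post does not say exactly how much of the address other engines discard, so the sketch below assumes a common form of partial anonymization, zeroing the last octet (keeping 24 of 32 bits). The function names are hypothetical. Zeroing the last octet still narrows the original address down to one of just 256 candidates. Full removal keeps nothing.

```python
import ipaddress
from typing import Optional


def truncate_ipv4(ip: str, keep_bits: int = 24) -> str:
    """Partial anonymization (assumed example): keep only the first
    `keep_bits` bits of an IPv4 address and zero the rest.
    keep_bits=24 drops the last octet."""
    network = ipaddress.IPv4Network(f"{ip}/{keep_bits}", strict=False)
    return str(network.network_address)


def candidate_addresses(keep_bits: int) -> int:
    """Number of distinct original addresses that map to one truncated value."""
    return 2 ** (32 - keep_bits)


def remove_ip(ip: str) -> Optional[str]:
    """Full removal: the argument is intentionally discarded and
    nothing about the address is retained."""
    return None


original = "203.0.113.42"  # documentation-range address (RFC 5737)
print(truncate_ipv4(original), candidate_addresses(24))  # 203.0.113.0 256
print(remove_ip(original))                                # None
```

Keeping 24 of 32 bits retains most of the address, which is why truncated data can still be combined with other information to point back to a household or small organization.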

While there’s been a lot of attention to how long search providers retain data before they anonymize it (9 months? 6 months? 3 months?), we believe the method used to disconnect search data from individuals is far more important than how long the data is held. We have also said we will adjust our policy and thoroughly anonymize data once it is 6 months old, rather than waiting until it is 18 months old, if our competitors adopt equally stringent anonymization techniques and timetables.

Anonymization doesn’t exist in a vacuum; it must be part of an overall approach to privacy protection. At Microsoft, we limit the amount of data we collect in the first place, and we use anonymization to minimize potential privacy breaches. We strongly secure the search data we retain, and we delete it as soon as we no longer need it to operate and improve our services. We believe consumers should be able to make informed choices, so we give individuals control over what data is collected about them and how it is used.

As people conduct more and more of their lives online, Microsoft recognizes the need to continually evaluate the privacy implications of the products and services we offer.  Anonymization techniques can make our innovative services more useful while better protecting people’s privacy. It’s one way Microsoft demonstrates its commitment to our customers and to the cause of Trustworthy Computing.  

For additional information, see my written testimony, “Privacy Implications of Online Advertising,” from July 9, 2008, before the Senate Commerce, Science & Transportation Committee. The full testimony, including exhibits, is also available. 


Comments (3)

  1. Anonymous says:

    Mike Hintze, Microsoft Associate General Counsel blogs from today's State of the Net Conference in Washington

  2. Anonymous says:

    As long as the IP address has not been removed completely (apart from other data), there is no anonymity at all. That means that for the first 18 months the user is clearly identified. In other words, your online privacy policy is non-existent.

  3. Anonymous says:

    Dirk raises some important points regarding the retention of an IP address in association with search results. An IP address is unique to a particular computer at a particular point in time. And while it does not identify an individual user of that computer in as direct a way as, say, a name or an email address, there is little doubt that collecting an IP address does have privacy implications.

    It’s these privacy implications that led to our decision to employ a strong method of anonymization that involves completely deleting the IP address after 18 months. Other major search engines delete only part of the IP address, and retain most of it indefinitely.

    Dirk suggests that during the period that we retain the IP address, our online privacy policy is non-existent. On the contrary, this is precisely when our online privacy policy is most relevant. While we temporarily retain this data for important purposes such as protecting security and improving the quality of the search service, we have employed a number of additional steps to help ensure that our users’ privacy is protected. These protections include keeping search queries separated from personally identifying data such as names or email addresses, strictly limiting access to the data, and strong security measures to help prevent unauthorized disclosure.