Posted by Mike Hintze
Associate General Counsel
There’s been much discussion recently about “data anonymization” to protect privacy online. Data anonymization is an especially hot topic when it comes to safeguarding data collected via online advertising or from Internet searches.
Today I’ve been invited to speak about data anonymization at the “State of the Net” conference in Washington, D.C., hosted by the Congressional Internet Caucus.
Microsoft is a leader in the use of anonymization. Our aim is to disconnect a person’s identity from data that is collected about what he or she does online.
Here’s how it works: When you type a few words into Live Search, we use those terms to identify Web pages that match what you’re looking for. We also keep a record of those terms to improve the accuracy of Live Search, to protect the security of the service and to increase the relevance of the ads we display. But we do not keep the search terms in a way that identifies you as the person who entered them.
First, we separate the search terms from any personal information we may have about you (such as your name or e-mail address, which you may have provided when you signed up for Hotmail or another Microsoft service).
Next, once our saved search data is more than 18 months old, we apply a stronger anonymization method that scrubs the information more thoroughly than the methods other search engines use. We remove the entire IP address (which is associated with a specific computer) from the search queries. We also remove any other so-called “cross-session identifiers” (such as persistent “cookies” stored on individual PCs) that can connect search queries to a specific person.
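As a rough illustration of the two steps above, here is a minimal Python sketch. It is not Microsoft’s actual pipeline: the record layout and field names (`query`, `timestamp`, `ip`, `cookie_id`, `account_id`) are hypothetical, and the 18-month threshold is approximated as 548 days. Step one always drops the link to account information; step two removes the whole IP address and the cookie once a record is older than the threshold.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta
from typing import Optional


# Hypothetical search-log record; field names are illustrative assumptions only.
@dataclass(frozen=True)
class SearchRecord:
    query: str
    timestamp: datetime
    ip: Optional[str]          # IP address of the requesting computer
    cookie_id: Optional[str]   # persistent cookie, a cross-session identifier
    account_id: Optional[str]  # link to account data such as name or e-mail


# Assumption: "18 months" approximated as 548 days.
FULL_ANONYMIZATION_AGE = timedelta(days=548)


def separate_from_account(record: SearchRecord) -> SearchRecord:
    """Step 1: break the link between the query and personal account information."""
    return replace(record, account_id=None)


def fully_anonymize(record: SearchRecord) -> SearchRecord:
    """Step 2: remove the entire IP address and the cross-session cookie."""
    return replace(record, ip=None, cookie_id=None)


def process(record: SearchRecord, now: datetime) -> SearchRecord:
    """Apply step 1 to every record, and step 2 only to records past the age threshold."""
    record = separate_from_account(record)
    if now - record.timestamp > FULL_ANONYMIZATION_AGE:
        record = fully_anonymize(record)
    return record
```

For example, a record from December 2006 processed in July 2008 keeps only its query text and timestamp, while a record from June 2008 loses its account link but still carries its IP address and cookie until it ages past the threshold.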
Other search engines use more complicated, but less comprehensive, anonymization techniques. Many delete only part of an IP address while retaining some or most of the identifying information, potentially for years to come. These weaker methods raise the risk that search data could be linked to an individual months or years later.
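To make the contrast concrete, here is a minimal Python sketch of one common form of partial deletion: zeroing the last octet of an IPv4 address. This post does not say exactly which bits other search engines remove, so the /24 truncation is an assumption chosen for illustration, and the function names are hypothetical. The point it shows is that 24 of the 32 bits survive, so the truncated value still narrows a query down to a block of at most 256 addresses, whereas removing the entire address leaves nothing to match against.

```python
import ipaddress


def truncate_last_octet(ip: str) -> str:
    """Partial anonymization (assumed /24 truncation): zero the final octet of an IPv4 address."""
    network = ipaddress.IPv4Network(f"{ip}/24", strict=False)
    return str(network.network_address)


def candidate_count(prefix_len: int) -> int:
    """Number of IPv4 addresses that share a retained prefix of this many bits."""
    return 2 ** (32 - prefix_len)


print(truncate_last_octet("203.0.113.42"))  # → 203.0.113.0
print(candidate_count(24))                  # → 256
```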
While there has been a lot of attention to how long search providers retain data before anonymizing it (9 months? 6 months? 3 months?), we believe the method used to disconnect search data from individuals is far more important than how long the data is held. We have also said we will adjust our policy and thoroughly anonymize data once it is 6 months old, rather than waiting until it is 18 months old, if our competitors adopt equally stringent anonymization techniques and timetables.
Anonymization doesn’t exist in a vacuum; it must be part of an overall approach to privacy protection. At Microsoft, we limit the amount of data we collect in the first place, and we use anonymization to minimize potential privacy breaches. We strongly secure the search data we retain, and we delete it as soon as we no longer need it to operate and improve our services. We believe consumers should be able to make informed choices about the data gathered about them, so we give individuals control over what data is collected about them and how it is used.
As people conduct more and more of their lives online, Microsoft recognizes the need to continually evaluate the privacy implications of the products and services we offer. Anonymization techniques can make our innovative services more useful while better protecting people’s privacy. It’s one way Microsoft demonstrates its commitment to our customers and to the cause of Trustworthy Computing.
For additional information, see my written testimony, “Privacy Implications of Online Advertising,” from July 9, 2008, before the Senate Commerce, Science & Transportation Committee. The full testimony, including exhibits, is also available.