This morning, I was reading a very interesting article called Unique in the Crowd: The privacy bounds of human mobility. This is the abstract:
We study fifteen months of human mobility data for one and a half million individuals and find that human mobility traces are highly unique. In fact, in a dataset where the location of an individual is specified hourly, and with a spatial resolution equal to that given by the carrier’s antennas, four spatio-temporal points are enough to uniquely identify 95% of the individuals.
Before we go deeper into the subject, the situation above reminded me of Monty Python’s Life of Brian:
<iframe width="420" height="315" src="https://www.youtube.com/embed/jVygqjyS4CA" frameborder="0" allowfullscreen></iframe>
But now back to the subject. To me, the example above shows one of the key challenges we face when we look at all the data that is generated about us. Once this data is analyzed for behavior patterns, even the most innocent data can suddenly become very sensitive. In my opinion, Big Data scenarios make it even worse: as soon as we start to correlate non-identifiable information, we very quickly run into privacy-related issues.
Let’s take the example above: the researchers are able to uniquely identify individuals based on the pattern of how they move. Additionally, you could look at the data to figure out where people spent most of their time, and typically you can fairly easily find out where they work and live. This means that, with a little additional effort, you can not only identify such patterns but also link a pattern to a name, and then all the doors are open to “abuse” this data for any kind of purpose.
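To make this a bit more tangible, here is a minimal Python sketch of both ideas. It is not the method from the paper. The traces are synthetic data I made up, the names `TRACES`, `matching_users` and `guess_home_and_work` are hypothetical, and the night (22:00 to 06:00) and day (09:00 to 17:00) windows are my own assumptions. Each point is an hour index counted from the start of the dataset plus an antenna id.

```python
from collections import Counter

# Synthetic, made-up traces: user -> list of (hour index, antenna id).
# The hour index counts hours from the start of the dataset,
# so hour % 24 gives the time of day.
TRACES = {
    "user_a": [(2, "A1"), (10, "B7"), (14, "B7"), (23, "A1"), (26, "A1"), (35, "B7")],
    "user_b": [(2, "A1"), (10, "C3"), (14, "C3"), (23, "A1"), (27, "A1"), (35, "C3")],
    "user_c": [(3, "D4"), (10, "B7"), (15, "B7"), (22, "D4"), (26, "D4"), (34, "B7")],
}


def matching_users(traces, known_points):
    """Return the users whose trace contains every known (hour, antenna) point."""
    known = set(known_points)
    return sorted(user for user, points in traces.items() if known <= set(points))


def guess_home_and_work(points, night=(22, 6), day=(9, 17)):
    """Guess home as the most frequent night-time antenna and work as the
    most frequent daytime antenna. The time windows are assumptions."""
    night_counts, day_counts = Counter(), Counter()
    for hour, antenna in points:
        hour_of_day = hour % 24
        if hour_of_day >= night[0] or hour_of_day < night[1]:
            night_counts[antenna] += 1
        elif day[0] <= hour_of_day < day[1]:
            day_counts[antenna] += 1
    home = night_counts.most_common(1)[0][0] if night_counts else None
    work = day_counts.most_common(1)[0][0] if day_counts else None
    return home, work


# One known point still matches two users; a second point singles one out.
print(matching_users(TRACES, [(2, "A1")]))
print(matching_users(TRACES, [(2, "A1"), (10, "B7")]))
print(guess_home_and_work(TRACES["user_a"]))
```

In this toy data a single known point is shared by two users, while two points already narrow it down to one. Once a trace is singled out, counting where it sits at night and during the day gives a plausible guess at home and work. A real analysis would need to handle weekends, shift workers and noisy antenna coverage, which this sketch ignores.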
None of these issues scares me from a security perspective at the moment, but they do from a privacy perspective, and for most consumers there is no real difference between the two.