I got a little worn out keeping up with the explosion of open science email over the holidays, especially the traffic on the open science list run by the OKF. For the most part I deleted messages without reading them, in the belief that time with the family is more important than obsessive attention to email during the holiday season.
But I have to respond to one point from Peter Murray-Rust, with whom I often agree - and often disagree.
Peter writes in his blog, on the idea that Open Science means loss of confidentiality: “This is just as puerile. Of course Open Science and Open Data are designed so that patient data, social data, rare species, etc. are kept confidential.”
Actually this idea is not puerile at all.
Openness and confidentiality are uneasy partners at best. A cursory review of the academic literature on re-identification makes this blindingly obvious, but if you’ve never read through it, Paul Ohm’s article “Broken Promises of Privacy” is a good place to start (not to mention open access, refreshingly), as is Latanya Sweeney’s work.
The short version is that we are astonishingly identifiable, and the more data that is available about us, the more identifiable we become. The same powers of integration that make scientific data more useful as they are interconnected apply to the data about ourselves as well. That’s why social media companies can give away their products: data about people lets you mark them, fairly uniquely, and sell to them.
Open data is not your new bicycle. We can’t simply throw open at a problem and solve it without creating new problems. And one of those problems exists with Big Data generally, whether or not it’s open: our privacy laws are out of date, re-identification is easy, and harm is hard to notice. Benefits of sharing personal data accrue mainly to society at this point, while harm accrues to the individual. We have to take this issue head on, not dismiss it as puerile.
The reality of big data is less anonymity. The question is why, for a society with less anonymity, Open Data is better than Closed Data.