19
Jul 12

Open Data and Privacy: Two Sides of the Same Coin

I’ve written on open data and privacy before here and here. The release of large amounts of public open data requires a serious analysis of the privacy risks. The more data that is out there, the easier it becomes to de-anonymize and de-aggregate the data. Think first, then act. In the Netherlands, a serious analysis of the impact of open data policy on privacy is still lacking.

In the UK, there seems to be a greater awareness of how the release of large amounts of public data could have a negative effect on privacy. Information commissioner Christopher Graham on the issue:

The Information Commissioner’s Office (ICO) has been closely engaged with the Cabinet Office in its work on this, Graham says. And he is glad that the ICO’s role is being recognised and some of the areas where it believes caution is required are being addressed.

They include the anonymisation of data where, Graham says, there is a lot of work still to be done. His office is currently consulting on a draft code of practice on anonymisation and it is tendering for a contract to set up a ‘good practice network’ for anonymisation, intended to develop expertise and spread good practice.

“It’s important to get this right, because there’s a view that anonymisation is a mirage, and that through two bits of information you can always work out who the individual is,” Graham says.

“We think that concern is overdone, in the sense that where things have gone wrong, research shows that it’s because a basic step hasn’t been taken.” (Source: The Guardian)

This greater awareness can be explained partly by the fact that in the UK, the promotion of access to official information and protection of personal information are both tasks of the Information Commissioner’s Office, whereas in the Netherlands, these tasks are separated. Freedom of information and open data are promoted by the Ministry of the Interior, and data protection is a task of the Dutch Data Protection Agency.

The UK Cabinet Office’s open data white paper also pays attention to privacy:

We are announcing the appointment of a privacy expert to the Public Sector Transparency Board to make sure we bring in the latest expertise on privacy measures. More broadly, we’re making sure that privacy experts are brought into all sector panel discussions across Whitehall when data releases are being considered. [...]

Therefore privacy is not to be considered as an afterthought. Privacy issues will be considered alongside transparency at the beginning of all discussions concerning the release of a new dataset, which is why we are appointing a privacy expert to the Public Sector Transparency Board. This appointment is one of the key recommendations of the O’Hara report.

Open data and privacy are two sides of the same coin. They need an integrated policy. I hope this gets through to the Dutch open data movement soon.


13
Apr 12

Some more thoughts on open data and privacy

Together with Bastiaan van Loenen, I wrote an article on open data policies and privacy: Brave New Open Data World?. The article is published in the International Journal of Spatial Data Infrastructures Research, volume 7 (2012). Feel free to contact me if you have any comments or questions. The article will be presented during the GSDI World Conference in Québec City, Canada.

Abstract
There is a growing tendency to release all sorts of data on the Internet. The greater availability of interoperable public data catalyses secondary use of such data, which leads to growth of information industries and better government transparency. Open data policies may at the same time be in conflict with the individual’s right to information privacy as protected by the EU Data Protection Directive. This directive sets rules to the processing of personal data. Technological developments and the increasing amount of publicly available data are, however, blurring the lines between non-personal and personal data. Open data does not seem to be personal data on first glance because it is anonymised or aggregated. However, it may become personal data by combining it with other publicly available data. In this article, we argue that these developments extend the reach of EU privacy regulation to open data and may obstruct the implementation of open data policies in the EU.

Update: you can find a Dutch summary of the article at OpenDatarecht.nl


15
Mar 12

Annotation to Scarlet/SABAM case

Together with Frederik Zuiderveen Borgesius, I wrote an annotation to the ECJ’s decision in the Scarlet/SABAM case. Unfortunately (or fortunately), it is in Dutch.

You can download the annotation here.


06
Dec 11

Scarlet v. SABAM, a real victory for Internet freedoms?

A little over a week ago the European Court of Justice gave a long-awaited judgement in the Scarlet v. SABAM case. It is the first ECJ case dealing with Article 15 of the E-commerce Directive, which prohibits EU Member States from imposing a general obligation on Internet service providers to monitor the information they transmit.

SABAM, which is the Belgian copyright collection society, had sought an order requiring Scarlet, an Internet Access Provider, to bring copyright infringements by its subscribers to an end by blocking the transmission of files containing musical works through peer-to-peer software. In order to block infringing transmissions, Scarlet would have to install a filtering system scanning all electronic communications of all its subscribers passing via its services. Furthermore, Scarlet would have to pay for implementing and maintaining the system itself.

The ECJ had to answer the question whether the Copyright Directive and the Enforcement Directive, in the light of the Privacy Directive, the E-Privacy Directive, the E-commerce Directive, and Articles 8 and 10 of the ECHR, permit …

“…Member States to authorise a national court, before which substantive proceedings have been brought and on the basis merely of a statutory provision stating that: ‘They [the national courts] may also issue an injunction against intermediaries whose services are used by a third party to infringe a copyright or related right’, to order an [ISP] to install, for all its customers, in abstracto and as a preventive measure, exclusively at the cost of that ISP and for an unlimited period, a system for filtering all electronic communications, both incoming and outgoing, passing via its services, in particular those involving the use of peer-to-peer software, in order to identify on its network the movement of electronic files containing a musical, cinematographic or audio-visual work in respect of which the applicant claims to hold rights, and subsequently to block the transfer of such files, either at the point at which they are requested or at which they are sent?”

To put it simply: can Scarlet be required to implement and pay for a filtering system that filters out copyright-infringing files?

Article 9 of the Enforcement Directive instructs Member States to ensure that interlocutory injunctions may be issued against intermediaries whose services are used by a third party to infringe an intellectual property right. The E-commerce Directive does not prohibit courts from imposing injunctions on intermediaries either, as it explicitly leaves open the possibility for a court or administrative authority to require the service provider to terminate or prevent an infringement (recital 45). Article 18 of this same directive instructs Member States to “ensure that court actions available under national law concerning information society services’ activities allow for the rapid adoption of measures, including interim measures, designed to terminate any alleged infringement and to prevent any further impairment of the interests involved.” On the basis of Article 15(1) of the E-commerce Directive, however, injunctions may never imply a general obligation to monitor the information that is transmitted or stored. The same goes for a general obligation to actively seek facts or circumstances indicating illegal activity.

Article 15(1) of the E-commerce Directive is thus at the centre of the issue. The ECJ finds that imposing an injunction to install a filter mechanism is in fact an obligation to actively monitor all the data relating to each of its customers (general monitoring), which is prohibited by Article 15(1) of the E-commerce Directive.

The Promusicae case taught us that the fundamental right to intellectual property, as protected by Article 17 of the EU Charter of Fundamental Rights, has to be balanced with other fundamental rights. The ECJ thus also had to touch on the compatibility of an obligation to implement a filtering system, which clearly is a manifestation of the right to property, with the ISP’s freedom to conduct a business (Article 16 of the EU Charter).

Regarding the freedom to conduct a business, the ECJ holds:

“48       […]such an injunction would result in a serious infringement of the freedom of the ISP concerned to conduct its business since it would require that ISP to install a complicated, costly, permanent computer system at its own expense, which would also be contrary to the conditions laid down in Article 3(1) of Directive 2004/48, which requires that measures to ensure the respect of intellectual-property rights should not be unnecessarily complicated or costly.

49        In those circumstances, it must be held that the injunction to install the contested filtering system is to be regarded as not respecting the requirement that a fair balance be struck between, on the one hand, the protection of the intellectual-property right enjoyed by copyright holders, and, on the other hand, that of the freedom to conduct business enjoyed by operators such as ISPs.”

The ISP, however, is not the only actor that is affected by a filtering obligation. The ECJ thus also considered the right to freedom of information and the right to the protection of personal data of Internet subscribers.

The ECJ on freedom of information:

“52      […] that injunction could potentially undermine freedom of information since that system might not distinguish adequately between unlawful content and lawful content, with the result that its introduction could lead to the blocking of lawful communications. Indeed, it is not contested that the reply to the question whether a transmission is lawful also depends on the application of statutory exceptions to copyright which vary from one Member State to another. Moreover, in some Member States certain works fall within the public domain or can be posted online free of charge by the authors concerned.”

Regarding the right to personal data, the ECJ holds:

“51      It is common ground, first, that the injunction requiring installation of the contested filtering system would involve a systematic analysis of all content and the collection and identification of users’ IP addresses from which unlawful content on the network is sent. Those addresses are protected personal data because they allow those users to be precisely identified.”

The ECJ concludes that imposing a general filtering obligation does not respect the requirement that a fair balance be struck between the right to intellectual property, on the one hand, and the freedom to conduct business, the right to protection of personal data and the freedom to receive or impart information, on the other.

However, the ECJ’s assessment of the filtering obligation in relation to the Internet user’s freedom of information and right to personal data is not as strict as its assessment of the filtering obligation in the light of the ISP’s freedom to conduct a business. The ECJ calls the filtering obligation a “serious infringement” of the ISP’s freedom to conduct a business. In contrast, regarding the right to freedom of information, the ECJ only speaks of it being potentially undermined by a filtering obligation.

Regarding the right to personal data, the ECJ seems to focus only on the fact that a filtering obligation implies that IP addresses are collected and identified. The ECJ is right that identifying an IP address constitutes processing of personal data, but that does not mean it is absolutely forbidden. The EU Data Protection Directive gives rules on how to process personal data and does not completely forbid such processing. Furthermore, the court mentions that a filtering system involves a systematic analysis of all content sent on the network, but does not qualify this as problematic in the light of data protection law or of other aspects of privacy, such as communication privacy. After all, privacy is more than data protection, and the question of the Belgian court did not refer to Article 8 ECHR without reason.

This judgement is great news for ISPs that are targeted by copyright owners, as the ECJ holds that an obligation to install a general filtering mechanism is in conflict with Article 15(1) of the E-commerce Directive. The court furthermore finds that a fair balance is lacking when an ISP has to install and pay for a general filtering mechanism. This of course does not prevent an ISP from ‘voluntarily’ installing a filtering mechanism as a part of a deal with copyright holders. In this context, it is a pity that the ECJ did not express a clear opinion on the Internet subscriber’s right to freedom of information and the right to privacy when filtering mechanisms are installed.


28
Oct 11

Open data and privacy. Should I bother?

Privacy is often mentioned as an obstacle when implementing an open data policy, but never really elaborated on. Should you really bother about privacy when opening up your data? My answer: yes you should.

Alan Westin laid the foundation of our modern conception of information privacy, which focuses on the individual’s right to control what is known about him. The modern European right to information privacy still leans on the notion of privacy as a right to control one’s personal information. Article 8 of the Charter of Fundamental Rights of the European Union gives everyone the right “to the protection of personal data concerning him or her”. This fundamental right to information privacy is further elaborated by the EU Data Protection Directive. The concept of ‘processing personal data’ is the touchstone of this directive. Personal data should be processed fairly and for legitimate and specified purposes.

EU data protection is all about the protection of ‘personal data’. Personal data is “information relating to an identified or identifiable natural person” and an identifiable person is “one who can be identified, directly or indirectly, in particular by reference to an identification number or to one or more factors specific to his physical, physiological, mental, economic, cultural or social identity” (Article 2 of the EU Data Protection Directive). Personal data can thus be both directly and indirectly identifying.

Train times, the location of public toilets and the number of car accidents could all be open data. No open data provider will (hopefully) offer names, addresses, social security numbers, or other data that directly or indirectly identifies natural persons as open data. At most, open data is anonymized or aggregated data that cannot be related to individuals. The Open Knowledge Foundation visualizes open data and “private data” as two non-overlapping subsets. Unfortunately, in reality this distinction is not so easy to draw.

Even when data has been anonymized or aggregated, data analysis techniques now allow us to re-identify individuals in such data (see Paul Ohm for an overview). For instance, when Netflix offered anonymized data for a contest for the best method to improve its movie recommendations, Arvind Narayanan and Vitaly Shmatikov showed that this data could in fact be used to identify Netflix subscribers.

In particular regarding open data, Andrew Simpson demonstrated that it is relatively easy to link statistical open data to individuals. In one case, names and addresses of councillors, and names, posts and salaries of senior public servants were uncovered by combining data from the British open data portal with other already available public data. The lack of consideration of other data in the public domain prior to publication of statistical open data thus led to the identification of individuals.

Combining datasets is at the core of de-anonymizing and de-aggregating data. Data that is non-identifiable today may turn out to be indirectly identifiable tomorrow. The more computing power and publicly available data, the easier it becomes to identify individuals in data. And when data can be related to individuals, data protection law kicks in.
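To make the linking step concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the two datasets, the field names and the choice of postcode plus birth year as linking keys are assumptions, not data from any real portal or from the studies mentioned above. It only shows the general idea: a record without a name becomes identifiable as soon as its remaining attributes match exactly one person in another public source.

```python
from collections import defaultdict

# Hypothetical "anonymized" open dataset: names removed, other attributes kept.
open_data = [
    {"postcode": "2611", "birth_year": 1975, "role": "councillor", "salary": 68000},
    {"postcode": "2611", "birth_year": 1982, "role": "clerk", "salary": 41000},
    {"postcode": "3012", "birth_year": 1975, "role": "director", "salary": 95000},
]

# Hypothetical second public source that does carry names.
public_register = [
    {"name": "A. Jansen", "postcode": "2611", "birth_year": 1975},
    {"name": "B. de Vries", "postcode": "3012", "birth_year": 1975},
    {"name": "C. Bakker", "postcode": "2611", "birth_year": 1990},
]

# Attributes assumed to appear in both sources (indirect identifiers).
LINK_KEYS = ("postcode", "birth_year")


def link(records, auxiliary, keys=LINK_KEYS):
    """Return (name, record) pairs where a record matches exactly one named person."""
    index = defaultdict(list)
    for person in auxiliary:
        index[tuple(person[k] for k in keys)].append(person["name"])

    matches = []
    for record in records:
        names = index.get(tuple(record[k] for k in keys), [])
        if len(names) == 1:  # a unique match ties the record to one person
            matches.append((names[0], record))
    return matches


reidentified = link(open_data, public_register)
for name, record in reidentified:
    print(name, "->", record["role"], record["salary"])
```

In this toy example two of the three "anonymous" records end up with a name attached, even though neither dataset on its own links a name to a salary. Real re-identification work is more involved (fuzzy matching, many more attributes), but the principle is the same.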

What does this mean for open data providers? Open data providers should not just consider the identifiability of their open data in isolation. They should also take other publicly available data into account when selecting data that they want to offer as open data. That is a difficult task. Maybe open data is not such a great idea after all?

Also check out Opendatarecht.nl, a Dutch weblog on open data.


25
May 10

Google search over SSL protects search queries for sensitive information – but that’s all

Google has opened up a new Beta search service: search over SSL. Google’s SSL search is the industry’s first search service using SSL. To use the service, visit https://www.google.com/.

SSL encrypts the data that is sent between a searcher and Google, which makes it harder for third parties to see the content of the data packets. This is great news for people who search for sensitive information. However, it should be noted that only the traffic between you and Google is secured. An ISP or other agency is still able to see what Web pages you are visiting.
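A rough way to picture the difference is to split a URL into the part that an observer on the network can still read and the part that travels inside the encrypted connection. The sketch below is a simplification under stated assumptions: the function name observer_view and both example URLs are hypothetical, and it ignores other ways information can leak. It assumes that with SSL the host name remains visible (through the DNS lookup and the destination IP address) while the path and query string do not, and that without SSL the whole request is readable.

```python
from urllib.parse import urlsplit


def observer_view(url):
    """Roughly what an on-path observer can read for one request (simplified)."""
    parts = urlsplit(url)
    request = parts.path or "/"
    if parts.query:
        request += "?" + parts.query
    if parts.scheme == "https":
        # The host still leaks (DNS lookup, destination IP); path and query do not.
        return {"host": parts.hostname, "request": None}
    # Plain HTTP: host, path and query are all readable.
    return {"host": parts.hostname, "request": request}


# Hypothetical search over SSL, followed by a click on a plain-HTTP result.
search = observer_view("https://www.google.com/search?q=sensitive+topic")
click = observer_view("http://example.org/clinic/appointments")
print(search)
print(click)
```

The search query itself is hidden, but the page you visit after clicking a result is not, unless that site uses SSL as well.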


16
Apr 10

Google wants to know what you’re printing

“Google Cloud Print is a web service offered by Google. We expect other entities to provide their own cloud print services as well. Users associate printers with their Google Account via the service. Printers are treated in much the same way as documents are in Google Docs. Therefore, it is very easy to share printers with your coworkers, friends, and family anywhere in the world”. More at Google Code. A worldwide network of printers: prInternet. What about privacy?


14
Sep 09

Google wants YOU to define privacy

I made a screenshot (below) while filling out a Google Documents Product Survey. Google asks its Google Docs users to define privacy and security. Is this just a way to show that Google cares about your privacy? I am not sure. This question is asked on the last page of the survey, on which Google is also asking its users to define “visual appeal”, “technical reliability”, “trustworthiness” and “ease of use”. It seems that Google is setting privacy and security as one of its principles of design. I wonder what Google will do with the input.

[Screenshot: Google Docs survey]


15
Apr 09

How to stop Google Street View from indexing your garden ;-)

[Image: robots.txt sign]