Open data and privacy. Should I bother?

Privacy is often mentioned as an obstacle when implementing an open data policy, but never really elaborated on. Should you really bother about privacy when opening up your data? My answer: yes you should.

Alan Westin laid the foundation of our modern conception of information privacy, which focuses on the individual’s right to control what is known about him. The modern European right to information privacy still leans on the notion of privacy as a right to control one’s personal information. Article 8 of the Charter of Fundamental Rights of the European Union gives everyone the right “to the protection of personal data concerning him or her”. This fundamental right to information privacy is further elaborated by the EU Data Protection Directive. The concept of ‘processing personal data’ is the touchstone of this directive. Personal data should be processed fairly and for legitimate and specified purposes.

EU data protection is all about the protection of ‘personal data’. Personal data is “information relating to an identified or identifiable natural person” and an identifiable person is “one who can be identified, directly or indirectly, in particular by reference to an identification number or to one or more factors specific to his physical, physiological, mental, economic, cultural or social identity” (Article 2 of the EU Data Protection Directive). Personal data can thus be both directly and indirectly identifying.

Train times, the location of public toilets and the number of car accidents could all be open data. No open data provider will (hopefully) offer names, addresses, social security numbers, or other data that directly or indirectly identifies natural persons as open data. Open data is at the most anonymized or aggregated data that cannot be related to individuals. The Open Knowledge Foundation visualizes open data and “private data” as two non-overlapping subsets. Unfortunately, in reality this distinction is not so easy to draw.

Even when data has been anonymized or aggregated, data analysis techniques now allow us to re-identify individuals in such data (see Paul Ohm for an overview). For instance, when Netflix offered anonymized data for a contest for the best method to improve its movie recommendations, Arvind Narayanan and Vitaly Shmatikov showed that this data could in fact be used to identify Netflix subscribers.

In particular regarding open data, Andrew Simpson demonstrated that it is relatively easy to link statistical open data to individuals. In one case, names and addresses of councillors, and names, posts and salaries of senior public servants were uncovered by combining data from the British open data portal with other already available public data. The lack of consideration of other data in the public domain prior to publication of statistical open data thus led to the identification of individuals.

Combining datasets is at the core of de-anonymizing and de-aggregating data. Data that is non-identifiable today may turn out to be indirectly identifiable tomorrow. The more computing power and publicly available data there is, the easier it becomes to identify individuals in data. And when data can be related to individuals, data protection law kicks in.
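To make the idea of combining datasets concrete, here is a minimal sketch in Python of a so-called linkage attack. Everything in it is hypothetical: the records, the names, the field names and the choice of postcode and birth year as linking fields are made up for illustration and do not come from any of the cases mentioned above. It only shows the general mechanism, not the methods used by the researchers cited.

```python
from collections import defaultdict

# Hypothetical "anonymized" open data: names removed, other fields kept.
open_data = [
    {"postcode": "3811", "birth_year": 1975, "post": "director", "salary": 98000},
    {"postcode": "3811", "birth_year": 1982, "post": "clerk", "salary": 41000},
    {"postcode": "3512", "birth_year": 1975, "post": "advisor", "salary": 67000},
]

# Hypothetical data that is already public elsewhere (e.g. a register).
public_register = [
    {"name": "A. Jansen", "postcode": "3811", "birth_year": 1975},
    {"name": "B. de Boer", "postcode": "3512", "birth_year": 1975},
    {"name": "C. Smit", "postcode": "3512", "birth_year": 1975},
]

# Fields present in both datasets (an assumption made for this sketch).
LINK_FIELDS = ("postcode", "birth_year")


def link(records, register, keys=LINK_FIELDS):
    """Attach a name to every record whose linking fields match exactly one person."""
    index = defaultdict(list)
    for person in register:
        index[tuple(person[k] for k in keys)].append(person["name"])

    matches = []
    for record in records:
        names = index.get(tuple(record[k] for k in keys), [])
        if len(names) == 1:  # a unique match means the record is re-identified
            matches.append({"name": names[0], **record})
    return matches


reidentified = link(open_data, public_register)
for row in reidentified:
    print(row)
```

In this toy example only the first record is re-identified: the second has no counterpart in the register, and the third matches two people. Adding one more shared field, or one more public dataset, could be enough to make that third match unique as well.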

What does this mean for open data providers? Open data providers should not just consider the identifiability of their open data in isolation. They should also take other publicly available data into account when selecting data that they want to offer as open data. That is a difficult task. Maybe open data is not such a great idea after all?

Open Data Workshop @ Geonovum

Geonovum (a semi-public organization devoting itself to providing better access to geo-information in the public sector) is hosting an open data workshop on November 9, 2011. Location: De Observant in Amersfoort.

Who will be there and what will they be talking about?

  • Marc de Vries (ePSI platform) will try to look into the future of open data.
  • Christopher Dittmann (Shell) will give a talk on the experience of availability/non-availability of open geospatial data.
  • Paul Suijkerbuijk (ICTU) will share his experience with the national government open data platform.

Interactive sessions:

  • Johan van Arragon (Province of Zuid-Holland) will talk about the costs and benefits of open data.
  • Paul Hendriks and Peter-Jan Speerstra (Municipality of Rotterdam) will deal with the question of how to implement an open data policy.
  • Jens Riecken (Ministry of the Interior and Local Affairs NordRhein Westfalen, Germany) will explain how to utilize the wisdom of the crowds.
  • Kathleen Janssen will take a step back and will deal with legal, financial and practical issues that need to be tackled. I am particularly interested in this session.
  • Richard Blad will give a talk on how to organize an open data community.

The full program can be found here: http://www.geonovum.nl/dossiers/kennissessies/opengeodata/programma.

I’ll be there. By the way, in the spirit of the open data philosophy: it’s free!

Reference management going social with Mendeley. Farewell Endnote.

I have been using EndNote for about two years now. Before using EndNote, I had never used a reference manager. I thought it was a great invention. However, after some time a few things started bothering me. Because I work in different places and on different computers, I was always copying my EndNote library to USB sticks, e-mailing it to myself and uploading it to my webserver. If I wanted to share an article with a colleague or friend, I had to go into the file structure of my EndNote library, find the file and attach it to an e-mail. When searching for a particular subject in my personal library, I had to either use Windows Explorer to search through files or use Adobe Reader to search through multiple PDFs. Besides, EndNote was first released in 1988 and the software GUI seems to have never changed. I know of the existence of ‘EndNote Web’, but after using EndNote’s desktop version I have little faith that it will work smoothly and intuitively. EndNote Web comes ‘free’ with the desktop software. If you’re a student or in research, the desktop version is probably free or costs you pennies. However, that is only possible because your university has bought a campus licence. A single EndNote licence costs about $250.

As you might have expected, I am no longer using EndNote and have found a new reference manager. It is called Mendeley. It is a free web-based reference manager. A desktop version and Microsoft Office plugin can be downloaded at Mendeley’s website. It also runs on your iPhone or iPad. A working non-official Android app is available at the Android Market (there is a Mendeley API!). All changes to your library are synced to all devices. Sharing an article is as easy as clicking on one button. And it features full-text search through your library.

The features that I am most enthusiastic about are Mendeley’s social features. You can join research communities that share articles on certain topics. Add colleagues and fellow students as contacts and keep up with their research. Upload your own articles and promote them on your profile (watch out SSRN!). Feel free to add me at: www.mendeley.com/profiles/stefan-kulk/.

What a contrast: Google uses London open data for tube and bus directions, while Paris public transport operator kills public transport app

This month brought contrasting news on the openness of public transport information in two EU countries. The Telegraph celebrated Google’s mapping service for adding live public transport information and directions. The mobile version of Google Maps has a function that detects a user’s location and directs them to the nearest tube station or bus stop. Another great function is the alert function, which warns users to get off their bus or train when they have reached their destination. What’s Google’s secret?

The service relies on Transport for London’s open data platform, which allows developers direct access to data on public transport in the capital, including up-to-date details of roadworks and tube suspensions. Google did not pay for access to the data, which has been freely available since last June.

Around the same time in France, a similar public transport information service was killed by the Paris public transport operator (RATP). CheckMyMetro is a free iPhone and Android app that lets French metro users connect to each other and allows them to share information on, among other things, incidents and delays. The Paris public transport operator filed a complaint with Apple arguing that the traffic information in the CheckMyMetro app infringed the operator’s database rights. As a result, Apple asked the creator of CheckMyMetro to remove the app from the App Store:

Dear Sirs,

The RATP is a French public company in charge of Public Transports in the Paris area French.
The RATP is the author of the Paris Metro map and the owner of corresponding French design registration (INPI deposit n°06 5325 –Nov. 17th 2006). French and International law on copyright as well as French law on Design thus protect this map. Moreover, the RATP is the owner of the trademark # (INPI deposit n°92402043 – January 21st 1992).

The RATP is concerned with the application “Check my metro” proposed for downloading by the publisher LittleSphere on the App Store and the iTunes since we did not authorize any reproduction or distribution of the said design and trademark.

Moreover, this app embeds the traffic information of our wap site without prior authorization which constitutes an infringement on our rights as producer of database conferred by the French law.

Such reproductions and diffusions may then be considered as counterfeiting acts, and the RATP is entitled to enforce its rights within the French jurisdictions.

Consequently, we ask you to remove the application “Check my metro” by LittleSphere of the App Store and iTunes and to inform the publisher in the same way.

The app is back in the App Store. However, the public transport information has been removed.

Although the French government has started an open data initiative called ETALAB, the Paris public transport information is outside of the realm of the open data initiative because it is in the hands of the public transport operator. The creators of CheckMyMetro, however, are not waiting for the information to be open. They have started their own OpenStreetMap-like project for the Paris Metro at www.checkmymap.fr.

Update, 16:00h: I’ve replaced ‘public transport authority’ with ‘public transport operator’.

Keyword advertising and the L’Oréal v. eBay ruling

A couple of days ago, the ECJ gave judgement in L’Oréal v. eBay. L’Oréal had sued eBay because some eBay sellers offered counterfeit L’Oréal goods for sale. Some sellers also offered goods that were not intended for sale (such as tester or dramming products) or goods bearing L’Oréal trademarks intended for sale in North America and not in the European Economic Area. L’Oréal also submitted that eBay is liable for the use of L’Oréal trademarks in sponsored search results. eBay used L’Oréal trademarks both in the advertisement’s text and as keywords to trigger the advertisement. One of the questions before the ECJ was: is eBay’s use of L’Oréal’s trademarks in sponsored search results use in the sense of Article 5(1) of the Trademark Directive? In other words: is eBay’s trademark use actionable?

Trademark use in keyword advertising

The ECJ differentiates between two types of use by eBay. If eBay advertises to promote its own service of making an online marketplace available to sellers and buyers of products, then that use is not in relation to goods or services identical with, or similar to, L’Oréal products. eBay’s online auction service is in no way related to L’Oréal perfumes, cosmetics or hair-care products. eBay’s trademark use is thus at the very most actionable under Article 5(2) of the Trademark Directive, which protects trademarks with a reputation (and does not require trademark use in relation to identical or similar goods or services).

However, in so far as eBay uses L’Oréal trademarks to promote its customer-sellers’ offers, there is use in relation to goods or services identical with those for which L’Oréal trademarks are registered. The ECJ compares the products offered by eBay sellers with L’Oréal’s products. According to the ECJ, eBay’s advertisements create an obvious association between the trade-marked goods which are mentioned in the advertisements and the possibility of buying those goods through eBay. Even though eBay itself does not offer the infringing products for sale, the ECJ held that eBay’s use falls within the scope of Article 5(1) of the Trademark Directive.

This is in line with the ECJ’s earlier decision in UDV North America v. Brandtraders, in which the ECJ considered trademark use by an intermediary. UDV was the proprietor of the Community trademark Smirnoff Ice. Brandtraders operated a web site on which traders could anonymously place advertisements and negotiate the sale of goods. Brandtraders, in its own name but on behalf of another company, entered into a contract of sale of bottles of Smirnoff Ice with a buyer for which it used the SMIRNOFF trademark. The ECJ explained that the fact that a third party uses a sign in relation to goods that are not his own goods does not by itself mean that such use is not covered by Article 5 of the Trademark Directive. Brandtraders was considered to have used the sign in such a way that a link was established between the sign and the goods marketed by Brandtraders.

Use in relation to goods and services is not enough to constitute trademark use in the sense of Article 5(1) of the Trademark Directive. Trademark use should be liable to have an adverse effect on one of the functions of the trademark. Whether eBay’s use affects the functions of L’Oréal’s trademarks is decided based on the result of the black box created by the ECJ in its Google France v. Louis Vuitton and Portakabin v. Primakabin decisions:

94. As regards, finally, whether the use of a keyword corresponding to a trade mark is liable to have an adverse effect on one of the functions of the trade mark, the Court has made clear in other cases that there is such an adverse effect where that advertising does not enable reasonably well-informed and reasonably observant internet users, or enables them only with difficulty, to ascertain whether the goods or services referred to by the advertisement originate from the proprietor of the trade mark or from an undertaking economically linked to it or, on the contrary, originate from a third party (Google France and Google, paragraph 99; and Case C‑558/08 Portakabin and Portakabin [2010] ECR I‑0000, paragraph 54).

Conclusion: Nothing new here!
