GDPR: Privacy policies of online platforms have significant gaps - Research

BEUC (European Consumer Organisation) research suggests AI can help close that gap

GDPR: Privacy policies of online platforms have significant gaps - Research - CIO & Leader

The privacy policies of online platforms and services still fall well short of the standards set by the European General Data Protection Regulation (GDPR), more than a month after the regulation took effect, according to research by the European Consumer Organisation BEUC (Bureau Européen des Unions de Consommateurs) and researchers from the European University Institute in Florence. The researchers also released a study on how Artificial Intelligence can help scan and analyse privacy policies.

According to a statement by BEUC, none of the 14 online platforms analysed by the researchers came close to meeting the requirements.

“Unsatisfactory treatment of the information requirements; large amounts of sentences employing vague language; and an alarming number of “problematic” clauses cannot be deemed satisfactory,” the organisation said.

Google, Facebook (and Instagram), Amazon, Apple, Microsoft, WhatsApp, Twitter, Uber, AirBnB, Skyscanner, Netflix, Steam and Epic Games were the online platforms whose privacy policies were analysed by the researchers.

Based on this analysis, the university researchers are training an automated evaluator of privacy policies, called CLAUDETTE—short for Automated CLAUse DETeCTER. The goal is that this Artificial Intelligence tool will be able to automatically scan companies’ privacy policies and detect clauses that potentially fail to meet GDPR requirements.

In total, the 14 policies amounted to 3,659 sentences (80,398 words). Of these, 401 sentences (11.0%) were marked as containing unclear language, and 1,240 (33.9%) contained “potentially problematic” clauses or clauses providing “insufficient” information.
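The percentages follow directly from the sentence counts. As a quick check, the short Python sketch below recomputes them; the constant and function names are illustrative and do not come from the report.

```python
# Sentence counts as reported in the article.
TOTAL_SENTENCES = 3659
UNCLEAR = 401
PROBLEMATIC = 1240


def share(count, total):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


unclear_pct = share(UNCLEAR, TOTAL_SENTENCES)
problematic_pct = share(PROBLEMATIC, TOTAL_SENTENCES)
print(unclear_pct, problematic_pct)  # 11.0 33.9
```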

The identified problems include:

  • Not providing all the information required under the GDPR’s transparency obligations. For example, companies do not always properly inform users about the third parties with whom they share data or from whom they obtain it.
  • Processing of personal data not happening according to GDPR requirements. For instance, a clause stating that the user agrees to the company’s privacy policy by simply using its website.
  • Policies are formulated using vague and unclear language—such as “may”, “might”, “some”, “often”, and “possible”—which makes it very hard for consumers to understand the actual content of the policy and how their data is used in practice.
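To give a sense of what flagging vague language can look like in practice, here is a minimal Python sketch that searches a sentence for the five example terms quoted above. It is plain keyword matching and only an illustration: the function name is hypothetical, and the researchers' annotation guidelines and classifier are not reproduced here.

```python
import re

# The five example terms quoted in the article; the real criteria may be broader.
VAGUE_TERMS = ("may", "might", "some", "often", "possible")

# Whole-word, case-insensitive match, so "maybe" or "possibly" are not flagged.
_PATTERN = re.compile(r"\b(" + "|".join(VAGUE_TERMS) + r")\b", re.IGNORECASE)


def vague_terms_in(sentence):
    """Return the vague terms found in a sentence, lowercased, in order of appearance."""
    return [match.group(1).lower() for match in _PATTERN.finditer(sentence)]


example = "We may share some of your data with partners."
print(vague_terms_in(example))  # ['may', 'some']
```

A keyword list like this will over-flag harmless uses of the same words, which is one reason a trained classifier is a more plausible approach than fixed rules.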

BEUC will inform the European Data Protection Board about these findings.


The Method

The CLAUDETTE project was established to automate the legal analysis of the terms of service and privacy policies of online platforms and services.

The researchers developed a web crawler that monitors the privacy policies of a list of online services. The data retrieved by the crawler is then processed using supervised machine learning. They implemented a Support Vector Machine-based classifier trained on a data set annotated by experts following a set of defined guidelines. The data set contains over 3,500 sentences taken from 14 privacy policies. The accuracy of the classifier was evaluated using a standard leave-one-document-out procedure, showing encouraging precision and recall in several sub-tasks. The analysis indicates that the task of identifying problematic clauses in this kind of document is in principle automatable. An extended data set is under construction to improve the accuracy of the classification results. The expert annotations can be viewed in a standard browser at the CLAUDETTE GDPR web site.
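The leave-one-document-out procedure can be sketched in a few lines of Python: each policy is held out in turn, a model is trained on the remaining policies, and precision and recall are accumulated over the held-out sentences. The sketch below is an illustration under loud assumptions. The toy corpus and labels are invented, and the `train` and `predict` functions are a trivial word-based stand-in, not the Support Vector Machine the researchers used.

```python
# Toy corpus: document name -> list of (sentence, is_problematic) pairs.
# Sentences and labels are invented for illustration only.
CORPUS = {
    "policy_a": [("we may share data with partners", True),
                 ("you can delete your account", False)],
    "policy_b": [("we may collect some information", True),
                 ("contact us by email", False)],
    "policy_c": [("data may be shared with others", True),
                 ("you can export your data", False)],
}


def train(examples):
    """Stand-in for the SVM: keep words seen only in problematic sentences."""
    positive, negative = set(), set()
    for sentence, label in examples:
        (positive if label else negative).update(sentence.split())
    return positive - negative


def predict(model, sentence):
    """Flag a sentence if it contains any word kept by train()."""
    return any(word in model for word in sentence.split())


def leave_one_document_out(corpus):
    """Hold out each document once; return overall (precision, recall)."""
    tp = fp = fn = 0
    for held_out in corpus:
        training = [example
                    for name, examples in corpus.items() if name != held_out
                    for example in examples]
        model = train(training)
        for sentence, label in corpus[held_out]:
            guess = predict(model, sentence)
            if guess and label:
                tp += 1
            elif guess and not label:
                fp += 1
            elif label:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


print(leave_one_document_out(CORPUS))  # (0.75, 1.0)
```

Holding out whole documents rather than random sentences means the model is always tested on a policy it has never seen, which is closer to how such a tool would meet a newly published policy.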

The web crawler checks for updates in the list of monitored services every night. If any of these services has been updated (i.e., its text appears to be different from the day before), then the machine learning system is automatically called to process the new document, and results are updated on the server.
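The nightly check can be pictured as comparing today's text of each policy with what was stored the day before. The Python sketch below does this by comparing SHA-256 fingerprints; that choice, along with the function names and the in-memory store, is an assumption made for illustration, since the article does not say how the crawler detects differences. Fetching the pages and invoking the classifier are left out.

```python
import hashlib


def fingerprint(text):
    """Return a SHA-256 hex digest of the policy text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def changed_services(previous, current_texts):
    """Return names of services whose text differs from the stored fingerprint.

    previous: dict of service name -> fingerprint from the last run (updated in place).
    current_texts: dict of service name -> policy text retrieved in this run.
    """
    changed = []
    for name, text in current_texts.items():
        digest = fingerprint(text)
        if previous.get(name) != digest:
            changed.append(name)
            previous[name] = digest
    return changed


store = {}
first_run = changed_services(store, {"service_a": "v1", "service_b": "v1"})
second_run = changed_services(store, {"service_a": "v1", "service_b": "v2"})
print(first_run, second_run)  # ['service_a', 'service_b'] ['service_b']
```

Each name returned would then be handed to the machine learning system for reprocessing.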

BEUC is an umbrella consumer group based in Brussels, Belgium. It brings together 43 European consumer organisations from 32 countries (EU, EEA and applicant countries).

BEUC represents its members and defends the interests of consumers in the decision process of the Institutions of the European Union, acting as the "consumer voice in Europe". The organisation is funded by an EU grant, its member fees and other specific projects.

The full report, CLAUDETTE meets GDPR: Automating the Evaluation of Privacy Policies using Artificial Intelligence, can be accessed on the BEUC website. The report gives a detailed analysis of the platforms, identifies some of the clauses where the gaps lie, and explains why they are deemed gaps, based on specific requirements of the GDPR.


