December 3, 2018

GDPR Technical Series #4: Finding Personal Data – Types and Mechanics

Subra Ramesh

In the last post of our GDPR series, we discussed several closely related terms pertaining to finding data, and we distinguished between them. GDPR defines two types of personal data:

“Basic” personal data (defined in Article 4), whose processing is governed by Article 6
Special categories of personal data, described in Article 9

Understanding Articles 6 and 9

According to the European Commission, “basic” personal data includes items such as:

Name and surname
Home address
Email address such as name.surname@company.com
Identification card number
Location data (for example the location data function on a mobile phone)
Internet Protocol (IP) address
Cookie ID
Advertising identifier of your phone
Data held by a hospital or doctor, which could be a symbol that uniquely identifies a person

These are typically items that uniquely identify a person. Article 6 of the GDPR governs the processing of this data and allows it only where at least one lawful basis applies, for example:

Consent of the subject
Legal requirement(s)
Legitimate interest(s) of the controller

The special categories of personal data include items such as:

Racial or ethnic origin
Political opinions
Religious or philosophical beliefs
Trade union membership
Genetic data
Biometric data for the purpose of uniquely identifying a natural person
Data concerning health
Sex life or sexual orientation

The special categories of personal data require additional protection and consideration, because these attributes could be used to discriminate against or otherwise target a person. For that reason they are also referred to as sensitive personal data.

Under GDPR Article 9, processing of special categories of personal data is prohibited unless one of these conditions applies:

Explicit consent is given by the subject
The data has already been manifestly made public by the subject
One of the specific legal or public interest situations spelled out in Article 9 requires processing of this data

In general, compared to Article 6, Article 9 imposes tighter conditions for allowing processing.

Lastly, GDPR Article 10 gives additional protection for data relating to criminal convictions.
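For a detection pipeline, the practical consequence is that a finding tagged as a special category needs a stricter justification than a “basic” one. The sketch below encodes only the simplified condition lists given in this post. The names `Category`, `ARTICLE_6_BASES`, `ARTICLE_9_CONDITIONS`, and `processing_allowed` are hypothetical, and the full text of Articles 6 and 9 contains further conditions that are not modeled here.

```python
from enum import Enum


class Category(Enum):
    BASIC = "basic"
    SPECIAL = "special"


# Simplified labels for the conditions listed in this post (assumed names,
# not the complete set of grounds in the regulation).
ARTICLE_6_BASES = {"consent", "legal_requirement", "legitimate_interest"}
ARTICLE_9_CONDITIONS = {
    "explicit_consent",
    "made_public_by_subject",
    "specific_legal_or_public_interest",
}


def processing_allowed(category, grounds):
    """Return True if at least one claimed ground is accepted for the category."""
    accepted = ARTICLE_6_BASES if category is Category.BASIC else ARTICLE_9_CONDITIONS
    return bool(accepted & set(grounds))
```

In this simplified model, ordinary consent is enough for basic data but not for a special category, which reflects the tighter conditions of Article 9.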

Basic vs. Special

From the point of view of automating the detection of personal data, both “basic” personal data and the special categories pose technical challenges. Detecting personal data such as names and addresses often produces false positives unless the detection algorithm is sophisticated. For example, in an unstructured document, a given name such as “April” without more context might be mistaken for the name of a month. Similarly, since many street or city names are also legitimate last names of individuals, confusion can arise without a sufficiently strong detection method. An example of this is an address such as 12343 Washington Blvd., Fremont CA. Both Fremont and Washington can each be a last name, a street name, or a city name.
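The “April” ambiguity can be illustrated with a small sketch that compares a plain dictionary lookup against a lookup that also checks the preceding word. The word lists and the function names `naive_name_hits` and `context_name_hits` are hypothetical and far too small for real use. A production detector would likely rely on much richer context than a single neighboring token.

```python
import re

# Hypothetical, deliberately tiny word lists for illustration only.
GIVEN_NAMES = {"april", "june", "august", "maria", "john"}
MONTHS = {
    "january", "february", "march", "april", "may", "june", "july",
    "august", "september", "october", "november", "december",
}
PERSON_CUES = {"mr", "mrs", "ms", "dr", "dear"}


def naive_name_hits(text):
    """Flag every token that appears in the given-name list."""
    return [t for t in re.findall(r"[A-Za-z]+", text) if t.lower() in GIVEN_NAMES]


def context_name_hits(text):
    """Flag a name that is also a month only when a person cue precedes it."""
    tokens = re.findall(r"[A-Za-z]+|\d+", text)
    hits = []
    for i, token in enumerate(tokens):
        word = token.lower()
        if word not in GIVEN_NAMES:
            continue
        previous = tokens[i - 1].lower() if i > 0 else ""
        if word in MONTHS and previous not in PERSON_CUES:
            continue  # ambiguous token with no evidence that it names a person
        hits.append(token)
    return hits
```

On a sentence such as “The invoice is due in April 2019.” the naive lookup reports “April” as a name, while the context-aware version reports nothing. On “Please forward this to Ms. April Fremont.” both report “April”.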

The ordering of the elements in street addresses also varies widely across countries. Because GDPR covers the European Union (EU), which has 28 member countries at the time of writing (including the UK, which is committed to implementing GDPR even after leaving the EU), a detector must handle the address formats used in all of these countries. Detecting “basic” personal data therefore requires a sophisticated approach.
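To make the ordering problem concrete, the sketch below keeps one pattern per country: a US-style pattern (house number before the street, state after the city) and a German-style pattern (house number after the street, postal code before the city). The patterns, the `ADDRESS_PATTERNS` table, and `find_addresses` are illustrative assumptions, and the German sample address is made up for the example. Real address grammars are considerably more varied.

```python
import re

# Illustrative patterns only; each covers a small fraction of real addresses.
ADDRESS_PATTERNS = {
    # United States: house number, street, city, two-letter state.
    "US": re.compile(
        r"\b(?P<number>\d+)\s+(?P<street>[A-Z][\w .]*?(?:Blvd|St|Ave|Rd)\.?),\s*"
        r"(?P<city>[A-Z][a-z]+),?\s+(?P<state>[A-Z]{2})\b"
    ),
    # Germany: single-word street, house number, postal code, then city.
    "DE": re.compile(
        r"\b(?P<street>[A-ZÄÖÜ][\w.\-]*?(?:straße|strasse|weg|platz))\s+"
        r"(?P<number>\d+[a-z]?),\s*(?P<postcode>\d{5})\s+(?P<city>[A-ZÄÖÜ][\w\-]+)"
    ),
}


def find_addresses(text):
    """Return one dict per match, tagged with the country pattern that matched."""
    results = []
    for country, pattern in ADDRESS_PATTERNS.items():
        for match in pattern.finditer(text):
            results.append({"country": country, **match.groupdict()})
    return results
```

Keeping the patterns in a table keyed by country means support for another country's format can be added without touching the matching loop.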

Customizing Detection

The special categories of personal data present an even greater challenge to automated detection methods, because most of the items in these categories have no tight, formally expressible definition. For example, inferring someone’s political leanings with an automated scanning program, especially from an unstructured document, is a non-trivial exercise, even though a human reading the same document can easily glean this information. This is where machine learning (ML), training of the ML module, and customization play heavy roles. The automated detection system must be highly customizable and trainable in order to be effective at finding data that falls under the special categories.
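As a rough illustration of what “trainable” means here, the sketch below implements a small multinomial Naive Bayes text classifier with add-one smoothing. It is a stand-in technique chosen for brevity, not a description of any particular product's ML module. The class name `NaiveBayesTextClassifier`, the labels, and the four training sentences are invented for the example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()

    def train(self, text, label):
        self.doc_counts[label] += 1
        for word in tokenize(text):
            self.word_counts[label][word] += 1
            self.vocab.add(word)

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # Log prior plus the sum of smoothed log likelihoods per token.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy training data; a real deployment needs a large labeled corpus.
clf = NaiveBayesTextClassifier()
clf.train("I have always voted for the Green Party and campaign for them", "political_opinion")
clf.train("She attended the party conference and supports the opposition leader", "political_opinion")
clf.train("Please find attached the quarterly sales report", "other")
clf.train("The meeting is moved to Tuesday afternoon", "other")
```

Customization in this sketch amounts to calling `train` with an organization's own labeled examples, so the same code can be pointed at other special categories by changing the labels and training text.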

Lastly, all of the above algorithms need to operate on rapidly growing data volumes. Scalability of any detection algorithm is critical for it to be successful in a real-world GDPR production context.
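One common way to keep a scanner's memory use flat as inputs grow is to stream the data in fixed-size batches instead of loading it whole. The sketch below does this for email addresses. The regular expression is a simplification, and `batches`, `scan_batch`, and `scan_stream` are hypothetical names. Each batch is an independent unit of work, so in principle batches could be handed to separate workers, although that part is not shown.

```python
import io
import re
from itertools import islice

# Simplified email pattern for illustration; it does not cover every valid address.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def batches(lines, size):
    """Yield lists of at most `size` lines without reading the whole input."""
    iterator = iter(lines)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def scan_batch(start_line, batch):
    """Return (line_number, match) pairs for one batch of lines."""
    hits = []
    for offset, line in enumerate(batch):
        for match in EMAIL_RE.finditer(line):
            hits.append((start_line + offset, match.group()))
    return hits


def scan_stream(lines, batch_size=1000):
    """Lazily scan any iterable of lines, such as an open file handle."""
    line_number = 1
    for batch in batches(lines, batch_size):
        yield from scan_batch(line_number, batch)
        line_number += len(batch)


# Usage with an in-memory stream; an open file object works the same way.
sample = io.StringIO("hello\ncontact name.surname@company.com now\nnothing here\n")
found = list(scan_stream(sample, batch_size=2))
```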

So far in the context of GDPR, we have covered detection of personal data and protection methods for personal data. In the next post of this series, we will be discussing other aspects of GDPR, namely, breach detection and reporting.
