October 16, 2018

GDPR Technical Series #3: Understanding Terminology for Finding Personal Data

Subra Ramesh


In previous posts in our GDPR series, we covered element-level protection techniques and how they map to anonymization and pseudonymization. In this post, we’ll focus on the difficult task of finding what to protect. In the context of GDPR, this means finding where a given organization stores its personal data.

Before going into techniques for finding personal data, it’s worth first settling on terminology. The industry uses several closely related terms for finding personal data, and each has its own emphasis and connotation.

The terms we will discuss are data discovery, eDiscovery, data classification, data categorization, data mapping, data cataloging, and detection of personal data.

Data Discovery

Data discovery generally refers to finding, extracting, and aggregating data within an enterprise to support a range of business initiatives, from analysis to decision making. It does not refer specifically to personal data, although personal data may be among the types of data discovered.

eDiscovery

eDiscovery is a special instance of data discovery that refers to finding data relevant to a legal case.

Data Classification

Data classification refers to classifying, and often tagging, documents according to classification levels specified by an organization. For example, classifications might reflect the level of sharing allowed for a particular document, such as PUBLIC, RESTRICTED, or PRIVATE. Classifications can also be hierarchical, adding finer-grained levels within a top-level classification, for example, PRIVATE.PAYROLL-DEPT. Data classification doesn’t have to be specific to personal data.
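The following minimal Python sketch makes hierarchical labels concrete. It assumes dot-separated labels such as PRIVATE.PAYROLL-DEPT. It also assumes that the three example top-level classifications run from least to most restrictive. The `ClassificationLabel` class, the `TOP_LEVELS` tuple, and that ordering are hypothetical illustrations. They are not taken from any particular product or standard.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical top-level classifications, ordered from least to most restrictive.
TOP_LEVELS = ("PUBLIC", "RESTRICTED", "PRIVATE")


@dataclass(frozen=True)
class ClassificationLabel:
    """A dot-separated, hierarchical label such as PRIVATE.PAYROLL-DEPT."""

    path: tuple[str, ...]

    @classmethod
    def parse(cls, text: str) -> ClassificationLabel:
        parts = tuple(text.split("."))
        # Reject unknown top levels and empty segments such as "PRIVATE."
        if parts[0] not in TOP_LEVELS or "" in parts:
            raise ValueError(f"invalid classification label: {text!r}")
        return cls(parts)

    @property
    def top_level(self) -> str:
        return self.path[0]

    def is_within(self, other: ClassificationLabel) -> bool:
        """True if this label equals `other` or is nested beneath it."""
        return self.path[: len(other.path)] == other.path

    def restrictiveness(self) -> int:
        """Rank of the top-level classification (higher means less sharing)."""
        return TOP_LEVELS.index(self.top_level)


payroll = ClassificationLabel.parse("PRIVATE.PAYROLL-DEPT")
print(payroll.is_within(ClassificationLabel.parse("PRIVATE")))  # True
```

With this representation, a policy written for PRIVATE also covers PRIVATE.PAYROLL-DEPT, because `is_within` compares label prefixes.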

Data Categorization

Data categorization is a more flexible way of sorting data sets into groups based on inherent similarities among them, rather than on the externally imposed criteria used in classification. For example, two documents about the same topic might fall into the same category yet, depending on the classification criteria, land in different classes. Despite these differences, categorization is often used to aid classification.

Data Mapping

Data mapping refers to both identifying personal data and mapping its flow across an enterprise. GDPR Article 30 requires controllers and processors to keep records of their processing activities, and it is the key GDPR provision driving data mapping requirements. Although the GDPR itself doesn’t mention data mapping, the UK Information Commissioner’s Office (ICO), for example, recommends the following in its section on documenting processing activities: “A good way to start is by doing an information audit or data-mapping exercise to clarify what personal data your organization holds and where.”
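One way to think of a data map is as an inventory of processing activities, annotated with the systems that hold the data and the flows between them. The Python sketch below is a hypothetical model of that idea. Its first fields loosely mirror some of the items that Article 30(1) asks controllers to record: controller, purposes, categories of data subjects and of personal data, recipients, third-country transfers, and time limits for erasure. The `systems` and `flows` fields and the `where_is` helper are illustrative assumptions. They answer the question of what personal data is held and where.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ProcessingRecord:
    """One processing activity in a hypothetical data map.

    The first group of fields loosely mirrors items listed in GDPR Article 30(1);
    `systems` and `flows` are illustrative additions recording where the data lives.
    """

    controller: str
    purpose: str
    data_subject_categories: list[str]
    personal_data_categories: list[str]
    recipient_categories: list[str] = field(default_factory=list)
    third_country_transfers: list[str] = field(default_factory=list)
    erasure_time_limit: str | None = None
    systems: list[str] = field(default_factory=list)
    flows: list[tuple[str, str]] = field(default_factory=list)  # (source, destination)


def where_is(records: list[ProcessingRecord], data_category: str) -> set[str]:
    """Return every system that stores, or receives via a flow, a category of personal data."""
    found: set[str] = set()
    for record in records:
        if data_category in record.personal_data_categories:
            found.update(record.systems)
            # Destinations of recorded flows also end up holding the data.
            found.update(destination for _, destination in record.flows)
    return found


records = [
    ProcessingRecord(
        controller="Example Corp",
        purpose="Payroll",
        data_subject_categories=["employees"],
        personal_data_categories=["salary", "bank details"],
        systems=["hr-db"],
        flows=[("hr-db", "payroll-provider")],
    ),
    ProcessingRecord(
        controller="Example Corp",
        purpose="Newsletter",
        data_subject_categories=["customers"],
        personal_data_categories=["email address"],
        systems=["crm"],
    ),
]
print(sorted(where_is(records, "salary")))  # ['hr-db', 'payroll-provider']
```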

Data Cataloging

Data cataloging refers to creating an inventory of all data within an enterprise. It is often part of a larger data governance program and covers items that may or may not be personal data. Because its scope is so broad, data cataloging often does not reach the depth needed to understand and protect personal data.

Detection of Personal Data

The term detection of personal data refers to the item-level detection of personal data across an enterprise, and it distinguishes this activity from general data discovery and eDiscovery. Detection of personal data is a subset of a broader activity generally described as detection of sensitive data. The difference is that personal data is sensitive data that can be tied to a real person (in the GDPR’s words, an identified or identifiable natural person). For example, a Vehicle Identification Number (VIN) might be viewed as sensitive data, but not personal data, as long as it cannot be associated with a particular person. Once such an association can be made, it becomes personal data.

Sometimes the association cannot be made at the time of detection, and additional information provided later might create it. Because an association can appear at any point, schemes that claim to find only personal data, and not sensitive data in general, run into difficulty. Matching the approach to GDPR terminology sounds attractive, but finding only personal data is not sufficient, because any sensitive data could become personal data at a later time.
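The following small Python sketch illustrates the timing problem. It assumes a simplified VIN detector that matches 17 digits or capital letters other than I, O, and Q, with no check-digit validation. It also assumes a hypothetical `owners` lookup that only becomes available after detection. `Finding`, `detect_sensitive`, and `apply_association` are illustrative names. Because every VIN-like value is retained as sensitive data, the later lookup can promote it to personal data. A scheme that kept only values already tied to a person would have discarded it at detection time.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Simplified VIN shape: 17 digits or capital letters, excluding I, O and Q.
# No check-digit or context validation is done, so false positives are possible.
VIN_PATTERN = re.compile(r"\b[A-HJ-NPR-Z0-9]{17}\b")


@dataclass
class Finding:
    location: str  # where the value was found, e.g. a file name or table column
    kind: str  # what was detected, e.g. "VIN"
    value: str
    personal: bool = False  # becomes True once the value can be tied to a person


def detect_sensitive(location: str, text: str) -> list[Finding]:
    """Item-level detection: keep every VIN-like value, whether or not it is personal yet."""
    return [Finding(location, "VIN", m.group()) for m in VIN_PATTERN.finditer(text)]


def apply_association(findings: list[Finding], owners: dict[str, str]) -> list[Finding]:
    """Promote findings to personal data when a later-arriving lookup links them to a person.

    `owners` is a hypothetical VIN -> owner mapping. Returns the newly promoted findings.
    """
    promoted = []
    for finding in findings:
        if not finding.personal and finding.value in owners:
            finding.personal = True
            promoted.append(finding)
    return promoted


findings = detect_sensitive("fleet.csv", "unit 7, VIN 1HGCM82633A004352, serviced")
print(findings[0].personal)  # False: sensitive, not yet personal
apply_association(findings, {"1HGCM82633A004352": "Jane Doe"})
print(findings[0].personal)  # True: now personal data
```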

Note that the term “sensitive personal data” is sometimes used to refer to what the GDPR officially calls “Special Categories of Personal Data” (Article 9). For the purposes of the discussion above, “sensitive personal data” should be treated as a subset of “personal data,” not as a synonym for sensitive data in general.

Going forward in this GDPR series, we will use the term “detection of personal data” when referring to the activities needed for GDPR compliance. In our next post, we’ll dive deeper into the categories of personal data and mechanics for finding them within large enterprise stores.
